Class SchemaManager
Manages the initial creation and schema updates for a Postgres database.
Namespace: Neon.Postgres
Assembly: Neon.Postgres.dll
Syntax
public class SchemaManager
Remarks
This class uses some simple conventions to make it easy to upgrade a database schema over time as the data model evolves. This uses the concept of schema version numbers. A schema version is simply an integer value where the version will be 0 when a database is initially created and then the version is incremented by one whenever the database schema is updated.
This class uses a reserved table named DbInfoTableName that is used to keep track of the current schema version. This table will have a single row with these columns:
Version:integer | The integer database schema version. This will be set to 0 when the database is first created and will be incremented for each subsequent update. |
Updater:text | Used by multiple service instances to coordinate which one actually handles the update. This will be `NULL` when the database isn't being updated and will be set to a string identifying the entity currently updating the database. This string can be anything from a GUID, container ID, hostname, or whatever. |
UpdateStartUtc:timestamp | Time (UTC) when the most recent update was started. |
UpdateFinishUtc:timestamp | Time (UTC) when the most recent update was completed. This will be `NULL` while an update is in progress. |
You'll be authoring Postgres SQL script files to create the initial database as well as to upgrade the database for each subsequent schema change. By convention, this class assumes that the SQL scripts for each database will be saved to separate folders with each script file named like: schema-#.script where # is the schema version the script will upgrade the database to, with schema-0.script being the script that creates the database as Version 0. So your script folder will look something like:
schema-0000.script <-- database created as v0
schema-0001.script <-- upgrades from v0 to v1
schema-0002.script <-- upgrades from v1 to v2
schema-0003.script <-- upgrades from v2 to v3
schema-0004.script <-- upgrades from v3 to v4
...
schema-####.script
note
This method parses the version number in the file names after the dash so it's perfectly OK to include leading zero digits there (like we did in the example above). We actually recommend this so that your schema files can be sorted nicely by version when listed by the file system.
Each script file is responsible for upgrading the database from the previous version to the next. This class will help manage the upgrade process by deciding which scripts need to be executed based on the DbInfoTableName table and then executing the required scripts.
To use, construct an instance via SchemaManager, passing a database connection for the Postgres superuser or a user with the CREATEDB privilege. You'll also need to pass the database name and the path to the file system folder holding the script files.
Then call CreateDatabaseAsync() to create the database if it doesn't already exist; this uses the connection passed to the constructor. Then call UpgradeDatabaseAsync(string, int, bool, Action<bool, int>) to apply any necessary updates; this uses a new connection to the target database using the credentials from the original database connection.
You may optionally pass a string to UpgradeDatabaseAsync(string, int, bool, Action<bool, int>) that identifies the entity performing the upgrade. This could be an application name, the name of the host the updater is running on, the username of the person performing the upgrade etc. This method uses this to try to prevent multiple updgrade from happening in parallel on the same database (which would be bad) and the updater string can be used to help identify who else is updating the database. This parameter defaults to a GUID.
Most applications will include at least two scripts when they get started with schema-0.script creating the database and schema-1.script creating the tables, views, data types, stored procedures, etc.
SQL COMMAND BATCHES
It's often necessary to execute a sequence of SQL commands that depend on each other. One example is a command that creates a table followed by commands that write rows. You might think that you could achieve this by executing the following as one command:
CREATE TABLE my_table (name text);
INSERT INTO my_table (name) values ('Jack');
INSERT INTO my_table (name) values ('Jill');
But, this won't actually work because the database generates a query plan for the entire command and when it does this and sees the inserts into [my_table] but the table doesn't actually exist at the time the query plan is being created. So the command will fail.
What you really need to do is create the table first as a separate command and then do the inserts as one or more subsequent commands. This is not terribly convenient so we've introduced the concept of a batch of commands. Here's what this would look like:
CREATE TABLE my_table (name text);
GO
INSERT INTO my_table (name) values ('Jack');
INSERT INTO my_table (name) values ('Jill');
See how the GO line separates the table creation from the inserts. This method will split the script files into separate commands on any GO lines and then execute these commands in order.
note
GO is case insensitive and any leading or trailing space on the line will be ignored.
Batch commands are implemented by ExecuteBatch(NpgsqlConnection, string, NpgsqlTransaction) and an asynchonous alternative.
SCRIPT VARIABLES
Your schema scripts may include variables of the form ${NAME} where NAME is the case sensitive variable name. The variable references will be replaced by the variable's value when the variable is defined, otherwise the variable reference will be left in place.
The ${database} variable is reserved and will be replaced by the name of the database being managed. You can specify your own variables by passing a dictionary to the constructor. This can be useful for specifying things like password, replication factors, etc.
UPGRADE STRATEGIES
The current implementation assumes that applications using the database are offline or can work properly with both the new and old schema. Here are some siggestions for managing updates:
- Use YugaByte snapshots to backup the keyspace.
- Effectively take the keyspace offline during the upgrade by revoking all rights to all users besides the current one and the superuser before upgrading and then restoring these rights afterwards.
- For services and keyspaces deployed to Kubernetes, we recommend that you handle keyspace schema updates via a custom Kubernetes operator which would stop any services using the keyspace, apply the schema update, and then restart the services, potentially upgrading them as well. You could embed the schema scripts in the operator itself so upgrading the keyspace (and application) would be as simple as upgrading the operator.
HANDLING UPGRADE ERRORS
It's possible for a database upgrade to fail. Addressing upgrade failures will generally require manual intervention. You should start out by looking at the version and error columns in the DbInfoTableName in your database to diagnose what happened. version indicates the schema version before the update script was executed but that it's possible that the update script was paratially completed which means that the database may be in a state between the old and update schema version.
Here are the underlying causes for upgrade errors:
hardware/cluster |
The database cluster or the hardware/platform it's running is having problems that prevent the updates from being applied. The error column will describe the error. |
script errors |
Your upgrade scripts have syntax errors or are otherwise invalid. The error column will describe the error. |
updater conflict |
Another updater is currently running or terminated for some reason before completing the update. The updater column will identify the updater instance that is currently updating the database or that failed prematurely. |
For updater conflicts, you'll need to determine whether the identified updater is still running or whether it has failed. Simply wait for the other updater to finish if it's still running, otherwise you have a failure and will need to follow these recomendations to manually mitigate the situation:
Manual Rollback |
It's possible that some but not all of the commands in your update script have
completed. Depending on the upgrade details, you may want to manually undo any
of the statements that completed to get the database back to its state before
the the update started and then call UpgradeDatabaseAsync(string, int, bool, Action<bool, int>)
with force: true .
|
Manual Upgrade |
As an alternative to Manual Rollback, you could simply execute the remaining
update commands manually and then updating the DbInfoTableName by setting
version to the new version number and setting the updater and error
fields to NULL .
|
Script Corrections | Be sure to correct any problems with your upgrade script, even if your are going to manually complete the upgrade so that upgrades will work for new database instances. |
SCRIPTS AS EMBEDDED RESOURCES
In addition to reading SQL scripts as standard files, the SchemaManager can also read scripts from embedded resources. This is an easy and clean way to include these scripts in a program or library. Here's what you need to do:
- Create a folder in your project to hold your SQL script files.
- Add your scripts to the new folder, saving them with **UTF-8 encoding**.
- Select your script files in the Solution Explorer and then left-click on them and select **Properties**. Set **Build Action** to **Embedded resource**.
- You'll be using the SchemaManager(NpgsqlConnection, string, IStaticDirectory, Dictionary<string, string>) override constructor and you'll be passing an IStaticDirectory that emulates a read-only file system constructed from embedded resources. You'll need to call GetResourceFileSystem(Assembly, string) to obtain this directory, passing a string identifying resource name prefix that identifies your virtual folder.
Constructors
SchemaManager(NpgsqlConnection, string, IStaticDirectory, Dictionary<string, string>)
Constructs an instance that loads scripts from embedded resources.
Declaration
public SchemaManager(NpgsqlConnection masterConnection, string databaseName, IStaticDirectory schemaDirectory, Dictionary<string, string> variables = null)
Parameters
Type | Name | Description |
---|---|---|
NpgsqlConnection | masterConnection | The master database connection to be used for creating the target database. This connection must have been made for a Postgres superuser or a user with the CREATEDB privilege and must not reference a specific database. |
string | databaseName | The database name to be used. |
IStaticDirectory | schemaDirectory | The embedded resource directory returned by a call to GetResourceFileSystem(Assembly, string). |
Dictionary<string, string> | variables | Optionally specifies script variables. |
Exceptions
Type | Condition |
---|---|
FileNotFoundException | Thrown if there's no directory at scriptFolder or when there's no schema-0.script file in the directory. |
SchemaManager(NpgsqlConnection, string, string, Dictionary<string, string>)
Constructs an instance that loads scripts from files.
Declaration
public SchemaManager(NpgsqlConnection masterConnection, string databaseName, string schemaFolder, Dictionary<string, string> variables = null)
Parameters
Type | Name | Description |
---|---|---|
NpgsqlConnection | masterConnection | The master database connection to be used for creating the target database. This connection must have been made for a Postgres superuser or a user with the CREATEDB privilege and must not reference a specific database. |
string | databaseName | The database name to be used. |
string | schemaFolder | The path to the file system folder holding the database schema scripts. |
Dictionary<string, string> | variables | Optionally specifies script variables. |
Exceptions
Type | Condition |
---|---|
FileNotFoundException | Thrown if there's no directory at scriptFolder or when there's no schema-0.script file in the directory. |
Fields
DbInfoTableName
The name of the database information table.
Declaration
public const string DbInfoTableName = "__dbinfo"
Field Value
Type | Description |
---|---|
string |
Properties
TargetConnection
Returns a connection to the target database if the database exists.
Declaration
public NpgsqlConnection TargetConnection { get; }
Property Value
Type | Description |
---|---|
NpgsqlConnection |
Exceptions
Type | Condition |
---|---|
SchemaManagerException | Thrown when the database doesn't exist. |
Methods
CreateDatabaseAsync()
Creates the database using the schema-0.script file from the script folder. This also creates the DbInfoTableName table adding a row setting the Version to 0 by default.
Declaration
public Task<bool> CreateDatabaseAsync()
Returns
Type | Description |
---|---|
Task<bool> |
|
Exceptions
Type | Condition |
---|---|
FileNotFoundException | Thrown if the schema-0.script file does not exist in the script folder. |
SchemaManagerException | Thrown if the database already exists but does not include the DbInfoTableName table or if that table doesn't have exactly one row or the version there is not positive. |
Dispose()
Performs application-defined tasks associated with freeing, releasing, or resetting unmanaged resources.
Declaration
public void Dispose()
Dispose(bool)
Handles the actual disposal.
Declaration
protected virtual void Dispose(bool disposing)
Parameters
Type | Name | Description |
---|---|---|
bool | disposing | true if we're disposing, |
~SchemaManager()
Finalizer.
Declaration
protected ~SchemaManager()
GetStatusAsync()
Returns information about the database schema status and schema scripts.
Declaration
public Task<DatabaseStatus> GetStatusAsync()
Returns
Type | Description |
---|---|
Task<DatabaseStatus> | The DatabaseStatus. |
Exceptions
Type | Condition |
---|---|
SchemaManagerException | Thrown when the database has an invalid DbInfoTableName table. |
UpgradeDatabaseAsync(string, int, bool, Action<bool, int>)
Upgrades the database by applying any upgrade scripts from the current database
version to the latest update script found in the script folder or optionally when
the database version equals stopVersion
.
Declaration
public Task<int> UpgradeDatabaseAsync(string updaterIdentity = null, int stopVersion = -1, bool force = false, Action<bool, int> updateAction = null)
Parameters
Type | Name | Description |
---|---|---|
string | updaterIdentity | Optionally specifies the identity of the entity performing the update. This may be the username of the person doing this or something identifying the service instance for more automated scenarios. This service identity could be a hostname, container ID, or something else that makes sense. This is used to ensure that only a single entity can update the database. This defaults to a generated GUID. |
int | stopVersion | Optionally specifies the latest database update to apply. |
bool | force | Optionally specifies that any indication that another updater is in the process of updating the database will be ignored and that any pewnding updates will proceed. This may be necessary after a previous update failed. noteWARNING: You should take care to ensure that the other potential updater is not actually performing an update. This may also means that the previous update was only partially completed which could require manual intervention. |
Action<bool, int> | updateAction | Optional action that will be called before each update is applied and then afterwards.
The |
Returns
Type | Description |
---|---|
Task<int> | The version of the database after the upgrade. |
Remarks
note
IMPORTANT: This method does not perform the schema updates within a transaction because that will be impractical for large databases and also due to limitations of YugaByte Postgres. This means that you'll need to take care to ensure that your schema scripts are well tested and bulletproof and you should also consider backing up your database to be very safe.
Exceptions
Type | Condition |
---|---|
SchemaManagerException | Thrown if the database doesn't exist or does not include the DbInfoTableName table or if it invalid. |
FileNotFoundException | Thrown if the schema-0.script file does not exist or when there are any missing script files (e.g. gaps in the sequence of files) or there are scripts with unexpected file names. |
SchemaManagerException | Thrown when another entity currently is in the process of updating the database schema. |