Class SchemaManager
Manages the initial creation and schema updates for a Cassandra keyspace.
Namespace: Neon.Cassandra
Assembly: Neon.Cassandra.dll
Syntax
public class SchemaManager
Remarks
This class uses some simple conventions to make it easy to upgrade a keyspace schema over time as the data model evolves. This uses the concept of schema version numbers. A schema version is simply an integer value where the version will be 0 when a keyspace is initially created and then the version is incremented by one whenever the keyspace schema is updated.
This class uses a reserved table named DbInfoTableName that is used to keep track of the current schema version. This table will have a single row with these columns:
Version:integer | The integer keyspace schema version. This will be set to 0 when the keyspace is first created and will be incremented for each subsequent update. |
Updater:text | Used by multiple service instances to coordinate which one actually handles the update. This will be `NULL` when the keyspace isn't being updated and will be set to a string identifying the entity currently updating the keyspace. This string can be anything from a GUID, container ID, hostname, or whatever. |
UpdateStartUtc:timestamp | Time (UTC) when the most recent update was started. |
UpdateFinishUtc:timestamp | Time (UTC) when the most recent update was completed. This will be `NULL` while an update is in progress. |
You'll be authoring Postgres SQL script files to create the initial keyspace as well as to upgrade the keyspace for each subsequent schema change. By convention, this class assumes that the SQL scripts for each keyspace will be saved to separate folders with each script file named like: schema-#.script where # is the schema version the script will upgrade the keyspace to, with schema-0.script being the script that creates the keyspace as Version 0. So your script folder will look something like:
schema-0000.script <-- keyspace created as v0
schema-0001.script <-- upgrades from v0 to v1
schema-0002.script <-- upgrades from v1 to v2
schema-0003.script <-- upgrades from v2 to v3
schema-0004.script <-- upgrades from v3 to v4
...
schema-####.script
note
This method parses the version number in the file names after the dash so it's perfectly OK to include leading zero digits there (like we did in the example above). We actually recommend this so that your schema files can be sorted nicely by version when listed by the file system.
Each script file is responsible for upgrading the keyspace from the previous version to the next. This class will help manage the upgrade process by deciding which scripts need to be executed based on the DbInfoTableName table and then executing the required scripts.
To use, construct an instance via SchemaManager, passing a keyspace session for the Postgres superuser or a user with the CREATEDB privilege. You'll also need to pass the keyspace name and the path to the file system folder holding the script files.
Then call CreateKeyspaceAsync() to create the keyspace if it doesn't already exist; this uses the session passed to the constructor. Then call UpgradeKeyspaceAsync(string, int, bool, Action<bool, int>) to apply any necessary updates; this sets the session to default to the target keyspace while the upgrade scripts are being executed.
You may optionally pass a string to UpgradeKeyspaceAsync(string, int, bool, Action<bool, int>) that identifies the entity performing the upgrade. This could be an application name, the name of the host the updater is running on, the username of the person performing the upgrade etc. This method uses this to try to prevent multiple updgrade from happening in parallel on the same keyspace (which would be bad) and the updater string can be used to help identify who else is updating the keyspace. This parameter defaults to a GUID.
Most applications will include at least two scripts when they get started with schema-0.script creating the keyspace and schema-1.script creating the tables, views, data types, stored procedures, etc.
SQL COMMAND BATCHES
It's often necessary to execute a sequence of SQL commands that depend on each other. One example is a command that creates a table followed by commands that write rows. You might think that you could achieve this by executing the following as one command:
CREATE TABLE my_table (name text);
INSERT INTO my_table (name) values ('Jack');
INSERT INTO my_table (name) values ('Jill');
But, this won't actually work because the keyspace generates a query plan for the entire command and when it does this and sees the inserts into [my_table] but the table doesn't actually exist at the time the query plan is being created. So the command will fail.
What you really need to do is create the table first as a separate command and then do the inserts as one or more subsequent commands. This is not terribly convenient so we've introduced the concept of a batch of commands. Here's what this would look like:
CREATE TABLE my_table (name text);
go
INSERT INTO my_table (name) values ('Jack');
INSERT INTO my_table (name) values ('Jill');
See how the go line separates the table creation from the inserts. This method will split the script files into separate commands on any go lines and then execute these commands in order.
note
go is case insensitive and any leading or trailing space on the line will be ignored.
Batch commands are implemented by ExecuteBatch(ISession, string) and an asynchonous alternative.
SCRIPT VARIABLES
Your schema scripts may include variables of the form ${NAME} where NAME is the case sensitive variable name. The variable references will be replaced by the variable's value when the variable is defined, otherwise the variable reference will be left in place.
The ${keyspace} variable is reserved and will be replaced by the name of the Cassandra keyspace being managed. You can specify your own variables by passing a dictionary to the constructor. This can be useful for specifying things like password, replication factors, etc.
UPGRADE STRATEGIES
The current implementation assumes that applications using the database are offline or can work properly with both the new and old schema. Here are some siggestions for managing updates:
- Use YugaByte snapshots to backup the keyspace.
- Effectively take the keyspace offline during the upgrade by revoking all rights to all users besides the current one and the superuser before upgrading and then restoring these rights afterwards.
- For services and keyspaces deployed to Kubernetes, we recommend that you handle keyspace schema updates via a custom Kubernetes operator which would stop any services using the keyspace, apply the schema update, and then restart the services, potentially upgrading them as well. You could embed the schema scripts in the operator itself so upgrading the keyspace (and application) would be as simple as upgrading the operator.
HANDLING UPGRADE ERRORS
It's possible for a database upgrade to fail. Addressing upgrade failures will generally require manual intervention. You should start out by looking at the version and error columns in the DbInfoTableName in your database to diagnose what happened. version indicates the schema version before the update script was executed but that it's possible that the update script was paratially completed which means that the database may be in a state between the old and update schema version.
Here are the underlying causes for upgrade errors:
hardware/cluster |
The database cluster or the hardware/platform it's running is having problems that prevent the updates from being applied. The error column will describe the error. |
script errors |
Your upgrade scripts have syntax errors or are otherwise invalid. The error column will describe the error. |
updater conflict |
Another updater is currently running or terminated for some reason before completing the update. The updater column will identify the updater instance that is currently updating the database or that failed prematurely. |
For updater conflicts, you'll need to determine whether the identified updater is still running or whether it has failed. Simply wait for the other updater to finish if it's still running, otherwise you have a failure and will need to follow these recomendations to manually mitigate the situation:
Manual Rollback |
It's possible that some but not all of the commands in your update script have
completed. Depending on the upgrade details, you may want to manually undo any
of the statements that completed to get the database back to its state before
the the update started and then call UpgradeKeyspaceAsync(string, int, bool, Action<bool, int>)
with force: true .
|
Manual Upgrade |
As an alternative to Manual Rollback, you could simply execute the remaining
update commands manually and then updating the DbInfoTableName by setting
version to the new version number and setting the updater and error
fields to NULL .
|
Script Corrections | Be sure to correct any problems with your upgrade script, even if your are going to manually complete the upgrade so that upgrades will work for new database instances. |
SCRIPTS AS EMBEDDED RESOURCES
In addition to reading SQL scripts as standard files, the SchemaManager can also read scripts from embedded resources. This is an easy and clean way to include these scripts in a program or library. Here's what you need to do:
- Create a folder in your project to hold your SQL script files.
- Add your scripts to the new folder, saving them with **UTF-8 encoding**.
- Select your script files in the Solution Explorer and then left-click on them and select **Properties**. Set **Build Action** to **Embedded resource**.
- You'll be using the SchemaManager(ISession, string, IStaticDirectory, Dictionary<string, string>) override constructor and you'll be passing an IStaticDirectory that emulates a read-only file system constructed from embedded resources. You'll need to call GetResourceFileSystem(Assembly, string) to obtain this directory, passing a string identifying resource name prefix that identifies your virtual folder.
Constructors
SchemaManager(ISession, string, IStaticDirectory, Dictionary<string, string>)
Constructs an instance that loads scripts from embedded resources.
Declaration
public SchemaManager(ISession session, string keyspace, IStaticDirectory schemaDirectory, Dictionary<string, string> variables = null)
Parameters
Type | Name | Description |
---|---|---|
ISession | session | The master keyspace session to be used for creating the target keyspace. This session must have been made for a Postgres superuser or a user with global CREATE permission. |
string | keyspace | The keyspace name to be used. |
IStaticDirectory | schemaDirectory | The embedded resource directory returned by a call to GetResourceFileSystem(Assembly, string). |
Dictionary<string, string> | variables | Optionally specifies script variables. |
Exceptions
Type | Condition |
---|---|
FileNotFoundException | Thrown if there's no directory at scriptFolder or when there's no schema-0.script file in the directory. |
SchemaManager(ISession, string, string, Dictionary<string, string>)
Constructs an instance that loads scripts from files.
Declaration
public SchemaManager(ISession session, string keyspace, string schemaFolder, Dictionary<string, string> variables = null)
Parameters
Type | Name | Description |
---|---|---|
ISession | session | The master keyspace session to be used for creating the target keyspace. This session must have been made for a Postgres superuser or a user with global CREATE permission. |
string | keyspace | The keyspace name to be used. |
string | schemaFolder | The path to the file system folder holding the keyspace schema scripts. |
Dictionary<string, string> | variables | Optionally specifies script variables. |
Exceptions
Type | Condition |
---|---|
FileNotFoundException | Thrown if there's no directory at scriptFolder or when there's no schema-0.script file in the directory. |
Fields
DbInfoTableName
The name of the keyspace information table.
Declaration
public const string DbInfoTableName = "__dbinfo"
Field Value
Type | Description |
---|---|
string |
Methods
CreateKeyspaceAsync()
Creates the keyspace using the schema-0.script file from the script folder. This also creates the DbInfoTableName table adds a row setting the Version to 0.
Declaration
public Task<bool> CreateKeyspaceAsync()
Returns
Type | Description |
---|---|
Task<bool> |
|
Exceptions
Type | Condition |
---|---|
FileNotFoundException | Thrown if the schema-0.script file does not exist in the script folder. |
SchemaManagerException | Thrown if the keyspace already exists but does not include the DbInfoTableName table or if that table doesn't have exactly one row or the version there is not positive. |
Dispose()
Performs application-defined tasks associated with freeing, releasing, or resetting unmanaged resources.
Declaration
public void Dispose()
Dispose(bool)
Handles the actual disposal.
Declaration
protected virtual void Dispose(bool disposing)
Parameters
Type | Name | Description |
---|---|---|
bool | disposing | true if we're disposing, |
~SchemaManager()
Finalizer.
Declaration
protected ~SchemaManager()
GetStatusAsync()
Returns information about the keyspace schema status and schema scripts.
Declaration
public Task<KeyspaceStatus> GetStatusAsync()
Returns
Type | Description |
---|---|
Task<KeyspaceStatus> | The KeyspaceStatus. |
Exceptions
Type | Condition |
---|---|
SchemaManagerException | Thrown when the keyspace has an invalid DbInfoTableName table. |
UpgradeKeyspaceAsync(string, int, bool, Action<bool, int>)
Upgrades the keyspace by applying any upgrade scripts from the current keyspace
version to the latest update script found in the script folder or optionally when
the keyspace version equals stopVersion
.
Declaration
public Task<int> UpgradeKeyspaceAsync(string updaterIdentity = null, int stopVersion = -1, bool force = false, Action<bool, int> updateAction = null)
Parameters
Type | Name | Description |
---|---|---|
string | updaterIdentity | Optionally specifies the identity of the entity performing the update. This may be the username of the person doing this or something identifying the service instance for more automated scenarios. This service identity could be a hostname, container ID, or something else that makes sense. This is used to ensure that only a single entity can update the keyspace. This defaults to a generated GUID. |
int | stopVersion | Optionally specifies the latest keyspace update to apply. |
bool | force | Optionally specifies that any indication that another updater is in the process of updating the keyspace will be ignored and that any pewnding updates will proceed. This may be necessary after a previous update failed. noteWARNING: You should take care to ensure that the other potential updater is not actually performing an update. This may also means that the previous update was only partially completed which could require manual intervention. |
Action<bool, int> | updateAction | Optional action that will be called before each update is applied and then afterwards.
The |
Returns
Type | Description |
---|---|
Task<int> | The version of the keyspace after the upgrade. |
Remarks
note
IMPORTANT: This method does not perform the schema updates within a transaction because that will be impractical for large keyspaces and also due to limitations of YugaByte Postgres. This means that you'll need to take care to ensure that your schema scripts are well tested and bulletproof and you should also consider backing up your keyspace to be very safe.
Exceptions
Type | Condition |
---|---|
SchemaManagerException | Thrown if the keyspace doesn't exist or does not include the DbInfoTableName table or if it invalid. |
FileNotFoundException | Thrown if the schema-0.script file does not exist or when there are any missing script files (e.g. gaps in the sequence of files) or there are scripts with unexpected file names. |
SchemaManagerException | Thrown when another entity currently is in the process of updating the keyspace schema. |