Friday, March 11, 2011

Schema Management in Cassandra 0.7

Schema Management in Cassandra

Starting with Cassandra 0.7, schema management is very easy. Schema changes are made on a live cluster and propagated to every node, so it works like centralized schema management but without a single point of failure (SPoF). Typical schema operations are loading the initial schema, changing an existing schema (for example adding a column family (CF) or modifying the attributes of an existing CF), and dropping schema elements such as CFs and keyspaces.

There are three ways to perform these operations:

Load the schema from cassandra.yaml using schematool or the JMX console: This option loads the schema only once; running it a second time on the cluster has no effect. That makes it suited to loading the initial schema.

schematool import
OR
JConsole: MBeans -> org.apache.cassandra.db -> StorageService -> Operations -> loadSchemaFromYAML


Create/modify the schema using the Thrift API: This gives high flexibility and suits applications that need to create or drop keyspaces and column families on the fly. You cannot modify an existing column family with the calls listed below. Refer to the API page on the Cassandra Wiki for details. The following calls are available:
  • describe_keyspace
  • describe_keyspaces
  • system_add_column_family
  • system_drop_column_family
  • system_add_keyspace
  • system_drop_keyspace

Create/modify the schema using cassandra-cli: This is the most flexible option. It allows practically everything that the first two options allow together. The following commands are supported; you can list them by entering "help;" in cassandra-cli. For details of a specific command, type "help <command>;", for example "help create keyspace;".
  • Describe keyspace
  • Show list of keyspaces
  • Add a new keyspace with the specified attribute(s) and value(s)
  • Update a keyspace with the specified attribute(s) and value(s)
  • Create a new column family with the specified attribute(s) and value(s)
  • Update a column family with the specified attribute(s) and value(s)
  • Delete a keyspace
  • Delete a column family

Under the hood

The Schema Updates page on the Cassandra Wiki describes these operations in detail. Here is a high-level summary:

  • Cassandra uses the Schema and Migrations column families in the system keyspace to maintain the schema and the changes to it, respectively.
  • Schema changes made on one node are propagated to the other nodes in the cluster.
  • The Migrations CF tracks individual schema changes; the Schema CF holds a reference to the latest version in use.
  • Some manual cleanup may be needed if a node crashes while schema changes are being applied to the cluster.
  • To avoid concurrency issues, always push schema changes through one node.
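To make the version-tracking idea concrete, here is a toy Python model of it. It is a simplified illustration under stated assumptions, not Cassandra's implementation: Node, is_behind and sync are hypothetical names, each node's history is an in-memory list standing in for the Migrations CF, and its last entry stands in for the version referenced by the Schema CF. A node that is behind pulls only the migrations it is missing, while two nodes that each accepted a different change end up with histories this model cannot reconcile, which is one way to see why changes should go through a single node.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for one node's schema state (hypothetical, not Cassandra code).

    `migrations` plays the role of the Migrations CF: an append-only list of
    (version, description) pairs. The last version plays the role of the
    pointer kept in the Schema CF.
    """
    name: str
    migrations: list = field(default_factory=list)

    @property
    def version(self):
        return self.migrations[-1][0] if self.migrations else None

    def apply_local(self, description):
        """Record a schema change made on this node under a new time-based UUID."""
        version = uuid.uuid1()
        self.migrations.append((version, description))
        return version

    def migrations_since(self, version):
        """Return the migrations recorded after `version` (all of them if None)."""
        ids = [v for v, _ in self.migrations]
        start = 0 if version is None else ids.index(version) + 1
        return self.migrations[start:]


def is_behind(a, b):
    """True if a's current version is an earlier point in b's history."""
    ids = [v for v, _ in b.migrations]
    return a.version != b.version and (a.version is None or a.version in ids)


def sync(nodes):
    """Exchange migrations pairwise until no node is behind another.

    Returns the set of versions left; a single element means agreement.
    Raises RuntimeError when two histories have diverged, which in this
    model is what concurrent changes pushed through different nodes cause.
    """
    changed = True
    while changed:
        changed = False
        for a in nodes:
            for b in nodes:
                if a is b or a.version == b.version:
                    continue
                if is_behind(a, b):
                    # Roughly: "My data definitions are old. Asking for updates since ..."
                    a.migrations.extend(b.migrations_since(a.version))
                    changed = True
                elif not is_behind(b, a):
                    raise RuntimeError(f"{a.name} and {b.name} have diverged schema histories")
    return {n.version for n in nodes}


if __name__ == "__main__":
    cluster = [Node("node1"), Node("node2"), Node("node3")]
    # Push every change through one node, as recommended above.
    cluster[0].apply_local("add keyspace KeyspaceMigration")
    cluster[0].apply_local("drop keyspace KeyspaceMigration")
    print(sync(cluster) == {cluster[0].version})  # True
```

The model deliberately treats histories as linear; real conflict handling, crash recovery and the manual cleanup mentioned above are out of its scope.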

Examples

Dropping a Keyspace

  • Start cassandra-cli, connect to a node, and run the drop keyspace command.

[root@rwc-sb6240-1 bin]# ./cassandra-cli
Welcome to cassandra CLI.

Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit.
[default@unknown] connect 20.17.221.19/9160;
Connected to: "NarenCluster072" on 20.17.221.19/9160
[default@unknown] drop keyspace KeyspaceMigration;
5fba3f1f-4a82-11e0-b8ee-f90f8a3f5e1f
[default@unknown] exit;
[root@rwc-sb6240-1 bin]#
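The string printed by drop keyspace is the new schema version. Its third group starts with 1 (11e0), so it is a version-1 (time-based) UUID, and its creation time can be read back out of it. A small sketch using only Python's standard uuid module; schema_version_time is a hypothetical helper name:

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version-1 UUID timestamps count 100 ns intervals since 1582-10-15 00:00 UTC.
UUID_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def schema_version_time(version: str) -> datetime:
    """Return the creation time encoded in a time-based (v1) schema version UUID."""
    u = uuid.UUID(version)
    if u.version != 1:
        raise ValueError(f"{version} is not a time-based (version 1) UUID")
    # u.time is in 100 ns units; integer-divide by 10 to get microseconds.
    return UUID_EPOCH + timedelta(microseconds=u.time // 10)


if __name__ == "__main__":
    old = "d052796e-4a80-11e0-b8ee-f90f8a3f5e1f"  # version named in "Sending updates since"
    new = "5fba3f1f-4a82-11e0-b8ee-f90f8a3f5e1f"  # version printed by drop keyspace
    print(schema_version_time(old).isoformat())
    print(schema_version_time(new).isoformat())
    print(schema_version_time(new) > schema_version_time(old))  # True
```

For 5fba3f1f-4a82-11e0-b8ee-f90f8a3f5e1f this gives 2011-03-09 19:21:03 UTC, which lines up with the 11:21:03 drop_keyspace entries in the log if the node logs in UTC-8 (an assumption about its time zone).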


  • The logs on that node show the following events (in DEBUG mode):

DEBUG [pool-1-thread-151] 2011-03-09 11:21:03,334 CassandraServer.java (line 759) drop_keyspace
DEBUG [MigrationStage:1] 2011-03-09 11:21:03,343 Table.java (line 397) applying mutation of row 35666261336631662d346138322d313165302d623865652d663930663861336635653166
...
DEBUG [CompactionExecutor:1] 2011-03-09 11:21:04,146 CompactionManager.java (line 109) Checking to see if compaction of Schema would be useful
DEBUG [MigrationStage:1] 2011-03-09 11:21:04,146 MigrationManager.java (line 106) Announcing my schema is 5fba3f1f-4a82-11e0-b8ee-f90f8a3f5e1f
DEBUG [CompactionExecutor:1] 2011-03-09 11:21:04,147 CompactionManager.java (line 109) Checking to see if compaction of Migrations would be useful
DEBUG [ReadStage:14] 2011-03-09 11:21:04,150 MigrationManager.java (line 87) Their data definitions are old. Sending updates since d052796e-4a80-11e0-b8ee-f90f8a3f5e1f
DEBUG [ReadStage:15] 2011-03-09 11:21:04,151 MigrationManager.java (line 87) Their data definitions are old. Sending updates since d052796e-4a80-11e0-b8ee-f90f8a3f5e1f
...
DEBUG [pool-1-thread-151] 2011-03-09 11:21:05,629 StorageProxy.java (line 628) My version is 5fba3f1f-4a82-11e0-b8ee-f90f8a3f5e1f
DEBUG [pool-1-thread-151] 2011-03-09 11:21:05,629 StorageProxy.java (line 659) Schemas are in agreement.


  • On the other nodes, the log entries look like this:

DEBUG [ReadStage:9] 2011-03-09 11:12:19,250 MigrationManager.java (line 82) My data definitions are old. Asking for updates since d052796e-4a80-11e0-b8ee-f90f8a3f5e1f
DEBUG [ReadStage:9] 2011-03-09 11:12:19,253 MigrationManager.java (line 106) Announcing my schema is d052796e-4a80-11e0-b8ee-f90f8a3f5e1f
DEBUG [MigrationStage:1] 2011-03-09 11:12:19,273 SchemaCheckVerbHandler.java (line 36) Received schema check request.
...
DEBUG [MigrationStage:1] 2011-03-09 11:12:20,681 MigrationManager.java (line 106) Announcing my schema is 5fba3f1f-4a82-11e0-b8ee-f90f8a3f5e1f
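Because every node logs the version it announces, one quick way to compare nodes after a change is to take the last "Announcing my schema is ..." line from each node's log. The sketch below assumes DEBUG logging is enabled and that the message text matches the lines shown above; last_announced_version and schemas_agree are hypothetical helpers, and the server's own check (the "Schemas are in agreement." message) is still the more reliable signal.

```python
import re

# Matches the DEBUG message shown above, e.g.
# "MigrationManager.java (line 106) Announcing my schema is <uuid>".
ANNOUNCE = re.compile(
    r"Announcing my schema is "
    r"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
)


def last_announced_version(log_text):
    """Return the most recent schema version announced in one node's log, or None."""
    versions = ANNOUNCE.findall(log_text)
    return versions[-1] if versions else None


def schemas_agree(logs_by_node):
    """Given {node_name: log_text}, return (agree, {node_name: last_version})."""
    latest = {node: last_announced_version(text) for node, text in logs_by_node.items()}
    agree = None not in latest.values() and len(set(latest.values())) == 1
    return agree, latest


if __name__ == "__main__":
    # Excerpts copied from the logs above (the first node's two entries were
    # originally run together on one line; the regex copes with that).
    node_a = ("DEBUG [CompactionExecutor:1] 2011-03-09 11:21:04,146 CompactionManager.java "
              "(line 109) Checking to see if compaction of Schema would be useful "
              "DEBUG [MigrationStage:1] 2011-03-09 11:21:04,146 MigrationManager.java "
              "(line 106) Announcing my schema is 5fba3f1f-4a82-11e0-b8ee-f90f8a3f5e1f")
    node_b = ("DEBUG [ReadStage:9] 2011-03-09 11:12:19,253 MigrationManager.java (line 106) "
              "Announcing my schema is d052796e-4a80-11e0-b8ee-f90f8a3f5e1f\n"
              "DEBUG [MigrationStage:1] 2011-03-09 11:12:20,681 MigrationManager.java (line 106) "
              "Announcing my schema is 5fba3f1f-4a82-11e0-b8ee-f90f8a3f5e1f")
    print(schemas_agree({"node-a": node_a, "node-b": node_b}))
```

A node whose log contains no announcement is reported as None and counts as disagreement, since nothing can be concluded about its schema.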

3 comments:

ओंकार (Onkar) said...

Didn't understand a thing in this post, but I hope to see more blogging from you in future. :-)

By the way, you should really mask user/machine names and IP addresses from your post.

Naren said...

>>By the way, you should really mask user/machine names and IP addresses from your post.

Point taken. Thanks for the suggestion.

小竹 said...

Would you please share a method on how you change schema in cassandra?