Sunday 15 September 2013

Cassandra as a distributed cache?

NoSQL was born as a sort of reaction to the architectural design pattern in which you put a cache (such as Ehcache Enterprise, Redis, or memcached) in front of a relational database in order to better scale the database. One of the basic rationales for NoSQL is that, if a cache is sufficient to handle most of your database queries, then you don't really need a relational database. NoSQL then goes one step further and says that, if you can live without some of the relational database features, then you can trade them off for other useful capabilities like replication.

At the moment, the company I work for is having trouble with the caching solution we use in front of our relational database. I don't want to name the product, because we are not using it properly and the problems we are having are therefore largely of our own making. However, we are looking at moving much of our infrastructure into an IaaS cloud (possibly Amazon AWS, Google Compute Engine or Rackspace). Our existing caching solution is not well suited to multi-datacentre deployment (which is probably one of the big advantages of using IaaS), so we need to look for something else.

Cassandra is really well suited to this type of cloud deployment for a number of reasons. The Cassandra data model can easily support a key-value store (we will talk more about the Cassandra data model later), and it is possible to put time-to-live (TTL) values on Cassandra columns, which means cached values can expire automatically. One big advantage of Cassandra over some key-value stores is that it can flexibly shard and replicate the data across multiple nodes and multiple data centres.
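
To make this concrete, here is a rough sketch of what a simple key-value cache table with per-entry TTLs might look like, using CQL and the DataStax Python driver. All of the names (cache_ks, page_cache) are illustrative, and the SimpleStrategy keyspace is only for a single-node experiment; a multi-data-centre definition follows further down.

```python
# Sketch: a key-value cache table whose entries expire automatically.
# Assumes a local Cassandra node and the DataStax Python driver
# (pip install cassandra-driver). Names are illustrative only.
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect()

# SimpleStrategy is only for this single-node sketch; see the
# NetworkTopologyStrategy definition later for the multi-pod case.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS cache_ks
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.set_keyspace('cache_ks')

session.execute("""
    CREATE TABLE IF NOT EXISTS page_cache (
        cache_key   text PRIMARY KEY,
        cache_value blob
    )
""")

# Insert an entry that expires one hour after it is written.
session.execute(
    "INSERT INTO page_cache (cache_key, cache_value) "
    "VALUES (%s, %s) USING TTL 3600",
    ("user:42:profile", b"serialized profile data"))

for row in session.execute(
        "SELECT cache_value FROM page_cache WHERE cache_key = %s",
        ("user:42:profile",)):
    print(row.cache_value)
```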

The multi data centre support is very useful. Cloud providers generally allow you to deploy to n data centres, where n is larger than two. You can get really good fault tolerance by dividing your infrastructure into n separate and autonomous units (that I like to call "pods"), putting each one into a separate data centre and then load balancing between them (most IaaS providers give you a way to do the load balancing fairly painlessly). This is a pretty powerful idea because you can potentially run on cheaper, smaller cloud instances, and you don't need to effectively double your infrastructure the way you often do when you deploy to two data centres. Assuming you have n pods, you can probably size your instances so that your applications can run using (n-2) pods. Assuming you can get n > 6, you will likely spend less than you would by spreading your infrastructure over two data centres, since the two-data-centre approach requires each data centre to have enough infrastructure to run in the absence of the other.
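
A quick back-of-the-envelope calculation shows where the saving comes from (the numbers are purely illustrative and ignore per-instance pricing differences):

```python
# Illustrative capacity arithmetic, not a cost model.
# Two data centres, each able to run alone, means 2x peak capacity.
# n pods sized so that any (n - 2) of them can carry peak load means
# n / (n - 2) times peak capacity in total.

def provisioning_factor(n_pods, tolerated_failures):
    """Total provisioned capacity as a multiple of peak load."""
    return n_pods / (n_pods - tolerated_failures)

print(provisioning_factor(2, 1))   # classic 2-DC setup -> 2.0x
for n in (4, 6, 8):
    # surviving the loss of any two pods
    print(n, round(provisioning_factor(n, 2), 2))  # 2.0x, 1.5x, 1.33x
```

With six pods the raw capacity overhead is 1.5x, compared with 2x for the traditional two-data-centre layout.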

As hinted earlier, Cassandra has the concept of data centres, and makes it easy to put at least one complete copy of your data in each. My thinking is that each pod should be configured as a single Cassandra data centre.  I'm not sure whether it makes sense to have more than one copy of the cached data in each Cassandra pod, because if you have six pods, you will potentially have six copies of your data, which is plenty.  Assuming there is reasonable connectivity between the pods, a Cassandra node failure will cause at least some of the data to be fetched from a different pod, which may be ok.
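
If each pod is configured as its own Cassandra data centre, the keyspace definition might look something like the following sketch, which places exactly one replica in each pod. The data centre names are illustrative and would have to match the names reported by the cluster's snitch; this would replace the single-node SimpleStrategy definition used in the earlier sketch.

```python
# Sketch: one copy of the cached data in each of six pods, where each
# pod is a Cassandra data centre. DC names are illustrative and must
# match the snitch configuration.
from cassandra.cluster import Cluster

session = Cluster(['10.0.0.1']).connect()
session.execute("""
    CREATE KEYSPACE cache_ks
    WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'pod1': 1, 'pod2': 1, 'pod3': 1,
        'pod4': 1, 'pod5': 1, 'pod6': 1
    }
""")
```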

When cached data is updated in Cassandra, it will be replicated asynchronously to the other pods, typically within a few milliseconds. There is a risk of applications in other pods reading stale cached data in the meantime, which needs to be considered. Typically, I suspect that we will want to make user sessions somewhat sticky to the pod they initially connect to, which should lower the risk a bit.
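
One way to keep the common case fast is to do cache reads and writes at a data-centre-local consistency level, so a request never waits on a remote pod and replication to the other pods happens in the background. A sketch, again assuming the DataStax Python driver and the names from the earlier sketches (with one replica per data centre, LOCAL_QUORUM effectively means "the copy in my pod"):

```python
# Sketch: pod-local reads and writes; cross-pod replication is
# asynchronous, so other pods may briefly see stale entries.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(['10.0.0.1']).connect('cache_ks')

write_local = SimpleStatement(
    "INSERT INTO page_cache (cache_key, cache_value) "
    "VALUES (%s, %s) USING TTL 3600",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM)

read_local = SimpleStatement(
    "SELECT cache_value FROM page_cache WHERE cache_key = %s",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM)

session.execute(write_local, ("user:42:profile", b"fresh value"))
rows = session.execute(read_local, ("user:42:profile",))
```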

Another issue I can see, based on my organization's use of distributed database caches, is that we will sometimes need to invalidate a cache (remove all its entries). I can think of quite a few Cassandra data models that would allow you to invalidate a particular cache, but perhaps it is simpler to keep each cache in its own column family. We could then drop and recreate the column family to clear or invalidate the cache. We could also just truncate the column family, but my experience is that truncate does not work really well on multi-node clusters (it works fine on single-node clusters, but most people don't run those in production).
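
A sketch of the drop-and-recreate approach, with the simpler truncate alternative shown as a comment (the table name and schema match the earlier sketches and are illustrative):

```python
# Sketch: invalidating a whole cache by dropping and recreating its
# table (column family).
def invalidate_cache(session, table="page_cache"):
    session.execute("DROP TABLE IF EXISTS %s" % table)
    session.execute("""
        CREATE TABLE %s (
            cache_key   text PRIMARY KEY,
            cache_value blob
        )""" % table)

# The simpler alternative, if truncate behaves well on your cluster:
# session.execute("TRUNCATE page_cache")
```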

Most distributed caches also allow you to place an upper limit on the number of items in a cache. This is generally done to conserve memory. In Cassandra, the cache can spill to disk, so memory is less of a concern. However, it might still be desirable to have a limit on the cache size.

One way to do this is to have a row (called an "all_keys" row, probably using the row key "all_keys") in each cache's column family, whose column keys are a timestamp (representing cache insertion time) concatenated with the cache key for each entry in the cache. These columns would have the same time-to-live (TTL) as the cached data. We could also define a counter column in each cache's column family to keep track of the current number of elements in the cache. When this counter exceeds a certain value, a daemon could delete the oldest entries from the cache's column family; these could be determined by doing a column slice on the all_keys row. Having the "all_keys" row would also allow us to invalidate the cache by doing a column slice to get all the cache keys and then deleting the corresponding rows, instead of dropping and recreating the column family.
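
Translated into CQL3, the "all_keys" row becomes a wide partition clustered by insertion time, and the counter has to live in its own table (counter columns cannot be mixed with regular TTL'd columns). The following is a rough sketch of that bookkeeping, reusing the page_cache table from the earlier sketches; all names and limits are illustrative:

```python
# Sketch of the size-limiting bookkeeping described above.
# cache_index plays the role of the "all_keys" row; cache_counts holds
# the per-cache counter.
from datetime import datetime

CACHE_TTL = 3600  # seconds, same TTL as the cached entries

def create_bookkeeping_tables(session):
    session.execute("""
        CREATE TABLE IF NOT EXISTS cache_index (
            cache_name  text,
            inserted_at timestamp,
            cache_key   text,
            PRIMARY KEY (cache_name, inserted_at, cache_key)
        )""")
    session.execute("""
        CREATE TABLE IF NOT EXISTS cache_counts (
            cache_name  text PRIMARY KEY,
            entry_count counter
        )""")

def record_insert(session, cache_name, cache_key):
    """Call this alongside every cache write."""
    session.execute(
        "INSERT INTO cache_index (cache_name, inserted_at, cache_key) "
        "VALUES (%s, %s, %s) USING TTL " + str(CACHE_TTL),
        (cache_name, datetime.utcnow(), cache_key))
    # Note: entries that quietly expire via TTL never decrement this
    # counter, so it is only an approximate size.
    session.execute(
        "UPDATE cache_counts SET entry_count = entry_count + 1 "
        "WHERE cache_name = %s", (cache_name,))

def needs_trim(session, cache_name, max_entries=100000):
    """Check whether the cache has grown past its configured limit."""
    rows = list(session.execute(
        "SELECT entry_count FROM cache_counts WHERE cache_name = %s",
        (cache_name,)))
    return bool(rows) and rows[0].entry_count > max_entries

def trim_oldest(session, cache_name, batch=1000):
    """Daemon step: slice off the oldest index entries and delete them."""
    rows = session.execute(
        "SELECT inserted_at, cache_key FROM cache_index "
        "WHERE cache_name = %s LIMIT " + str(batch),
        (cache_name,))
    for row in rows:
        session.execute(
            "DELETE FROM page_cache WHERE cache_key = %s",
            (row.cache_key,))
        session.execute(
            "DELETE FROM cache_index WHERE cache_name = %s "
            "AND inserted_at = %s AND cache_key = %s",
            (cache_name, row.inserted_at, row.cache_key))
        session.execute(
            "UPDATE cache_counts SET entry_count = entry_count - 1 "
            "WHERE cache_name = %s", (cache_name,))
```

Because the clustering order on cache_index is ascending insertion time, the plain SELECT with a LIMIT returns the oldest entries first, which corresponds to the "column slice" described above.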
