One of our senior developers/architects had a closer look and thought that it was worth doing a proof of concept (POC). We decided to try using Cassandra for our election night postal code lookup application. This is a feature on our website in which a user types in their postal code (or a prefix of it) and the website returns the electoral district(s) for the postal code or prefix, possibly with live results.
In retrospect, this wasn't the best use for Cassandra, and we implemented it in a very simple way: essentially we used Cassandra as a key-value store, in which the keys were postal codes or postal code prefixes and the values were electoral district numbers. More specifically, we created a Cassandra column family (the analogue of a table in a relational database) whose row keys were postal codes or prefixes and in which each row contained a single column holding a text blob with a list of electoral districts.
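A minimal sketch of this key-value layout, using a plain Python dict as a stand-in for the column family rather than a real Cassandra client. The postal codes, district numbers, the comma-separated blob format, and the `min_prefix_len` parameter are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_prefix_table(districts_by_postal_code, min_prefix_len=3):
    """Build one key per postal code and per prefix of it.

    Each value is a comma-separated text blob of district numbers,
    mimicking the single-column rows described above.
    """
    table = defaultdict(set)
    for code, districts in districts_by_postal_code.items():
        # Store the full code and every prefix down to min_prefix_len.
        for length in range(min_prefix_len, len(code) + 1):
            table[code[:length]].update(districts)
    return {key: ",".join(sorted(value)) for key, value in table.items()}


# Hypothetical sample data: postal code -> list of district numbers.
sample = {
    "K1A0A1": ["35075"],
    "K1A0B1": ["35076"],
    "K1B3W2": ["35076"],
}
table = build_prefix_table(sample)

# A lookup is a single key read, whether the user typed a full code or a prefix.
districts_for_prefix = table.get("K1A")
```

The cost of this layout is that every prefix has to be written as its own key ahead of time, which is the special handling of prefixes mentioned above.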
We probably should have put all the postal codes into columns in a single row and used column slices, which are essentially range queries on the columns in a single row (more on this in later blog entries). Since Cassandra supports up to 2 billion columns per row, this would have worked just fine and would have removed the need to handle postal code prefixes specially, because we could have handled a prefix by searching for a range of postal codes. Cassandra's low-level data architecture can be a little difficult to grasp at first, and our choice of a somewhat inelegant solution was a consequence of this.
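To make the column-slice idea concrete, here is a rough in-memory sketch in Python. It models a single wide row as a sorted list of column names and answers a prefix lookup with a range search. The `WideRow` class and its methods are hypothetical, not a Cassandra API, and the half-open range and the `"\uffff"` upper bound are simplifying assumptions that hold for ASCII postal codes.

```python
from bisect import bisect_left


class WideRow:
    """One row whose columns are kept sorted by name, as Cassandra does."""

    def __init__(self, columns):
        # columns: dict of column name (postal code) -> value (district).
        self._items = sorted(columns.items())
        self._names = [name for name, _ in self._items]

    def column_slice(self, start, end):
        """Return (name, value) pairs with start <= name < end."""
        lo = bisect_left(self._names, start)
        hi = bisect_left(self._names, end)
        return self._items[lo:hi]

    def lookup_prefix(self, prefix):
        """A prefix lookup is just a slice from prefix to prefix + a high sentinel."""
        return self.column_slice(prefix, prefix + "\uffff")


# Hypothetical sample data: postal code -> district number.
row = WideRow({
    "K1A0A1": "35075",
    "K1A0B1": "35076",
    "K1B3W2": "35076",
})

prefix_matches = row.lookup_prefix("K1A")
exact_match = row.lookup_prefix("K1B3W2")
```

With this layout only full postal codes are stored, and a full code and a prefix go through the same range query.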
We weren't enormously impressed by Cassandra at this point, but it did appear that it might be useful as a key-value store. We became more impressed when we started load testing, which is extremely important because we tend to get very heavy traffic on election nights. When our QA team started their load test, the people watching the Cassandra servers saw almost no load on our two-node Cassandra cluster and thought that something had delayed the start of the test. The QA team then kept increasing the intensity of the load test until the load testing software failed. We couldn't help but be impressed that our two-node Cassandra cluster could withstand more load than our load testing setup could generate.
We ran our Cassandra POC in production on election night and it performed well. As a result, we decided to look into Cassandra in more detail and try using it for something else. The next application was live stock quotes, a fairly classic time series problem for which we had shown our relational database to be poorly suited. More on that in the next blog entry.