Photo by John Zablocki
Some of the leading evangelists in the NoSQL community swung through Portland last week to show off the latest release of their fabulous open source caching and database platform, Couchbase Server. You may be asking yourself what advantages Couchbase, or any other non-relational database, could provide over a plain ol' MySQL or PostgreSQL install. It's a question we have been asking at Planet Argon recently as we consider ways to keep our clients' apps running as fast as possible.
First off, when talking about Couchbase and NoSQL, start throwing your assumptions out the window. It is far removed from the traditional approach to data persistence and is not simply a layer built on top of it (à la ActiveRecord). These technologies are built with speed foremost in mind, and to get there they make sacrifices that traditional RDBMSs do not. When evaluating Couchbase for existing and new applications, these are the questions we have been asking ourselves:
How related is the data in your app?
Some apps have a naturally shallow data structure, and these are often the best candidates for the NoSQL treatment. For instance, our in-house collaboration tool brainstormr is driven by just two database tables: questions and responses. A response is related to one question; a question is related to many responses. Very simple. We have other applications here with many, many more tables, and if I were to draw how those entities relate to one another, it would look like several overlaid Christmas trees. Though the technologists who visited last week did not make this point explicitly, the apps they presented more closely resembled brainstormr than a modern e-commerce system, which left some questions in the air.
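To make "shallow" concrete, here is a minimal sketch of how a one-to-many structure like brainstormr's could fold into a single document per question. The field names, the `question::1` key format, and the sample data are hypothetical, not brainstormr's actual schema; the point is only that a question and its responses can be read or written together as one document, with no join.

```python
import json

# Relational shape: two tables, with each response pointing back to its
# question through a foreign key (hypothetical column names and data).
questions = [
    {"id": 1, "title": "What should we name the new app?"},
]
responses = [
    {"id": 10, "question_id": 1, "body": "brainstormr"},
    {"id": 11, "question_id": 1, "body": "ideaboard"},
]


def to_document(question, all_responses):
    """Fold a question and its responses into one nested document."""
    return {
        "type": "question",
        "title": question["title"],
        "responses": [
            {"id": r["id"], "body": r["body"]}
            for r in all_responses
            if r["question_id"] == question["id"]
        ],
    }


# Document shape: one self-contained document per question, keyed by id.
documents = {f"question::{q['id']}": to_document(q, responses) for q in questions}

print(json.dumps(documents["question::1"], indent=2))
```

An app whose entities form several overlaid trees has no single natural "owner" document to nest everything under, which is where this shape starts to strain.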
What are your throughput requirements?
I have no doubt that Couchbase handles concurrency and throughput very well. The apps I most often see benefiting from these technologies continuously serve large volumes of data to large numbers of people. Think ad serving, real-time communication, and online gaming: domains where the accuracy requirements of every data point may be relaxed for the sake of speed. Which leads to ...
What is your threshold for error?
An important facet of Couchbase is its approach to data persistence. Create and update operations do not immediately write that data to disk, and this is where we really start to branch away from pure transactional SQL. Couchbase immediately updates the cache and then adds the operation to a "write queue" for disk. Your app loses the guarantee that, once its request completes, the corresponding mutation has made its way to disk. Though measures exist to account for failover (cache replicas with their own write queues, and observers that force persistence and make writes synchronous), it seems possible that, given a particular set of failures, a mutation never makes it to disk. This distinction matters because after a disk failure you can typically recover data; not so when a RAM cache goes sour. Certain classes of applications (communication, raw data serving, etc.) may do just fine under these constraints, but others (banking, commerce) might not. This could be a case of "no one's done it yet, so it's not possible," but it is a concern you walk away from the NoSQL discussion with.
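The durability gap can be modeled in a few lines. The sketch below is a deliberately simplified, hypothetical write-behind store: the class and method names are invented for illustration and are not Couchbase's client API, and replication is ignored. `set` acknowledges as soon as RAM is updated and merely queues the disk write, while `set_durable` stands in, loosely, for the persistence-forcing observers described in the text.

```python
from collections import deque


class WriteBehindStore:
    """Toy model of cache-first writes with an asynchronous disk queue.

    Not Couchbase's API (hypothetical names); `cache` stands in for RAM,
    `disk` for persisted data.
    """

    def __init__(self):
        self.cache = {}
        self.disk = {}
        self.write_queue = deque()

    def set(self, key, value):
        # The write is acknowledged as soon as the cache is updated...
        self.cache[key] = value
        # ...while persistence is only scheduled.
        self.write_queue.append((key, value))

    def flush(self):
        # Drain every queued write to disk, oldest first.
        while self.write_queue:
            key, value = self.write_queue.popleft()
            self.disk[key] = value

    def set_durable(self, key, value):
        # Loosely analogous to forcing persistence: do not return until
        # the queue (and therefore this write) has reached disk.
        self.set(key, value)
        self.flush()

    def crash(self):
        # RAM and the pending queue are lost; on restart the cache is
        # reloaded from whatever made it to disk.
        self.write_queue.clear()
        self.cache = dict(self.disk)


store = WriteBehindStore()
store.set("order:1", {"total": 42})
store.flush()                        # order:1 is now on disk
store.set("order:2", {"total": 17})  # acknowledged, but only queued
store.crash()
print("order:1" in store.cache)      # True: reloaded from disk
print("order:2" in store.cache)      # False: the acknowledged write is gone

store.set_durable("order:3", {"total": 5})
store.crash()
print("order:3" in store.cache)      # True: persisted before returning
```

The durable path removes the gap at the cost of waiting on disk for every write, which is the speed that cache-first writes were meant to buy.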
What kind of analytics do you need?
There is nothing in Couchbase that resembles SQL. Its analogous querying technology is called a view, which is essentially a predefined map/reduce function that you write to act as a query. Views are one of the newest additions to the Couchbase stack, and the presenters hinted that they are still evolving. After an hour or so of writing Couchbase views, I came away with the sense that I could express complex analytics more intuitively and concisely in SQL, and the mandatory intermediate step between writing a view and using it (referred to as 'publishing') felt a bit constraining. I want to be able to shape data as I am working with it, from the console if necessary, and this felt like working through a layer of indirection. Apparently ad-hoc views are coming in vNext. Another hindrance is that views only aggregate data from disk, so with Couchbase's cache-first writes, a view may exclude expected results if the write queue is lagging behind.
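For readers who have not written one, the sketch below imitates the shape of a map/reduce view in Python. It is an illustration under assumptions, not Couchbase's view API (real Couchbase views are JavaScript functions published to the server), and the document layout and function names are hypothetical. Querying only the `disk` dictionary mirrors the caveat that views read persisted data, so a write still sitting in the queue does not show up.

```python
from collections import defaultdict

# Hypothetical response documents as they currently sit on disk.
disk = {
    "response::10": {"type": "response", "question_id": 1},
    "response::11": {"type": "response", "question_id": 1},
    "response::12": {"type": "response", "question_id": 2},
}
# A write the cache has acknowledged but the write queue has not flushed yet.
cache_only = {"response::13": {"type": "response", "question_id": 2}}


def map_fn(key, doc):
    """Map step: emit (question_id, 1) for every response document."""
    if doc.get("type") == "response":
        yield doc["question_id"], 1


def reduce_fn(values):
    """Reduce step: count the rows emitted for one key."""
    return sum(values)


def query_view(documents):
    """Run map over every document, group by emitted key, then reduce.

    Roughly the SQL:
        SELECT question_id, COUNT(*) FROM responses GROUP BY question_id;
    """
    groups = defaultdict(list)
    for key, doc in documents.items():
        for emitted_key, value in map_fn(key, doc):
            groups[emitted_key].append(value)
    return {k: reduce_fn(v) for k, v in sorted(groups.items())}


print(query_view(disk))                    # {1: 2, 2: 1}: response::13 missing
print(query_view({**disk, **cache_only}))  # {1: 2, 2: 2} once it is persisted
```

The one-line SQL in the docstring next to the two functions and the grouping loop illustrates the conciseness gap described above.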
The Couchbase folks are understandably excited about how far their technology has come: it will help many applications meet their scaling needs. Relational systems, however, are more proven, offer a richer set of tools, and can be quite fast when tuned appropriately. Neither technology is going anywhere, and developers should be familiar with the strengths of each.
You can follow the Couchbase guys Jaitla, Tugduall and John, read up on use cases at couchbase.com, and check out some great video footage from the last CouchConf.