Rock Out Riak

Hey everybody! We had such a blast writing the deployment series that we’re jumping directly into another one. This time around, we’re going to give an overview of several of the database options that exist today. Much like the deployment articles, these are going to be fairly high-level overviews meant to give you a taste of the topic. With luck, this will help you decide which of these options is best suited to a given scenario.

First up on the list is Riak.

Riak by Basho

What’s Riak?

At its core, Riak is an eventually-consistent distributed key-value data store. The whole story is a little more involved than that, granted, but even after using it for a while now, that is still how we think about it. Even that simple description can be broken down a bit.

Eventually-Consistent

Like most other NoSQL database solutions, Riak does not follow ACID guidelines for data storage. Replicas of a value may briefly disagree after a write, but they converge over time. To sort out which value is current for a given key, Riak uses vector clocks, and the method used to determine the winner in a vector clock conflict is configurable.
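To give a feel for how vector clocks tell "newer" apart from "conflicting," here is a minimal, generic sketch in Python. It is not Riak's implementation; the function names and the client names (client_x and friends) are made up for illustration, and a clock is modeled as a plain dict of per-actor counters.

```python
def increment(clock, actor):
    """Return a copy of `clock` with `actor`'s counter bumped by one."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


def descends(a, b):
    """True if clock `a` has seen every update that clock `b` has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a, b):
    """Classify the relationship between two vector clocks."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a is newer"
    if descends(b, a):
        return "b is newer"
    # Neither clock descends from the other: the writes were concurrent.
    return "conflict"


# Hypothetical history: one initial write, then two clients update it
# independently without seeing each other's change.
base = increment({}, "client_x")
left = increment(base, "client_y")
right = increment(base, "client_z")

print(compare(left, base))   # left builds on base
print(compare(left, right))  # siblings that must be resolved somehow
```

The "conflict" case is the one where a configurable resolution policy has to step in, since the clocks alone cannot pick a winner.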

Distributed

While you can run a single Riak node, doing so goes against the entire idea behind the database. To use it to its full potential, you are better off running a cluster of Riak nodes and configuring the number of copies of each key-value pair (as well as the quorum values) in an intelligent manner.

To give an idea, the default configuration states that three copies of every key-value pair are to be kept in a Riak cluster. If you run a single node instead, all three copies are stored on that one node. Not only is that fairly inefficient in terms of storage space, but having more nodes in the cluster would also allow for better performance, as the work of storing and retrieving a single key-value pair would be split between the nodes that hold a copy.
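The arithmetic behind copies and quorums can be sketched in a few lines of Python. This is a deliberately simplified model, not how Riak actually places data (the real ring works in terms of partitions rather than whole nodes), and every name here is hypothetical. N is the number of copies, R the number of replicas that must answer a read, and W the number that must acknowledge a write.

```python
import hashlib


def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1


def reads_overlap_writes(n, r, w):
    """True if every read set must share at least one replica with every write set."""
    return r + w > n


def replica_nodes(key, nodes, n=3):
    """Pick n consecutive slots on a toy ring of nodes, starting at the key's hash."""
    start = int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(n)]


# With a single node, all three copies land in the same place.
single = replica_nodes("alice", ["node1"])

# With five nodes, the three copies land on three different machines.
cluster = replica_nodes("alice", ["node1", "node2", "node3", "node4", "node5"])

print(single, cluster)
print(quorum(3), reads_overlap_writes(3, 2, 2), reads_overlap_writes(3, 1, 1))
```

With N at three, majority reads and writes (two each) always overlap on at least one replica, while reading and writing a single replica each gives faster responses at the cost of that guarantee.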

Key-Value

At the lowest level, Riak is really nothing more than a key-value data store. A common real-world example of the same idea is a telephone directory: names are the keys, and phone numbers are the values. That is a fairly simplistic way to think about Riak, but that line of thinking can take one quite a ways when developing an application that uses Riak. There is, however, more to think about.

For example, as a schema-less NoSQL database solution, Riak doesn’t have any strong feelings as to what the format of the value associated with a key should be. You can store binary file contents, JSON documents, YAML documents, raw text, or just about anything that can be expressed as a stream of bits. The only catch is that the name given to your value (the key) must be unique; otherwise, you will overwrite the value already stored under that key.
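As a rough mental model, and nothing more, a key-value store behaves much like a dictionary whose values are opaque bytes. The sketch below uses an in-memory Python dict with hypothetical put and get helpers; it talks to no database at all, but it shows both the "store anything" property and the overwrite catch.

```python
import json

# In-memory stand-in for a key-value store: keys are names, values are raw bytes.
store = {}


def put(key, value):
    """Store raw bytes under a key, replacing whatever was there before."""
    store[key] = value


def get(key):
    """Return the bytes stored under a key, or None if the key is unknown."""
    return store.get(key)


# The store does not care what the bytes mean.
put("alice", b"555-0100")
put("bob", json.dumps({"phone": "555-0101"}).encode("utf-8"))

# Reusing a key silently replaces the earlier value.
put("alice", b"555-0199")

print(get("alice"))
print(json.loads(get("bob"))["phone"])
```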

Here’s another thought: If one stores JSON documents in the key-value database, it can be considered a document database as well. Basho realized this fairly early on, so there is a special use case: when one stores JSON documents as the values in the system, a built-in JavaScript map-reduce framework can be used for querying those documents.
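To show the shape of a map-reduce query over JSON values, here is a small simulation in Python. Riak's built-in framework takes JavaScript functions and runs them across the cluster; this sketch only mimics the two phases locally, and the documents, field names, and function names are all invented for the example.

```python
import json

# Hypothetical JSON documents, keyed by name, as they might sit in the store.
documents = {
    "alice": json.dumps({"name": "Alice", "city": "Austin"}),
    "bob": json.dumps({"name": "Bob", "city": "Boston"}),
    "carol": json.dumps({"name": "Carol", "city": "Austin"}),
}


def map_phase(raw_value):
    """Runs once per stored value: parse the JSON and emit a partial count."""
    doc = json.loads(raw_value)
    return [{doc["city"]: 1}]


def reduce_phase(partials):
    """Runs over the collected map output: merge partial counts into totals."""
    totals = {}
    for partial in partials:
        for city, count in partial.items():
            totals[city] = totals.get(city, 0) + count
    return [totals]


def run_job(values):
    """Feed every value through the map phase, then reduce the combined output."""
    mapped = []
    for raw in values:
        mapped.extend(map_phase(raw))
    return reduce_phase(mapped)


result = run_job(documents.values())
print(result)
```

The point of the split is that the map phase only ever looks at one value at a time, which is what lets a distributed store run it next to the data on each node.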

Why use it?

  • Distributed By Design - While a number of databases include or have available a distribution scheme, the difference between such a system and a system like Riak, which was developed specifically with distributed computing in mind, is like night and day.

  • Data Agnostic - Save for the benefits that one can gain in storing JSON documents rather than raw data, Riak has no strong opinions as to what sorts of data one should store.

  • Incredibly Easy Administration - The last time we counted, a server can become a Riak node inside of six commands at the shell prompt. This includes the installation and configuration of the software.

  • Configurable Difficulty - Riak is easy to use for easy things (like a phone book), and it is sometimes hard to use, but mostly for hard things (like searching a standard phone book by phone number).

  • Client Libraries - In addition to the client libraries released by the community, Basho proper develops and supports libraries for Erlang, JavaScript, Java, PHP, Python, and Ruby. That covers most of the bases for the popular web programming languages …

  • REST Interface - … But if you prefer a language not on the list of supported client libraries, you can pretty easily roll your own. Riak exposes a RESTful interface over standard HTTP, and it offers a Protocol Buffers interface as an alternative.

  • Actively Developed - Basho is, as we write this, preparing to release Riak-1.0.0, and it doesn’t look like any steam will be lost for some time to come.

  • Developed By Friendly Folks - Over the last several months, we have been having more and more conversations with some of the folks at Basho (mostly via Twitter), and they seem to be grade-A people.
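On the "roll your own" point from the REST Interface item, a client can start out as little more than a URL builder. The sketch below prepares, but does not send, an HTTP PUT using only Python's standard library. The host, the port (8098), and the /riak/<bucket>/<key> path layout are assumptions on our part and should be checked against the documentation for your Riak version; the bucket, key, and helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def object_url(bucket, key, host="127.0.0.1", port=8098):
    """Build the URL for one object. The path layout here is an assumption."""
    return "http://{}:{}/riak/{}/{}".format(
        host,
        port,
        urllib.parse.quote(bucket, safe=""),
        urllib.parse.quote(key, safe=""),
    )


def build_store_request(bucket, key, document):
    """Prepare (without sending) a PUT that stores `document` as JSON."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        object_url(bucket, key),
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_store_request("phonebook", "alice smith", {"phone": "555-0100"})
print(request.get_method(), request.full_url)
```

Passing the prepared request to urllib.request.urlopen would actually send it, which only makes sense with a node listening at that address.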

Why Not Use It?

  • Funky Implementation Language - This isn’t really a terrible gripe, and Erlang probably isn’t all that funky. We are not terribly familiar with it, however, and that gave us quite a bit of pause when we first decided to give Riak a go. Our only current regret is that, as we don’t speak Erlang, we’re not really able to do any hacking of our own on Riak.

  • Not A Relational System - As with most NoSQL solutions, one can fake a relational model with Riak (thanks to links, document database capability, and so on), but it is better to write your application with Riak in mind than to attempt to convince Riak to do what your inherently relational application needs.

  • Not A Single Node - There are cases, we’re sure, in which one absolutely does not need a proper distributed system. If you are looking for a single database node to serve all of your needs, you will probably be better off using a different database system.

Next Time

That’s it for our high-level discussion of Riak. Next week we’ll attempt to appease the ACID RDBMS gods with a quick talk about PostgreSQL.
