Databases don't scale. YET!
Posted by Marcos Tapajós over 4 years ago.
Everyone who follows Vinícius and me knows that we have been running some experiments with CouchDB. Some people asked: why CouchDB? The answer is that it is a different kind of database. We sometimes have problems with the traditional ones, and we have never found a definitive solution.
Before working for Improve It, I worked on a Java project with many web servers, application servers, a load balancer, queues and everything else needed to scale an application, but the performance was terrible. The users hated the system, and us! Basically, the problem was that the database was very big and shared with other systems.
Vinícius worked with some clients that had similar problems. They spent a lot of money on DBAs and on hiring companies that specialize in building database clusters, with no success or only partial success.
Experience tells me that every big system has problems scaling its database. Recently, at the Rails Summit, I watched the Phusion guys' talk about how difficult it is to scale databases and how this is the real problem. I agree with everything they said.
I think it is very easy to scale Ruby, Java or any other language by adding more machines, memcached, queues and a load balancer. But that is irrelevant if your database can't handle all the accesses.
Some people say that Ruby is slow, that Rails doesn't scale and that Java is the solution to every problem in the world, but they forget to say how they solve the database problem. I am very curious to know how.
Of course, it is possible to do some things to "solve" these problems. If this were a problem without a solution, systems like Flickr would probably not exist. The question is not whether a solution exists but whether an easy one does. In my opinion, the answer does not lie with traditional databases.
The relational database isn't a bad thing. It was developed before the internet and was meant to solve problems very different from today's, and now it is time to create other solutions. CouchDB is one of these solutions, but it isn't the only one. You can try others, like SimpleDB.
Some people will criticise us and say that they know solutions using Oracle, MSSQL or whatever. I would ask these people to consider whether those solutions are simple.
Because of these bad experiences we hate traditional databases, so we decided to try another paradigm. We decided to study CouchDB by migrating one of our systems to it, and only then to state an opinion. We love trying new solutions and hate talking without knowledge. :-)
After almost a week working hard on the system, we finished the migration and we were very happy with the results! Now I will share some things that I learned this week.
CouchDB was developed using Erlang and now I know that it is more than a distribution! Erlang is a general-purpose concurrent programming language and runtime system created by Ericsson to support distributed, fault-tolerant, soft-real-time, non-stop applications. It sounds good to me! It is everything that I wanted.
Instead of storing data in rows and columns, CouchDB manages a collection of JSON documents. I'm not sure, but I think it now supports XML too. It isn't column-oriented and it has no schema. All documents in CouchDB are versioned.
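To make the idea of schemaless, versioned documents a bit more concrete, here is a minimal in-memory sketch in Python. It is not CouchDB's implementation: the `TinyDocStore` class and the `ConflictError` exception are hypothetical names invented for illustration. The `_id` and `_rev` fields and the "generation-hash" revision format are modelled on how CouchDB labels documents, but the hashing details here are an assumption.

```python
import hashlib
import json


class ConflictError(Exception):
    """Raised when an update does not carry the current revision."""


class TinyDocStore:
    """Hypothetical in-memory store; only imitates CouchDB-style revisions."""

    def __init__(self):
        self._docs = {}

    def put(self, doc):
        doc_id = doc["_id"]
        current = self._docs.get(doc_id)
        # An update must name the revision it was based on, otherwise it is rejected.
        if current is not None and doc.get("_rev") != current["_rev"]:
            raise ConflictError(doc_id)
        generation = 1 if current is None else int(current["_rev"].split("-")[0]) + 1
        body = {k: v for k, v in doc.items() if k != "_rev"}
        # Assumption: a truncated SHA-256 of the body stands in for the revision hash.
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()[:32]
        self._docs[doc_id] = dict(body, _rev=f"{generation}-{digest}")
        return self._docs[doc_id]["_rev"]

    def get(self, doc_id):
        return dict(self._docs[doc_id])


store = TinyDocStore()
# No schema: each document is just a JSON-like dict with whatever fields it needs.
rev1 = store.put({"_id": "post-1", "title": "Databases don't scale"})
doc = store.get("post-1")
doc["tags"] = ["couchdb"]
rev2 = store.put(doc)  # accepted, because doc carries the current _rev
```

Writing again with the old `rev1` would raise `ConflictError`, which is roughly the situation a client has to handle when two writers update the same document.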
CouchDB uses views (stored in design documents), defined with functions that filter and aggregate data and that can be computed in parallel. The database is heavily based on MapReduce principles. The view indexes are updated incrementally.
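The map/reduce idea behind views can be sketched in a few lines of Python. This is only an illustration of the principle: real CouchDB views are usually written in JavaScript and stored in design documents, and the sample documents and the names `map_words_by_author`, `reduce_sum` and `query_view` are made up for this example.

```python
from collections import defaultdict

# Made-up sample documents; note that they do not share a schema.
docs = [
    {"_id": "a", "type": "post", "author": "marcos", "words": 120},
    {"_id": "b", "type": "post", "author": "vinicius", "words": 300},
    {"_id": "c", "type": "post", "author": "marcos", "words": 80},
    {"_id": "d", "type": "comment", "author": "marcos", "text": "nice"},
]


def map_words_by_author(doc):
    """Map step: filter the documents and emit (key, value) pairs."""
    if doc.get("type") == "post":
        yield doc["author"], doc["words"]


def reduce_sum(values):
    """Reduce step: aggregate all values emitted for one key."""
    return sum(values)


def query_view(documents, map_fn, reduce_fn):
    """Run the map step per document, group by key, then reduce each group."""
    groups = defaultdict(list)
    for doc in documents:
        # Each document is mapped independently, which is what allows parallelism.
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(values) for key, values in sorted(groups.items())}


result = query_view(docs, map_words_by_author, reduce_sum)
print(result)
```

Because the map function only ever looks at one document, the work can be split across documents, and only the changed documents need to be mapped again when the index is refreshed.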
CouchDB is accessible via a RESTful JSON API. This is very good because you can put a load balancer in front of many database servers. For me it is great, and it represents part of the solution to scaling databases!
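Since every document is just an HTTP resource at `/{database}/{document id}`, spreading reads over several servers can be as simple as rotating the host name. The sketch below only builds URLs and sends nothing. The host names, the `blog` database and the `document_urls` helper are assumptions for illustration, and a real setup would normally use a proper load balancer instead of client-side rotation.

```python
from itertools import cycle
from urllib.parse import quote

# Hypothetical replicas holding the same database; 5984 is CouchDB's default port.
NODES = [
    "http://couch1.example.com:5984",
    "http://couch2.example.com:5984",
]


def document_urls(nodes, db, doc_ids):
    """Assign each document read to the next node, round-robin."""
    backends = cycle(nodes)
    return [
        f"{next(backends)}/{quote(db, safe='')}/{quote(doc_id, safe='')}"
        for doc_id in doc_ids
    ]


urls = document_urls(NODES, "blog", ["post-1", "post-2", "post-3"])
for url in urls:
    print("GET", url)
```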
The last thing I discovered in CouchDB was how easy replication is. You just need to say what the source and destination URLs are and then wait. It is really very easy and FUN. Vinícius was crazy about it when I decided to try some replications! I stole his notebook and put my notebook in an infinite loop generating very big documents, then started the process many times to analyse the performance and to learn how to work with bi-directional replication.
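As a rough sketch of what "say the source and destination and wait" looks like over HTTP, the Python below builds the requests for CouchDB's `_replicate` endpoint, which takes a JSON body with `source` and `target`. The server addresses and the `blog` database are placeholders, the helper names are made up, and nothing is sent: the code only constructs the requests. Bi-directional replication is shown simply as two one-way replications.

```python
import json
from urllib.request import Request


def replication_request(server, source, target):
    """Build (but do not send) a POST to the server's _replicate endpoint."""
    body = json.dumps({"source": source, "target": target}).encode("utf-8")
    return Request(
        url=f"{server}/_replicate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def bidirectional(server, db_a, db_b):
    """Bi-directional replication as two one-way replications."""
    return [
        replication_request(server, db_a, db_b),
        replication_request(server, db_b, db_a),
    ]


# Placeholder addresses; replace them with real databases before sending anything.
pair = bidirectional(
    "http://localhost:5984",
    "http://localhost:5984/blog",
    "http://other.example.com:5984/blog",
)
for req in pair:
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

To actually start a replication, each request could be passed to `urllib.request.urlopen` against a running server.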
I'm not a specialist in databases. I wrote about my impressions and why I chose this database to study. If you aren't happy with your database, try studying CouchDB!