Blog Improve It

Databases don't scale. YET!

Posted by Marcos Tapajós over 4 years ago.

Everyone who follows Vinícius and me knows that we have been experimenting with CouchDB. Some people have asked: why CouchDB? The answer is that it is a different kind of database. We sometimes run into problems with the traditional ones, and we have never found a definitive solution.

Before joining Improve It, I worked on a Java project that had many web servers, application servers, a load balancer, queues and everything else that was supposed to be necessary to scale an application, yet the performance was terrible. The users hated the system, and us! The root of the problem was that the database was very big and shared with other systems.

Vinícius worked with some clients that had similar problems. They spent a lot of money on DBAs and on hiring companies that specialize in building database clusters, with no success or only partial success.

Experience tells me that every big system has problems scaling its database. Recently, at the Rails Summit, I watched the Phusion guys talk about how difficult it is to scale databases and argue that this is the real problem. I agree with everything they said.

I think it is very easy to scale Ruby, Java or any other language by adding more machines, memcached, queues and a load balancer. But that is irrelevant if your database can't handle all the accesses.

Some people say that Ruby is slow, that Rails doesn't scale and that Java is the solution to every problem in the world, but they forget to say how they solve the database problem. I would be very curious to know how.

Of course it is possible to do some things to "solve" these problems. If this were a problem without a solution, systems like Flickr would probably not exist. The question is not whether a solution exists but whether an easy one does. In my opinion, the answer does not lie with traditional databases.

Relational databases aren't a bad thing. They were developed before the internet and were meant to solve problems very different from the ones we face today, and now it is time to create other solutions. CouchDB is one of these solutions, but it isn't the only one. You can try others, like SimpleDB.

Some people will criticise us and say that they know solutions using Oracle, MSSQL or whatever. I would like to ask these people to consider whether those solutions are simple.

Because of these bad experiences we hate traditional databases, and we decided to try another database paradigm. We decided to study CouchDB by migrating one of our systems to it, and only then to state an opinion. We love trying new solutions and hate talking without knowledge. :-)

After almost a week working hard on the system, we finished the migration and we were very happy with the results! Now I will share some things that I learned this week.

CouchDB is written in Erlang, and now I know that Erlang is more than a distribution! Erlang is a general-purpose concurrent programming language and runtime system created by Ericsson to support distributed, fault-tolerant, soft-real-time, non-stop applications. It sounds good to me! It is everything that I wanted.

Instead of storing data in rows and columns, CouchDB manages a collection of documents: JSON documents. I'm not sure, but I think it supports XML too. It is not column-oriented and it has no schema. All documents in CouchDB are versioned.
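To make the document model a little more concrete, here is a minimal Python sketch of two schema-free documents living side by side. The field names (`type`, `title`, `tags`, `post_id`, `body`) and the ids are invented for illustration; `_id` and `_rev` are the fields CouchDB itself uses for the document id and its current revision. The revision value passed in below is only a placeholder, since the real token is assigned by the server.

```python
import json

# Two documents that could sit in the same database. Nothing forces them to
# share the same fields. All field names here are hypothetical.
post = {
    "_id": "post-1",
    "type": "post",
    "title": "Databases don't scale. YET!",
    "tags": ["couchdb", "scaling"],
}
comment = {
    "_id": "comment-1",
    "type": "comment",
    "post_id": "post-1",
    "body": "Great article!",
}

def to_json(doc):
    """Serialize a document to the JSON text that would be sent to CouchDB."""
    return json.dumps(doc, sort_keys=True)

def with_revision(doc, rev):
    """Return a copy of doc carrying the revision token the server handed back."""
    updated = dict(doc)
    updated["_rev"] = rev
    return updated
```

An update has to carry the current `_rev`, which is how versioning shows up in day-to-day use: `with_revision(post, rev)` stands in for the step of copying the token from the server's last response into the next write.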

CouchDB uses views (stored in design documents), defined with functions that aggregate and filter data and that can be computed in parallel. The database is heavily based on MapReduce principles. The view indexes are kept up to date incrementally.
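As a rough illustration of the idea, the sketch below shows a design document with one view, followed by a pure-Python imitation of what the map and reduce steps compute. The design document id, the view name `posts_by_author` and the `author` field are assumptions made up for this example. The Python functions only mimic the principle on an in-memory list; they are not how CouchDB runs views.

```python
from collections import defaultdict

# Design document as it could be stored in CouchDB. The map and reduce
# bodies are JavaScript source kept as strings; names are hypothetical.
design_doc = {
    "_id": "_design/blog",
    "language": "javascript",
    "views": {
        "posts_by_author": {
            "map": "function(doc) { if (doc.author) { emit(doc.author, 1); } }",
            "reduce": "function(keys, values, rereduce) { return sum(values); }",
        }
    },
}

def map_doc(doc):
    """Python stand-in for the JavaScript map function above."""
    if "author" in doc:
        yield doc["author"], 1

def run_view(docs):
    """Apply the map step to each document, then sum the values per key."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}

docs = [
    {"_id": "a", "author": "marcos"},
    {"_id": "b", "author": "vinicius"},
    {"_id": "c", "author": "marcos"},
    {"_id": "d", "title": "no author field"},
]
counts = run_view(docs)
```

Because the map step looks at one document at a time, it can run over different documents independently, which is what makes the parallel computation possible.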

Instead of SQL, CouchDB uses JavaScript by default, but you can add support for other languages.

CouchDB is accessible via a RESTful JSON API. This is very good because you can put a load balancer in front of many database servers. For me this is great, and it is part of the solution for scaling databases!
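Since everything goes over HTTP, spreading requests across several servers looks like ordinary web load balancing. Here is a small Python sketch under stated assumptions: the host names and the database name `blog` are hypothetical, the round-robin picker is just a toy stand-in for a real load balancer, and the requests are only built, not sent. Port 5984 is CouchDB's default.

```python
import itertools
import json
import urllib.request

# Hypothetical CouchDB nodes sitting behind a load balancer.
NODES = ["http://couch1.example.com:5984", "http://couch2.example.com:5984"]
_round_robin = itertools.cycle(NODES)

def next_node():
    """Pick the next node in round-robin order, as a simple balancer might."""
    return next(_round_robin)

def put_document(base_url, db, doc_id, doc):
    """Build (but do not send) the PUT request that stores a document."""
    return urllib.request.Request(
        f"{base_url}/{db}/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

def get_document(base_url, db, doc_id):
    """Build (but do not send) the GET request that reads a document back."""
    return urllib.request.Request(f"{base_url}/{db}/{doc_id}", method="GET")

write = put_document(next_node(), "blog", "post-1",
                     {"title": "Databases don't scale. YET!"})
read = get_document(next_node(), "blog", "post-1")
# To actually send a request: urllib.request.urlopen(write)
```

Note that the read here lands on a different node than the write, so it only finds the document if the nodes replicate to each other.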

The last thing I discovered in CouchDB was how easy replication is. You just say what the source and target URLs are and then wait. It is really very easy and FUN. Vinícius went crazy when I decided to try some replications! I took his notebook, put my own notebook in an infinite loop generating very big documents, and started the process many times to analyse the performance and to learn how bi-directional replication works.
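For readers who want to see what "say the source and the target" looks like, here is a hedged Python sketch. A replication is requested by POSTing a JSON body with `source` and `target` to the server's `/_replicate` endpoint. The URLs below are hypothetical stand-ins for the two notebooks, and the requests are only built, not sent. Bi-directional replication is modelled as two one-way replications in opposite directions.

```python
import json
import urllib.request

def replicate_request(server_url, source, target):
    """Build (but do not send) a POST to /_replicate on server_url."""
    body = json.dumps({"source": source, "target": target}).encode("utf-8")
    return urllib.request.Request(
        f"{server_url}/_replicate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def bidirectional(server_url, db_a, db_b):
    """Bi-directional replication expressed as two one-way replications."""
    return [
        replicate_request(server_url, db_a, db_b),
        replicate_request(server_url, db_b, db_a),
    ]

# Hypothetical URLs for the two notebooks in the experiment.
MINE = "http://localhost:5984/blog"
HIS = "http://vinicius-notebook.example.com:5984/blog"
requests_to_send = bidirectional("http://localhost:5984", MINE, HIS)
# To actually start one: urllib.request.urlopen(requests_to_send[0])
```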

I'm not a database specialist. I wrote about my impressions and about why I chose this database to study. If you aren't happy with your database, try studying CouchDB!

Comments (11 up to now)

  1. luisbebop said about 9 hours later:

    Excellent article. I'm also running some tests with CouchDB and I really like what I'm seeing, given that the project is only at version 0.9.

    Could you post the performance numbers you got, and describe in more detail how the test was done?

    Best regards, and congratulations!

  2. Leandro said about 10 hours later:

    Thanks for the great info! I'll check this with full pleasure! Seems to solve many things and also change the paradigm!

  3. _Felipe said about 12 hours later:

    It seems that non-relational databases are the hope for simple scalability.

    I think CouchDB will still give people a lot to talk about. The early adopters are already showing up. Too bad I don't have any project with scalability problems to really put it to the test.

  4. Dirceu Jr. said about 13 hours later:

    Right now I'm using CouchDB mostly as a persistent cache and underusing the power of MapReduce. I see a lot of people on Couch's list working at semantics companies... I can't wait to play with counting words and building relationships between texts/phrases/words.

  5. Leandro Silva said about 13 hours later:

  6. Carlos Villela said about 18 hours later:

    @Leandro Silva, I would actually not recommend that version of ActiveCouch on my repository (the one you just linked to).

    I've been using http://github.com/jchris/couchrest and, while not the same API and with a slightly different feel to it, I find it more fun to use.

    @Marcos Tapajós excellent article -- what were the performance and resource usage numbers you encountered?

  7. André Faria said about 23 hours later:

    Good job! I'm going to check it out. Cheers

  8. Marcos Ricardo said 3 days later:

    Hi,

    Great job Tapajós and Vinicius !

    I would like to ask a few more details about the migration process.

    Which system, or what kind of system, did you choose to migrate?

    What are the basic steps?

    Can I start a Rails application from scratch using CouchDB now?

  9. Chris Anderson said about 1 year later:

    Thanks for beating up on Replication! I love hearing stories about people abusing it and it working just fine. Keep em coming!

  10. The DBA said about 1 year later:

    Can CouchDB be used for a data warehouse? Or a BI solution? I don't think so.

  11. DBA said over 3 years later:

    Come on, boys. These crappy databases are just toys for geeks who are not serious players. If an international company asked me what technology to use, I would not hesitate to recommend ANOTHER tool. Real industry projects use supported relational databases like Oracle, SQL Server, etc. Those monsters are the only scalable ones in the world! You just need a good staff to do the right thing. I can scale anything in the world.

    Kind Regards, Kindala Saram Amir Master Senior PhD in Relational Spatial Databases, Enterprise Cluster Architect with SOA, President of the International Consultor Union of SQL Server DBAs and Vice-President of OUIG Oracle Forum Users.