My experience with MongoDB

I’ve just finished switching a project from MongoDB to PostgreSQL, and I’m 100% certain I made the correct decision. Running at basically the same load, PostgreSQL returns queries much faster and uses much less CPU and RAM. Despite its popularity a few years ago, Mongo really strikes me as a bit of a mess.

Some background information to show you what I mean: PostgreSQL is a traditional Relational Database Management System (RDBMS). It stores data in tables separated into ‘rows’, each split into defined ‘columns’. It is built on set theory, is ACID-compliant, and uses SQL as its query language. MongoDB, on the other hand, is a NoSQL document database, which stores collections of entries as extended JSON documents (the extended format is called BSON). One of the chief features of MongoDB is that it’s distributed and uses a concept known as ‘eventual consistency’ to theoretically enable faster write operations on clusters than is achievable with an RDBMS.

MongoDB supposedly has two primary advantages. The first is that since it is a schemaless, NoSQL solution, it makes it much simpler to get a database up and running. You don’t have to spend time upfront designing a schema and you don’t have to correct a schema if it’s broken. You just start inserting new records with the fields you want them to have and have your application handle the variations. The second benefit is that because it is distributed, it should be much easier to scale. It can be installed on hundreds of machines and changes to one will propagate through the system. There need be no single point of failure. While both of these seem like strong advantages, I’m not sure that they pan out in reality.

For starters, while a schemaless solution makes early tinkering more frictionless, sometimes you want the checks and data protection that a schema can provide. Typed columns also allow relational databases to make optimizations that MongoDB can’t. It seems to me that the difference between schema and schemaless databases is almost exactly like the difference between static and dynamic programming languages. On the one hand, you have constraints which help to catch some bugs early and enforce reliability, while at the same time giving the compiler or database more options to optimize speed and memory usage. On the other hand, you have greater flexibility and less bureaucracy. Both have their place, but it’s worth noticing that parts of code that need to be either performant or reliable are often rewritten in a static language, while less critical parts are written in a dynamic language. The thing is, while having no schema is nice when you’re still figuring out your application’s logic, it’s nice to have things more structured by the time you deploy. You might still need to make changes, but since you’ll need to spend time thinking about them anyway, having to make explicit changes to a schema and run migrations on data will no longer be as big a problem.
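To make the ‘checks and data protection’ point concrete, here is a minimal sketch. It is only an illustration under stated assumptions: a plain Python list of dicts stands in for a schemaless collection, and an in-memory SQLite database stands in for PostgreSQL. The `users` table, its columns and the `try_insert` helper are all made up for the example.

```python
import sqlite3

# Schemaless stand-in (assumption: a list of dicts, not a real MongoDB collection).
# Any shape of record is accepted, including obvious mistakes.
collection = []
collection.append({"name": "alice", "age": 30})
collection.append({"name": "bob", "age": "thirty"})  # wrong type, accepted silently
collection.append({"nmae": "carol"})                 # misspelled field, accepted silently

# Schema stand-in (assumption: SQLite in memory instead of PostgreSQL).
# The constraints are declared once, up front.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        name TEXT NOT NULL,
        age  INTEGER NOT NULL CHECK (typeof(age) = 'integer' AND age >= 0)
    )
    """
)


def try_insert(name, age):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO users (name, age) VALUES (?, ?)", (name, age)
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert("alice", 30),        # valid row
    try_insert("bob", "thirty"),    # rejected by the CHECK constraint
    try_insert(None, 25),           # rejected by NOT NULL
]
stored = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(len(collection), results, stored)
```

The list happily keeps all three records, while the table keeps only the valid one. With the schemaless version, every piece of application code that reads the data has to cope with the bad records instead.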

The other issue I had with MongoDB was that, for my purposes, it wasn’t really performant. Mongo is really designed to work at scale. The docs suggest running it on at least three dedicated servers with ample resources each. This was a bit much for me, so I ran it on a single server which it shared with the application. As a result my application was slow and the server crashed periodically. Now you could criticize me for not following the recommended procedure, and you’d be right, but understand that when I switched to PostgreSQL, without increasing the hardware capacity at all, all of my performance and stability problems went away. MongoDB demanded more resources and delivered less performance on essentially the same queries. I could have thrown more hardware at Mongo, but I could also throw the same amount of hardware at Postgres and still end up with a more performant system.

The reason MongoDB’s distributed design exists at all is the notion that scaling out is cheaper in the long run than scaling up, but I don’t think that scaling up is actually that hard at all. If you need more disk space, throwing a SAN onto a server isn’t too much of a problem. 15TB on a SAN is pretty standard, and above that you’re really moving into Big Data territory where specialized tools are necessary anyway. Sharding helps to ‘distribute’ the workload across disk arrays, so you’ll even get part of the performance benefit of using multiple servers. Adding faster network access and more CPU to a system isn’t hard either. High-end servers are designed to make this easy. Besides, unless your system is particularly write-heavy, relational databases can use replication to scale out anyhow. MongoDB’s model really isn’t an advantage unless you are solving a write-heavy, Big Data problem.[1] Until you reach that scale, it’s actually slower than the alternative.

Part of the reason MongoDB seems less performant, I think, goes right back to its schemaless design. Because fields have no types, Mongo is limited in the kinds of indexes it can create on data. There are performance enhancements that can be made to a lookup table with only integers or only strings as keys that can’t be made when you have a mix of the two. Those enhancements can make things much faster and less resource-intensive, which improves the stability of the system overall. So while a JSON (BSON) document store is a neat idea, I think it fails some basic utility tests. Better to use a simple distributed hash with serialized objects.
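A rough sketch of what I mean by the mixed-key problem follows. This is not how MongoDB or PostgreSQL actually build their indexes; it only shows, in plain Python, that a sorted lookup over keys of one type can compare values directly, while a mix of integers and strings needs an extra type tag on every comparison. The `lookup` and `tagged` helpers are hypothetical names invented for the example.

```python
import bisect

# Homogeneous keys: integers can be sorted and binary-searched directly.
int_keys = sorted([42, 7, 19, 3])


def lookup(sorted_keys, key):
    """Binary search in a sorted list of same-typed keys."""
    i = bisect.bisect_left(sorted_keys, key)
    return i < len(sorted_keys) and sorted_keys[i] == key


# Mixed keys: Python refuses to order an int against a str at all.
mixed = [42, "7", 19, "three"]
try:
    sorted(mixed)
    mixed_sortable = True
except TypeError:
    mixed_sortable = False


def tagged(value):
    """Order by a type rank first, then by value (ints before strings)."""
    return (0, value) if isinstance(value, int) else (1, value)


# Ordering mixed keys works only once every key carries its type tag,
# which is extra work and extra space on each comparison.
mixed_sorted = sorted(mixed, key=tagged)

print(lookup(int_keys, 19), lookup(int_keys, 20), mixed_sortable, mixed_sorted)
```

The integer-only list needs nothing beyond the values themselves. The mixed list has to be ordered by type first and value second, which is the kind of overhead a typed column lets a database skip.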

One final criticism that I feel says a lot about MongoDB is that I was surprised at one point to find that it was endianness-dependent. It makes optimizations based on the byte order of the system and cannot be run on systems with the wrong byte order. That means that SPARC and IBM POWER systems cannot run it. This seems like premature optimization to me, trying to make the system faster at the expense of portability.
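For anyone unfamiliar with endianness, here is a small illustration of why byte order matters. It is a generic sketch using Python’s standard `struct` module and says nothing about MongoDB’s actual code: the same 32-bit integer is laid out differently in little-endian and big-endian form, and reading the bytes with the wrong assumption gives a wrong number.

```python
import struct

value = 1

# The same 32-bit signed integer in both byte orders.
little = struct.pack("<i", value)  # least significant byte first
big = struct.pack(">i", value)     # most significant byte first

# Reading little-endian bytes as if they were big-endian gives garbage.
misread = struct.unpack(">i", little)[0]

# Portable code names the byte order explicitly instead of relying on
# whatever the host CPU happens to use.
portable = int.from_bytes(little, byteorder="little", signed=True)

print(little, big, misread, portable)
```

Code that assumes the host’s native byte order skips the explicit conversion step. That is the portability trade-off I am complaining about.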

So, in the end I don’t think I’ll be using MongoDB for production systems in the future. I’ll probably still use it for prototyping though. Sometimes when you’re still designing an application and writing a prototype, you don’t know what the final shape of the data will be. I think in this one case, the flexibility of a schemaless solution will outweigh the advantages of an RDBMS. I’ll still swap out the backend before the application hits production, but until then Mongo is fine.

  1. Or just any very write-heavy problem.

Last update: 20/09/2013
