With open source SQL databases there is no "simply" when it comes to replication...

justinsb · on March 28, 2010

A read-only database isn't a database in my book.

I agree that built-in replication can be difficult to administer even today, but you're being completely revisionist here. Replication wasn't introduced into MySQL until 2000. In 1997, you would by necessity have rolled your own replication system tailored to your needs (much simpler than solving the general-case problem). That's basically what you did anyway, but you solved it in the most trivial way possible: you 'replicated' by doing a complete database dump and re-distributing the entire DB. If you'd had a viable open-source relational database, you could have scaled the reads and gotten more developer productivity by distributing a SQL database (e.g. SQLite) rather than a key-value database (BDB).
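
To make the "distribute a SQL database" idea concrete, here's a minimal sketch using Python's built-in sqlite3 module. The table layout, column names, and function names are all hypothetical (nothing here comes from the system being discussed). It builds a fresh snapshot file and then swaps it into place, so readers see either the old file or the new one, never a half-written one:

```python
import os
import sqlite3
import tempfile


def build_snapshot(rows, final_path):
    """Write rows to a fresh SQLite file, then atomically swap it into place.

    rows: iterable of (asin, posted_at, body) tuples (hypothetical schema).
    """
    directory = os.path.dirname(os.path.abspath(final_path))
    # Build in the same directory so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)
    conn = sqlite3.connect(tmp_path)
    try:
        conn.execute(
            "CREATE TABLE reviews (asin TEXT, posted_at INTEGER, body TEXT)"
        )
        conn.execute(
            "CREATE INDEX reviews_by_asin ON reviews (asin, posted_at DESC)"
        )
        conn.executemany("INSERT INTO reviews VALUES (?, ?, ?)", rows)
        conn.commit()
    finally:
        conn.close()
    # Atomic rename: the snapshot path always points at a complete file.
    os.replace(tmp_path, final_path)


def latest_reviews(path, asin, limit=10):
    """Return (posted_at, body) pairs for one ASIN, newest first."""
    conn = sqlite3.connect(path)
    try:
        cur = conn.execute(
            "SELECT posted_at, body FROM reviews "
            "WHERE asin = ? ORDER BY posted_at DESC LIMIT ?",
            (asin, limit),
        )
        return cur.fetchall()
    finally:
        conn.close()
```

In this sketch the snapshot file would still have to be copied out to each web server, so it shares the "push the whole DB" cost; the gain is only that readers get SQL instead of hand-rolled key lookups.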

I appreciate your standing up and giving a concrete example of NoSQL usage - nobody else has been brave enough to do so. But it seems that the reasons for it were highly specific to the time: there were no viable open-source databases, Amazon was just introducing the idea of customer reviews (i.e. pre Web 2.0) so data was primarily read-only, memory was comparatively expensive and memcached didn't exist, and you had a comparatively small product catalog where complete re-generation was an option. I don't think you can carry forward the optimizations you made in that framework into today's world.

nicpottier · on March 29, 2010

See my reply to the grandparent.

I actually was responsible for that system, and for moving it away from pushing BDBs to servers sometime in '00 or so.

As you said, these weren't really databases by any stretch of the imagination, simply snapshots, built for a very specific type of query (by ASIN, by time, reverse ordered).
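
For anyone wondering what "by ASIN, by time, reverse ordered" can look like in a sorted key-value snapshot, here's a rough illustration. It is not the actual Amazon code, and the key layout and function names are assumptions. The trick is to store an inverted timestamp in the key, so a plain ascending scan returns the newest entries first:

```python
import bisect

# Assumed upper bound on timestamps; used only to invert the sort order.
MAX_TS = 2**63 - 1


def make_key(asin, posted_at):
    """Composite key: ascending key order equals newest-first per ASIN."""
    return (asin, MAX_TS - posted_at)


def build_index(reviews):
    """Build an immutable sorted snapshot from (asin, posted_at, body) tuples."""
    entries = sorted((make_key(asin, ts), body) for asin, ts, body in reviews)
    keys = [key for key, _ in entries]
    bodies = [body for _, body in entries]
    return keys, bodies


def newest_for(index, asin, limit=10):
    """Seek to the first key for this ASIN, then scan forward."""
    keys, bodies = index
    start = bisect.bisect_left(keys, (asin, 0))
    results = []
    for i in range(start, len(keys)):
        if keys[i][0] != asin or len(results) == limit:
            break
        results.append((MAX_TS - keys[i][1], bodies[i]))
    return results
```

A sorted in-memory list stands in for the on-disk B-tree here; the seek-then-scan access pattern is the part that carries over.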

Building the DBs was a pain in the ass, because they were so big that you had to do clean builds (instead of incrementals) fairly often to keep them from wasting space. There was also all sorts of voodoo magic going on to work around various BDB issues.

The system did eventually move to a service architecture (as all of AMZN did), for three main reasons:

1) pushing that much data to more and more servers was getting insane, even on their internal networks

2) we wanted faster turnaround for new reviews

3) rebuilding the BDBs was becoming more and more cumbersome with scale

All that said, the original system did take us pretty darn far, both in scalability of traffic and scalability of data, farther than most websites will ever reach.

Fun times working there, you really get to work on some unique problems.