There’s a rumour that Google will be launching access to their distributed database service, BigTable, making the same type of database hardware and software that keeps Google running available to anyone willing to pay for it.
Amazon led the field in making their computing infrastructure available to third parties back in 2006, with Amazon Web Services (AWS) offerings like the Amazon Simple Storage Service (S3) and the Amazon Elastic Compute Cloud (EC2).
S3 and EC2 let people run their applications on hardware that cost hundreds of millions of dollars, paying only for the computing time, bandwidth and storage they use, without having to lay out for the kit upfront.
Back in December 2007, Amazon built on AWS by putting SimpleDB into beta – offering database services on the same simple pay-for-what-you-need basis.
Google’s BigTable appears to go up against offerings like Amazon’s SimpleDB.
We think that Google has a significant advantage over Amazon. While the amount of data that Amazon oversees is admirable, it’s nothing like Google’s scale. For example, back in 2006 Google had 800 terabytes (800,000 gigabytes) in their crawl database, and it’s reported that at the start of 2008 they were processing over 20 petabytes a day – that’s 20,000 terabytes!
BigTable is built on the Google File System (GFS) and designed to be completely scalable and redundant, enabling constant uptime no matter what hardware might fail.
For a full background, it’s well worth watching the presentation about BigTable made by Jeff Dean of Google back in October 2005, when Google was using it to store close to a Petabyte of data on over 2,000 machines.
At about 53 minutes in, there’s a slide with “BigTable as a service?” at the bottom.
A couple of issues are brought up there – resource fairness, performance isolation and prioritisation across different clients – which we guess must have been sorted out in the 2.5 years since.