At its first re:Invent conference in late November, Amazon announced Redshift, a new managed service for data warehousing. Amazon also offered details and customer examples that made AWS’ steady inroads toward mainstream enterprise application acceptance very visible.
Redshift is made available via MPP nodes of 2TB (XL) or 16TB (8XL), running the Paraccel PADB high-performance columnar, compressed DBMS, scaling to 100 8XL nodes, or 1.6PB of compressed data. XL nodes have 2 virtual cores and 15GB of memory, while 8XL nodes have 16 virtual cores and 120GB of memory and operate on 10 Gigabit Ethernet.
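The capacity arithmetic above is easy to check. Here is a minimal sketch in Python, using only the node sizes quoted in the announcement; the function and dictionary names are my own illustration, not anything Amazon exposes.

```python
# Per-node compressed storage in TB, as quoted in the announcement.
# The names below are illustrative only, not part of any Amazon API.
NODE_CAPACITY_TB = {"XL": 2, "8XL": 16}


def cluster_capacity_tb(node_type: str, node_count: int) -> int:
    """Return the total compressed storage of a cluster, in TB."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    return NODE_CAPACITY_TB[node_type] * node_count


# The stated maximum: 100 nodes of the 8XL type.
max_tb = cluster_capacity_tb("8XL", 100)
print(max_tb, "TB =", max_tb / 1000, "PB")  # 1600 TB = 1.6 PB
```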
Reserved pricing (the more likely scenario, involving a commitment of 1 year or 3 years) is set at “under $1000 per TB per year” for a 3-year commitment, combining upfront and hourly charges. Continuous, automated backup for up to 100% of the provisioned storage is free. Amazon does not charge for data transfer into or out of the data clusters. Network connections, of course, are not free – see Doug Henschen’s InformationWeek story for details.
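To put the quoted figure in perspective, here is a rough sketch in Python of the annual cost ceiling it implies. It assumes, purely for illustration, that the “under $1000 per TB per year” figure applies uniformly to every provisioned TB on a 3-year commitment; the real bill combines upfront and hourly charges and excludes network connections, so treat the result as an upper bound and not a quote. The names are hypothetical.

```python
# Ceiling taken from the quoted "under $1000 per TB per year" (3-year reserved).
# Assumption: the ceiling applies uniformly per provisioned TB.
PRICE_CEILING_USD_PER_TB_YEAR = 1000


def annual_cost_ceiling_usd(provisioned_tb: float) -> float:
    """Upper bound on yearly reserved cost for the given provisioned storage."""
    if provisioned_tb < 0:
        raise ValueError("provisioned storage cannot be negative")
    return provisioned_tb * PRICE_CEILING_USD_PER_TB_YEAR


# One 16TB (8XL) node, and the stated 1.6PB (1600TB) maximum.
print(annual_cost_ceiling_usd(16))    # 16000
print(annual_cost_ceiling_usd(1600))  # 1600000
```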
This is a dramatic thrust in pricing, but it does not come without giving up some things. For example, Amazon has not licensed Paraccel’s high-speed data import utilities; it is far more focused at this point on enabling movement between its own Elastic MapReduce, DynamoDB and S3 storage and Redshift. Thus the early focus, and likely early adoption, is on data that Amazon’s customers already have in the cloud. Movement from existing data warehouses will come later. Today, that would require exporting data into S3 and then moving it into a (designed) Redshift data warehouse using Amazon’s data movement utilities, which were not shown in detail. Design doesn’t disappear, and it’s not free. As my colleague Mark Beyer said in an email discussion:
Data warehouse and analytics expertise is harder to come by than many believe. With Amazon Redshift providing services to initiate and operate the data warehouse in lieu of Paraccel’s management interface and tools, it is left up to the Redshift implementer to “provide the data warehouse chops.” While I’m sure that any good Cloud application jockey knows their stuff, any data warehouse veteran on the planet knows that letting the apps guys write analytics is like asking your doctor to be the striker on your football team (what we call Soccer here). It is entirely likely that an entire cottage industry of “expert implementers for analytics in the Cloud” will appear on the near horizon.
It’s also not clear how much database (not deployment and operating) control will be made available. Paraccel offers plenty of knobs and buttons. Tweaking performance by configuring memory, pinning tables there, looking at how data is packed inside the “slices” – it does not appear any of that will be exposed in the Redshift version. Nor is it obvious yet how to build ongoing updates for a Redshift data warehouse.
Another missing “feature” is the support model one gets from a software firm like Paraccel – the level and nature of support in an Amazon environment today is quite different. Still, this is a work in progress. It was evident at re:Invent that Amazon is building up and enhancing its enterprise-facing team, and I had an interesting conversation with them about how the engagement model changes when an enterprise that has had several individuals “unofficially” contracting for projects on their own transitions to a corporate model. They have seen this play out a number of times now, and it’s becoming a better understood play for them.
One final comment about the vendors’ relationship: it is not as close as I suspect Paraccel would have hoped. After a multi-million dollar investment and over a year of joint work, it was surprising not to see Paraccel’s CEO on stage for the announcement, or even a synchronized press release. This reflects the relatively arm’s-length nature of the arrangement. In my subsequent conversations with them, it became clear that Amazon expects its offering to diverge from Paraccel’s over time as it adds its own pieces around the part it has licensed for use in Redshift. And there was no publicized joint marketing or sales initiative.
It remains to be seen whether the whole elasticity value proposition (scale up, scale down) proves as relevant to data marts and data warehouses as it does to the apps that Amazon is more accustomed to hosting, or how quickly enterprises will move their data to a public cloud. Warehouses don’t scale down. But analytic platforms used for experimenting will, and this may create a great opportunity for Amazon. Gartner clients can see our position on other dimensions of this announcement in a First Take Mark Beyer and I just published.