by Merv Adrian | December 18, 2014 | 3 Comments
Donald Feinberg (@Brazingo) & Merv Adrian (@merv)
Every so often, there’s a wave of interest in the “imminent retirement” of one or more legacy database management systems (DBMS). Usually, it’s because someone with very little knowledge of the actual use and distribution of the products becomes enthusiastic about someone’s sales pitch, or an anecdote or two. Sometimes it’s the result of a “replacement” marketing campaign by a competitor. It takes longer than 40 years for DBMS technology to die, and for a (competing) marketer, it’s like the villain in a horror story who just keeps coming back. And so far, it’s usually as illusive – and as far off – as the “death of the mainframe”.
Recently, a financial analyst report stated that in 2015, the industry would begin retiring Sybase products (owned now by SAP) and Informix (owned now by IBM). We and our colleagues have since had several inquiries about this and our response is simple: poppycock. DBMS market data, and our thousands of interactions with customers, do not support any such assertions.
Let’s start with Sybase, or specifically, SAP ASE and SAP IQ, acquired by SAP from Sybase in 2010. (Full disclosure: Merv worked at Sybase in the 1990s.)
Since its acquisition of Sybase, SAP has released several enhanced versions of both SAP ASE and SAP IQ (including recently in 2014), and there’s no reason to question its intent to continue development and support of both.
Generally, the customers using these products are happy, and are not looking to replace them. We receive a steady stream of inquiries from Gartner clients asking about them, and these have not changed in character or in volume. Customers do ask about the products’ future, but they do not question the vendor’s intent. The inquiries are not typically or disproportionately about removing these products, though we regularly get inquiries about replacing all the “legacy” RDBMS offerings with new products.
SAP IQ is the oldest and most widely installed column-store DBMS on the market. It is used for both analytics and as a general purpose data warehouse; it’s also part of the SAP HANA infrastructure, used as a near-line storage engine for cooler data not required in-memory in SAP HANA.
SAP ASE has retained a sizable loyal customer base on Wall Street, where it is part of the infrastructure used for trading systems, and elsewhere. It’s been certified as a DBMS platform for SAP Applications for about two years, and its use there is growing: Gartner estimates over 6000 instances of SAP Applications using SAP ASE as a platform at the beginning of 2014. [Edited Dec 19 to change number to 6000 - see below for comment from SAP]. That rate of growth for SAP ASE is actually faster than it had been in the 10 years before SAP acquired it – most likely because now SAP ASE is an alternative to Oracle, as a platform for SAP Applications.
Given the SAP sales force’s focus on SAP HANA, and the minimal marketing of SAP ASE and SAP IQ, we do understand how a misconception around the future of these products could happen. But it is just that – a misconception.
What about Informix, acquired by IBM in 2001? Over a decade later, it remains an integral part of an IBM information management portfolio that includes three primary DBMSs – DB2, IMS and Informix – and newer entrants such as Cloudant. IBM has continued to release new enhanced versions of Informix since the acquisition; for example, it has recently added JSON support with MongoDB JSON Drivers. Due to the implementation of embedded indexes, Informix is a good choice for audio and video indexing. Finally, the number of IBM Informix customers has continued to increase and its user base is very loyal, with one of the largest and most active User Groups.
IBM positions Informix for three primary use cases:
- High-speed processing in verticals like retail (point of sale systems) and manufacturing
- Time-series DBMS – one of its primary features, and a “timely” one
- The Internet of Things, where its high-speed ingest capabilities and small footprint are well-suited
So, it’s our opinion that the report referenced above is erroneous, and is not based in fact. At the end of the day, one of the most powerful forces in DBMS is inertia. Just ask Oracle, whose 2Q15 financial results press release on 17-Dec-2014 noted that “software updates and product support revenues drove nearly half of total company revenue.” Legacies are sticky – if it works, people don’t change it lightly. In all these cases, legacy products are not only holding their own, but finding new markets in the hands of large companies with loyal customer bases.
Don’t believe everything you read (unless, of course, we wrote it.)
Category: DBMS Gartner IBM Informix SAP Tags: Gartner, IBM, SAP, Sybase
by Merv Adrian | December 5, 2014 | 3 Comments
How have Hadoop deployments grown this year? Slowly.
Here’s a little anecdata for you:
During 2014, my colleague Nick Heudecker and I conducted quarterly webinars on the State of Hadoop, and in the Q2, Q3 and Q4 sessions we asked our (steadily growing) audience about their deployments via online polls. These results should not be considered definitive (they’re unqualified – though attendees do have to jump through a hoop or two to attend, we don’t keep extensive firmographics, titles, etc.)
What we saw was a decrease in the percentage of respondents who said they had not deployed at all by yearend. As for the other categories, which asked how many nodes were deployed, only one showed much change – the “fewer than 10” group grew to 27% at yearend – a 50% growth over the Q2 result of 18%. This suggests a growing number of pilots – which accords with our expectations. The other groups were essentially flat, suggesting no dramatic growth in substantial projects undertaken so far, or substantial additional projects being added to the same cluster and driving growth.
Percentage of webinar respondents reporting cluster sizes, 2014
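For anyone checking the arithmetic on the “fewer than 10” group: the 50% figure is relative growth in the share of respondents, not a change of 50 percentage points. A minimal sketch, using only the two shares quoted above (the function names are ours, purely for illustration):

```python
def relative_growth(old_share: float, new_share: float) -> float:
    """Growth of new_share over old_share, as a percentage of old_share."""
    return (new_share - old_share) / old_share * 100


def point_change(old_share: float, new_share: float) -> float:
    """Difference between the two shares, in percentage points."""
    return new_share - old_share


# Shares of respondents reporting "fewer than 10" nodes, from the polls above.
q2_share = 18.0
q4_share = 27.0

growth = relative_growth(q2_share, q4_share)   # relative growth, in percent
points = point_change(q2_share, q4_share)      # change in percentage points

print(f"Relative growth: {growth:.0f}%")
print(f"Change: {points:.0f} percentage points")
```

The two numbers answer different questions, and with unqualified poll data neither should be read as more than an indication.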
Nick and I expect to continue these webinars. See you next year, and we’ll see how things have progressed.
Category: Apache Gartner Hadoop Tags: Apache, Gartner, Hadoop
by Merv Adrian | November 17, 2014 | 9 Comments
Last week, many observers were surprised when Hortonworks’ S-1 for an initial public offering (IPO) was filed. And there are good reasons to be surprised. Why now? CEO Rob Bearden told VentureWire not long ago that he expected to exit 2014 “at a strong $100 million run rate” in preparation for a 2015 IPO. What changed? Perhaps that question is best answered by asking another: for whom?
Is the filing for Hortonworks, to help with cash? That is not obvious. The filing is listed as being for an offering of $100M. In the context of other fundraising activities by Hadoop vendors – with Cloudera’s $900M or so of a few months ago at the top of the list – it will hardly create a war chest suitable for out-expanding its competitors aggressively. And there are cheaper and easier ways to raise $100M in Silicon Valley than an IPO.
In fact, a look at the numbers – now public for the first time because of the filing – makes it all the more puzzling. Hortonworks’ $33.4 million in revenue for the nine months ending Sept. 30 was up sharply from last year, only its second full year since HDP went GA in June 2012. Revenue for the last quarter was $12M. It was barely up over the prior quarter (also $12M), so things are actually a bit flat. But expenses are several times that – $29M and $41M respectively, so the gap is widening. Put another way, losses are growing faster than revenues are, at an accelerating rate. That $100M, plus the $111M the company has in cash now, gives it a year or two’s worth of runway to improve matters. Presumably, that’s the bet. But why only $100M if it seems possible that more could be available?
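The “year or two” of runway can be sanity-checked with back-of-envelope arithmetic. The sketch below uses only the rounded figures quoted above and makes two loud assumptions that are ours, not the filing’s: that the quarterly loss is simply expenses minus revenue, and that the most recent quarter’s loss stays constant. Neither is likely to hold exactly.

```python
def quarterly_loss(revenue_m: float, expenses_m: float) -> float:
    """Loss for one quarter in $M, assumed here to be expenses minus revenue."""
    return expenses_m - revenue_m


def runway_quarters(cash_m: float, loss_per_quarter_m: float) -> float:
    """Quarters of runway if the loss per quarter stays constant (an assumption)."""
    return cash_m / loss_per_quarter_m


# Rounded figures from the post, in $M.
prior_quarter_loss = quarterly_loss(revenue_m=12, expenses_m=29)
last_quarter_loss = quarterly_loss(revenue_m=12, expenses_m=41)

# Cash on hand plus the proposed offering.
cash_after_ipo = 111 + 100

quarters = runway_quarters(cash_after_ipo, last_quarter_loss)
years = quarters / 4

print(f"Loss grew from ${prior_quarter_loss}M to ${last_quarter_loss}M per quarter")
print(f"Runway at the latest loss rate: {quarters:.1f} quarters (~{years:.1f} years)")
```

If losses keep widening, the runway shortens; if spending levels off, it lengthens, which is presumably the bet.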
Is it for Hortonworks’ investors? Let’s see who they are. Here’s a table of their stakes (a table stakes stakes table, if you will):
Benchmark and Index are successful funds, and it’s unlikely they are in a hurry to cash in on their investment. Yahoo might care, but urgency seems unlikely, particularly after the Alibaba windfall. There is little reason to think HP is driving this. Teradata? OK, if they were betting on Hortonworks as the key element in their big data strategy, maybe they have decided to hedge, but it’s hard to imagine they really feel the need to worry about this – and they’ve already hedged by announcing a new partnership with Cloudera. They have a fair number of joint customers with MapR as well, so one can’t rule them out as a future partner too. Teradata’s role here is not likely to be the motivating factor.
Personal gain? Doubtful. The stakes owned by Hortonworks’ CEO and President are nice and will certainly help them – but there is no obvious reason for them to have accelerated this for their own gain, even if they could do so.
Is it to help build the market for Hadoop? This seems to have been the party line on general motivation till now. But they are one vendor among several, some truly megavendors and some similar in size – and evidently in prospects – for the near term. They are major contributors to the open source code in the Apache stack and driving substantial innovation. Being able to keep paying engineers (R&D is 28% of expenses, and has doubled over the past year) is a good use of funds – and $100M will fund a couple of years at the current run rate, which one might expect to level off a bit. But it won’t be the only use of funds: sales and marketing is 48%, and more is better. Still, let’s face it, because of Hortonworks’ business model, everything they build is Apache open source code. Their R&D spend enables their competitors too. It won’t separate them quickly and dramatically from the pack any better to have much more spending on either or both.
It’s been 10 years since the first Google paper on MapReduce. Hortonworks will be the first new public company descending from that, and they want HDP as their symbol. They were formed 3 years after Cloudera, so they can at least grab the Hadoop label for themselves. But with an open source stack, value is likely to be determined by how well the company is seen to run, how many customers it has, how likely the revenue of the company is to track growth in Hadoop usage at those and at new customer sites, etc. Hortonworks’ business is services and support. Neither is particularly high margin. Nor is it clear how customer spending on either or both will scale with their Hadoop usage.
Hortonworks’ 3 largest customers (Yahoo, Teradata, and Microsoft) account for 37.4% of its revenue – and two are investors. The biggest is Microsoft, at 22.4% now – it was 55.3% for the year ended April 30, 2013. That sort of concentration never makes investors too happy, and though it is declining it’s still sizable. The Microsoft deal, like all others, is renewable – it expires in July 2015. And like Teradata, Microsoft has added other partnerships to what was an exclusive with Hortonworks till recently. Is the possible “window” closing a reason to accelerate the IPO? According to Fortune magazine, to actually list in the 2014 calendar year, this was basically the last week for Hortonworks to make the S-1 public (due to a combination of holidays and regulatory waiting periods).
Ultimately, it’s unlikely that Hortonworks will be alone as a public company for long. MapR told the Wall Street Journal they want to IPO next year, and they claim to have more customers, high margins and “efficient cash management.” Cloudera says they “are not ready yet” though they have a lower rate of losses, and also claim more customers. At the end of the day, the answer may be rather simple. And again, answering a question with a question: if not now, when? There may not be a better time.
Category: Apache Big Data Cloudera Gartner Hadoop Hortonworks HP Industry trends IPO MapR Microsoft Teradata Yahoo! Tags: Apache, big data, Cloudera, Gartner, Hadoop, Hortonworks, initial public offering, IPO, MapR, Microsoft, Teradata, Yahoo!
by Merv Adrian | October 31, 2014 | 10 Comments
New York’s Javits Center is a cavernous triumph of form over function. Giant empty spaces were everywhere at this year’s empty-though-sold-out Strata/Hadoop World, but the strangely-numbered, hard to find, typically inadequately-sized rooms were packed. Some redesign will be needed next year, because the event was huge in impact and demand will only grow. A few of those big tent pavilions you see at Oracle Open World or Dreamforce would drop into the giant halls without a trace – I’d expect to see some next year to make some usable space available.
So much happened, I’ll post a couple of pieces here. Last year’s news was all about promises: Hadoop 2.0 brought the promise of YARN enabling new kinds of processing, and there was promise in the multiple emerging SQL-on-HDFS plays. The Hadoop community was clearly ready to crown a new hype king for 2014.
This year, all that noise had jumped the Spark.
If you have not kept up, Apache Spark bids to replace, or rather supplement, MapReduce with a more general purpose engine, combining interactive processing and streaming along with MapReduce-like batch capabilities, leveraging YARN to enable a new, much broader set of use cases. (See Nick Heudecker’s blog for a recent assessment.) It has a commercializer in Databricks, which has shown great skill in assembling an ecosystem of support from a set of partners who are enabling it to work with multiple key Hadoop stack projects at an accelerating pace. That momentum was reflected in the rash of announcements at Hadoop World, across categories from Analytics to Wrangling (couldn’t come up with a Z.) There were more than I’ll list here – their vendors are welcome to add themselves via comments, and I’ll curate this post for a while to put them in.
Hadoop analytics pioneer Platfora announced its version 4.0 with enhanced visualizations, geo-analytics capabilities and collaboration features, and revealed it has “plans for integration” with Spark.
Tableau was a little more ready, delivering a beta version of its Spark Connector, claiming its in-memory offering delivered up to 100x the performance of Hadoop MapReduce. Tableau is also broadening its ecosystem reach, adding a beta version of its connector for Amazon EMR, and support for IBM BigSQL and MarkLogic.
Tresata extended the analytics wave to analytic applications, enhancing its customer intelligence management software for financial data by adding real-time execution of analytical processes using Spark. Tresata is an early mover, and believes one of its core advantages derives from having been architected to run entirely in Hadoop early on. It supports its own data wrangling with Automated Data Ontology Discovery and entity resolution – cleaning, de-duping, and parsing data.
(For developers, Tresata is also open sourcing Scalding-on-Spark – a library that adds support for Cascading Taps, Scalding Sources and the Scalding Fields API in Spark.)
Appliances were represented by Dell, who introduced a new In-memory box (one of many Hadoop appliances that represented another 2014 trend) that integrates Spark with Cloudera Enterprise. (Dell is all in on the new datastores – they have built architectures with Datastax for Cassandra, and with MongoDB, as well.) And Cray, having completed its spinback of Yarc, unveiled its Urika-XA platform with Hadoop and Spark pre-installed, and leveraging its HPC expertise to exploit SSDs, parallel file systems, and high-speed interconnects for a test run to see if there is a high-end performance market yet.
Cloud was brought to the party by BlueData, packaging Spark with its EPIC™ private-cloud deployment platform. Standalone Spark clusters can run Spark-Scala, MLLib or SparkSQL jobs against data stored in HDFS, NFS and other storage. Note “standalone” – Spark can, and will, be used by shops that are not running Hadoop. Once it is actually running production jobs, that is.
Rackspace is in both games with its OnMetal – an appliance-based cloud you don’t have to own, with a high-performance design using 3.2 TB per data node. They provision the other services. Rackspace is partnering with Hortonworks to deliver HDP 2.1 or – you guessed it – Spark. This is all built on a thin virtualization layer on another emerging hot platform: Openstack.
The distributions were represented of course: Cloudera jumped in back in February accompanied by strong statements from Mike Olson that helped put it on the map. Hortonworks followed in May with a tech preview. It still is in preview – Hortonworks, for good reasons, is not quite prepared to call it production-ready yet. Pivotal support was announced in May – oddly, in the Databricks blog, reflecting its on-again, off-again marketing motions. In New York, MapR, on the bandwagon since April as well, announced that Drill – itself barely out of the gate – will also run on Spark.
It was intriguing to note that many of the emerging data wrangling/munging/harmonizing/preparing/curating players started early. ClearStory CEO Sharmila Mulligan was quick to note during her keynote appearance that her offering has been built on Spark from the outset. Paxata, another of the new players with a couple of dozen licensed customers already, has also built its in-memory, columnar, parallel enterprise platform on top of Apache Spark. It connects directly to HDFS, RDBMS, and web services like SalesForce.com and publishes to Apache Hive or Cloudera Impala. Trifacta, already onto its v2, has now officially named its language Wrangle, added native support for more complex data formats, including JSON, Avro, ORC and Parquet, and yes, is focusing on delivering scale for its data transformation through native use of both Spark and MapReduce.
Even the conference organizers got into the act. O’Reilly has made a big investment with Cloudera to make Strata a leading conference. It’s added a European conference, making Doug Cutting the new conference Chair. In New York, O’Reilly announced a partnership with Databricks for Spark developer certification, expanding the franchise before someone else jumps in.
There is far more to come from Spark – a memory-centric file system called Tachyon that will add new capabilities above today’s disk-oriented ones; the MLlib machine learning library that will leverage Spark’s superior iterative performance, GraphX for the long awaited graph performance that today is best served by commercial vendors like Teradata Aster, and of course, Spark Streaming. But much of that is simply not demonstrably production-ready just yet – much is still in beta. Or even alpha. We’ll be watching. For now, it’s the new hype king.
Category: Accumulo Amazon Apache Apache Yarn Aster Avro Big Data BigInsights Cascading Cassandra Cloudera Cray Elastic MapReduce Gartner Hadoop HDFS Hive Hortonworks IBM MapR MapReduce Microsoft Spark Uncategorized YARN Tags: Apache, Aster, Avro, big data, BigInsights, BigSQL, BlueData, Cassandra, CDH, Cloudera, Databricks, Datastax, EMR, Gartner, Hadoop, Hbase, HDFS, Hive, Hortonworks, IBM, Impala, JSON, MapR, MapReduce, MarkLogic, Microsoft, MLlib, MongoDB, Openstack, ORC, Parquet, Paxata, Platfora, Rackspace, Scalding, Spark, SQL, Tableau, Tachyon, Tresata, Trifacta, Yarn
by Merv Adrian | October 13, 2014 | 3 Comments
Hopefully, that title got your attention. A recursive acronym – the term first appeared in the book Gödel, Escher, Bach: An Eternal Golden Braid and is likely more familiar to tech folks who know Gnu – is self-referential (as in “Gnu’s not Unix.”) So how did I conclude Hadoop, whose name origin we know, fits the definition? Easy – like everyone else, I’m redefining Hadoop to suit my own purposes.
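For readers who have not met one, a recursive acronym is simply an expansion that contains the acronym itself, so expanding it never terminates. A small sketch using the GNU example mentioned above (the `expand` helper is ours, and the capitalization “GNU's Not Unix” is the conventional spelling rather than the one used in this post):

```python
def expand(acronym: str, expansion: str, depth: int) -> str:
    """Expand a recursive acronym `depth` times.

    Each pass replaces every occurrence of the acronym with its expansion,
    which itself contains the acronym again.
    """
    text = acronym
    for _ in range(depth):
        text = text.replace(acronym, expansion)
    return text


for depth in range(4):
    print(depth, expand("GNU", "GNU's Not Unix", depth))
```

Every pass leaves exactly one unexpanded “GNU” at the front, which is the self-reference the term describes.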
Let’s start with the obvious one. Of course, Doug Cutting named Hadoop after his child’s toy elephant, seen here.
Photo: Merv Adrian
And in its early days, as I discussed in my post about the changing composition of distributions a few months back, the story was simpler. Hadoop was HDFS, MapReduce and some utilities. As those utilities got formalized and became projects themselves and were supported by commercial distributors, the list grew: Pig, Hive, HBase, and Zookeeper were Hadoop too. And a few months ago, as I noticed, Accumulo, Avro, Cascading, Flume, Mahout, Oozie, Spark, Sqoop, and YARN had joined the list.
YARN is the one that really matters here, not just because it means the list of components will change, but because in its wake the list of components will change Hadoop’s meaning. YARN enables Hadoop to be more than a brute force, batch blunt instrument for analytics and ETL jobs. It can be an interactive analytic tool, an event processor, a transactional system, a governed, secure system for complex, mixed workloads. At Strata this week, we’ll talk about its integration with Red Hat’s middleware, its cautious alliance with Spark for MapReduce replacement, its alliance with data wrangling tools from startups and Teradata, its connection, via Sentry, to security stacks… and more.
So yes, many of us are redefining Hadoop as we add new pieces – new use cases, new projects that change its very nature. My answer to “What is Hadoop”?
OK – it’s a bit cute. But hopefully, it got your attention. Hadoop’s journey is just beginning, and there is much more change ahead.
Category: Accumulo Apache Apache Yarn Big Data Cascading Flume Gartner Hadoop Hbase HDFS Hive Mahout MapReduce Oozie Pig Spark Sqoop Teradata YARN Zookeeper Tags: Apache, Flume, Gartner, Hadoop, Hbase, HDFS, Hive, MapReduce, Oozie, Pig, Sqoop, Teradata, zookeeper
by Merv Adrian | October 10, 2014 | Comments Off
From my esteemed colleague Mark Beyer
“Unstructured data” is a misnomer; everyone finally agrees on that much, at least. It is a term that is often applied to information assets that are not relational. Sometimes it is applied to machine data generated by operational technologies. Sometimes it is applied to content, like documents, or even less specific text such as email or Twitter feeds. The term unstructured creates fear, loathing and desire across the IT landscape. If it generates meaningful discussions about innovative processing or challenges the re-use of existing infrastructure without simply defaulting to a mindset of replacing components, then that is a good discussion. But that is about all the term is good for.
But I’ve been growing ever more fond of “semi-structured”, though not for the reason that many who are enamored of the term might assume. Semi actually means half, but that doesn’t mean it is halfway between structured and unstructured, because that would be halfway between myth and reality, which is a “place” that doesn’t actually exist. Consider the vernacular American use of the term “semi-truck”, which would seem to mean “half truck”. But, of course, that isn’t what semi-truck means. The truck vehicle, referred to properly as the “tractor”, is “half” of the truck and the trailer is the other half. Separately, neither can move cargo. When you put them together, you have two halves that make a whole and create a complete, useful delivery vehicle. So, a semi-truck is actually made up of a semi-tractor and a semi-trailer which make, simply, a whole truck.
Now consider semi-structured data. What it actually means is that “half” of the governance and schema instructions (defined as physical and logical definition plus applied governance) are in the data and the other half are in the using application. Semi-structured data means that you MUST write an application to complete the schema. That also means that with each different application, you can impose different governance and schema instructions (also the processing rules), theoretically making the data more flexible.
However, with the introduction of this capability comes the danger of having disparate governance rules regarding the same data that may be so significant, it becomes different data. As data moves further away from containing its own schema, then flexibility increases and more and more of the entire schema must be imposed by the using application—and ever greater diversity is introduced. Of course application developers think this is a grand idea, but users of that data in a second environment are not equally fond of it.
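To make the “half in the data, half in the application” point concrete, here is a minimal sketch. The record, its field names and the two “applications” are hypothetical, invented for illustration only. The JSON carries the field names and nesting; each application supplies the types and rules that complete the schema, and the two end up with what is arguably different data.

```python
import json
from decimal import Decimal

# The stored asset: field names are in the data, but every value is just text.
raw = '{"id": "42", "amount": "19.90", "ts": "2014-10-10"}'
record = json.loads(raw)


def billing_view(rec: dict) -> dict:
    """Hypothetical billing application: id is an integer key, amount is money in cents."""
    return {
        "id": int(rec["id"]),
        "amount_cents": int(Decimal(rec["amount"]) * 100),
    }


def marketing_view(rec: dict) -> dict:
    """Hypothetical marketing application: id is an opaque label, amount is only a bucket."""
    return {
        "id": rec["id"],
        "big_spender": Decimal(rec["amount"]) >= Decimal("50"),
    }


print(billing_view(record))
print(marketing_view(record))
```

Same record, two schemas: one application treats `id` as a number and keeps the exact amount, the other keeps `id` as text and discards the amount in favor of a flag. A second user who inherits only one of these views has no way to know what the other one assumed.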
In a data lake, we drop all of these various degrees of structure into a common pool or body. Each asset is deposited with a different level of dilution regarding the schema instructions, thus requiring an oceanographer’s level of expertise to determine when the data in a data lake has “high alkalinity” or “high acidity” or too much “copper” and so on. A data scientist who understands the “alchemy” that exists in the fetid waters of a polluted lake will have no trouble cleaning the data up and discerning pollutants from di-hydrogen oxide (that’s water). But a novice or dilettante may find that they are drinking polluted data water from the lake long after they are infected with the equivalent of data E.coli and are making continuous visits to the data latrine.
To be completely fair, the analogy of the waters of a lake when compared to the Data Lake, needs some departure from the metaphor, but there are also further extensions of the metaphor that also work (other than those obnoxious points I raise above). For example, a lake that is crystal clear somewhere in the mountains of upstate New York would be a dead lake. There would be no fish, no microscopic organisms, no snakes, no mosquitos flying over its surface. It would be crystal clear and completely dead. So, if you desire a completely clean lake, you will find a sanitized data environment that defeats the entire purpose of storing information in something near its native form. You want a vibrant, living data lake. But you must keep the trucking metaphor in mind and remember that unstructured data does not exist. Instead, semi-structured data which intentionally only has SOME of the schema embedded in the information asset, requires a knowledge of how to swim. And boats (tools for exploring the data lake) don’t really work either—anyone who cannot swim must be very careful to NOT fall out of the boat. But anyone who can swim, can safely use a boat and be confident that if they fall over the side they can make it back to shore.
Data lakes are for data scientists to conduct science; they are not for casual analytics users or even advanced business analysts, who generally scuff their code-writing and script-writing toes on ignorance. But those data miners who understand business process and systems analysis, or those data scientists who understand model theory and statistical primitives? Well, they can swim all day. So, buy a boat (a tool), explore away, and be comfortable in the knowledge that some of the structure is waiting for you in the lake, but not all of it. Your job is to figure out the rest, including sometimes reaching over the side of the boat and getting a little wet, because you never know when you will fall in.
Merv’s comment: this has been a topic of some discussion on our team. My own view is very close to Mark’s and is expressed in my closing slide from the Hadoop Summit
Category: Big Data data lake data warehouse Gartner Hadoop metadata Tags: data lake, data warehouse, Hadoop, metadata
by Merv Adrian | October 9, 2014 | 1 Comment
At Gartner Symposium, Drue Reeves and I had the opportunity to interview Microsoft CEO Satya Nadella. Here’s a brief clip from the closing. I’m summarizing and Satya, passionate as he was throughout the conversation, lays out his vision about mobility that crosses the personal and professional: mobility of the individual and the app experiences. “Have my work and life wherever – that’s the true form of mobility.”
Category: Uncategorized Tags:
by Merv Adrian | October 2, 2014 | Comments Off
It’s rare that one gets the chance to talk to a new megavendor CEO in his first year on the job – especially in front of 10,000 senior IT professionals. But that is the opportunity Drue Reeves and I have on Tuesday, October 7 in Gartner’s Mastermind interview.
What have we got in mind? Enterprise IT questions. We won’t talk much about Xbox or Bing. But we do plan to ask Satya:
- How he is driving a culture of innovation, and in what direction
- What Windows 10 means to IT, and what they should do about it – and when
- How mobility is changing end user experiences – and what Microsoft is doing to get us there
- How the cloud impacts usage, data centers, and architects’ budgets
- What impact will the Internet of Things have on Microsoft – and us?
We’ll have a lightning round featuring questions from Twitter to liven things up – though I expect we won’t struggle with liveliness. If you want to suggest questions, send them to us with the Twitter hashtag #GartnerSymAskSatya, email us at Gartner, or comment here. We’ll do our best to keep up with them all….
Hope to see you there.
Category: Gartner Microsoft Tags: Gartner, Microsoft
by Merv Adrian | September 20, 2014 | 2 Comments
For the past few months, I’ve been Gartner’s Vendor Lead for Microsoft. For some 30 vendors, we assign a single analyst to act as a focal point for coordinating across the 1000 analysts we have when research covers that vendor.
In Microsoft’s case, that has proven to be fascinating – we have some 3 dozen Magic Quadrants alone that have been published about their offerings in the last 15 months or so. As Vendor Lead, I’m a mandatory peer reviewer for those and other documents. For my own edification, I decided to map the Magic Quadrants that feature Microsoft onto a quadrant that shows where Microsoft appears in that piece of research. The results are intriguing.
Microsoft has a sizable number of Leader offerings, but many in the Challenger quadrant, and a few that appear in the Niche Player quadrant as well. It’s a bit rarer for them to appear as Visionaries – if they have it figured out, their ability to execute tends to drive them up into the Leader space fairly quickly.
The chart shows places where Microsoft clearly needs to focus, and makes it clear that they play in numerous markets of interest to Gartner’s enterprise IT-focused audience. Many categories do not appear, and a half dozen MQs are currently in process. I’ll keep this up to date for myself, and occasionally will share it here.
Category: Gartner Microsoft Tags: Gartner, Microsoft
by Merv Adrian | July 24, 2014 | 5 Comments
Interest from the leading players continues to drive investment in the Hadoop marketplace. This week Teradata made two acquisitions – Revelytix and Hadapt – that enrich its already sophisticated big data portfolio, while HP made a $50M investment in, and joined the board of, Hortonworks. These moves continue the ongoing effort by leading players. 4 of the top 5 DBMS players (Oracle, Microsoft, IBM, SAP and Teradata) and 3 of the top 7 IT companies (Samsung, Apple, Foxconn, HP, IBM, Hitachi, Microsoft) have now made direct moves into the Hadoop space. Oracle’s recent Big Data Appliance and Big Data SQL, and Microsoft’s HDInsight represent substantial moves to target Hadoop opportunities, and these Teradata and HP moves mean they don’t want to be left behind.
Teradata begins its moves with Revelytix. Andrew White noted in Gartner’s 2013 Cool Vendors in Information Infrastructure and Big Data that Revelytix’ “Loom, which runs in Hadoop, classifies objects in the Hadoop Distributed File System and applies a predefined transformation so that objects become structured and more usable for data scientists.” In our discussions of the Logical Data Warehouse, Gartner has targeted the capabilities Revelytix was designed to provide as being on the critical path to creating a coherent, optimized metadata architecture that will incorporate both traditional Enterprise DWs and Hadoop – a direction our research shows the advanced users are heading in.
In the 2012 edition of Cool Vendors, I described Teradata’s other acquisition, Hadapt, defining its vision as a Postgres-based “RDBMS instance on every node in the cluster in order to improve performance of queries over the structured part of the data, and … data partitioning techniques to eliminate unnecessary data movement.” Admirable as it was, this vision had not generated much business, and the window for additional SQL-on-Hadoop offerings may be closing – but Teradata has acquired technology and engineering talent that it will put to use supplementing its continuing optimization of Teradata SQL and SQL-H across complex logical data fabrics. The Hadapt team joins Teradata, though the brand will disappear.
HP chose to make a direct investment in Hortonworks, which extended its last funding round, closed months ago, to accept an additional $50M. The oddity of these mechanics aside, HP gets significant impact for its money: Martin Fink, its CTO, joins the board. HP will integrate the Hortonworks Data Platform (HDP) into its HAVEn offering, invest resources to certify its Vertica column-store analytic DBMS with HDP, and provide 1st line support. Hortonworks gets access to the global HP channel, which could provide a major boost to its sales capabilities. HP was already a reseller, but it has been partnering with MapR as well for some time, and this relationship does not end that one. HP gets access to a leader in the continuing development of Apache Hadoop, and it’s likely that the relationship will expand as the two decide what their roadmap will be.
Increasingly, the players are marshaling their forces for global competition, global sales and support, and increased integration with enterprise-class architectures. These moves will hardly close this round of the maneuvering – it will be interesting to see what comes next.
Category: Apache Big Data data warehouse DBMS Gartner Hadapt Hadoop Hortonworks HP IBM MapR Microsoft Oracle RDBMS Revelytix Teradata Uncategorized Tags: Apache, big data, CDH, Cloudera, data warehouse, Hadapt, Hadoop, Hortonworks, HP, IBM, MapR, Microsoft, Oracle, Revelytix, Teradata