People often ask me if there is a Magic Quadrant for big data. There isn’t. What we have is a Hype Cycle for Big Data with an abundance of big data technologies: some are just nascent, some are on the plateau of productivity, and some, like Hadoop distributions, are in the unfairly dreaded and largely misunderstood trough of disillusionment.
Gartner also has annual Cool Vendor reports where analysts write about up-and-coming companies with innovative ideas, services and technologies. Many reports cover awesome big data vendors, for example, Cool Vendors in Big Data, Cool Vendors in Data Science and Cool Vendors in Information Innovation.
At Gartner for Technical Professionals (where I am), we usually publish vendor-neutral research and do not write for cool vendor reports (to be fair, we submit our choices and peer-review these reports). Yet, our clients constantly ask me and my colleagues about vendors. Last week, Fortune magazine published my opinion on big data companies to watch. My opinion was not about the best or the most prominent, most hyped or most intriguing, most funded or most profitable companies, but about the companies to watch.
Katherine Noyes, the author of the Fortune magazine article, asked me to name five big data companies to watch and to comment on some published lists of big data vendors. Below is my full response, which explains my choices:
Well, selecting just five companies is a challenge since there are many more companies that do interesting things around big data. I have both technical and non-technical reasons behind my list of five. My top noteworthy big data companies would be:
- Neo Technology is the force behind the open source graph database Neo4j – I think graphs have a great future since they show data through its connections rather than as a traditional view of atomic elements. Graph technologies are mostly unexplored by enterprises, but they can deliver truly new insights from data. I wrote about graphs some time ago in my blog post Think Graph.
- Splunk has excellent technology, and it was among the first big data companies to go public. Now, Splunk also has a strong product called Hunk (Splunk on Hadoop) that directly delivers big data solutions more mature than most products on the market. Hunk is easy to use compared to many big data products, and most customers I spoke with expressed their love for Splunk without any prompting on my part.
- MemSQL – an in-memory relational database that would be effective for mixed workloads and for analytics. While SAP draws a lot of attention to in-memory databases by marketing its HANA database, MemSQL seems to be a less expensive and more agile solution in this space.
- Pivotal – while it might not be the most polished big data solution, Pivotal is solving a much larger problem – the convergence of cloud, mobile, social and big data forces (which Gartner calls the Nexus of Forces). Ultimately, big data is not a standalone technology; it should deliver actionable insights about the rapidly changing modern world with its social interactions, mobility, Internet of Things and so on. That’s why GE is one of the major investors in Pivotal, with the purpose of building the Industrial Internet.
- Teradata – it might be a surprising choice for many big data aficionados who chose Teradata as a target in the religious wars of new big data technologies against the data warehouse. Teradata is easy prey there because it’s a pure play in data warehousing (as opposed to Oracle, IBM or Microsoft, which have many more products). Meanwhile, Teradata delivers a unified data architecture that combines the best of both worlds, and enterprises need both.
As you may have noticed, I am covering various segments of big data technologies. If I had more than five companies to choose from, I’d also add companies in other segments:
- Big data analytics: Actian and Datameer
- Predictive analytics: Revolution Analytics and Ayasdi
- Data integration: Pentaho and Denodo (particularly for data virtualization)
- Big data cloud providers: Qubole and Altiscale
- Hadoop: Cloudera – “first in space” for big data, huge recent investments from an interesting set of investors, most notably, Intel.
- Development framework: Concurrent with the open source product called Cascading, now included in some Hadoop distributions. Given that applications are about to explode on Hadoop, Concurrent should do very well.
Please note, this is not comprehensive research, and there are many more very good companies. The companies I listed are “to watch” rather than best overall.
Now, to comment on the list of 100 big data companies you pointed me to. Out of this list, the following companies look appealing to me: Dataguise, MapR, MatterSight, Manhattan Software (an excellent player in real estate!) and Data Tamer (I would prefer Paxata though) – see my blog post Big Data Quantity vs. Quality.
I’d especially like to dwell on The Hive. This VC company specializing in data has an unusual approach that I personally greatly appreciate. It conducts weekly live meetups, which cover diverse subjects and draw diverse people who are interested in big data. The Hive has become one of the best-known gatherings, attracting the brightest minds in big data as speakers (and as attendees). It has become a social “big data hub” in Silicon Valley and, I believe, in India too. Being at the center of big data life, The Hive has a great opportunity to make successful investments in early-stage data companies.
Finally, I’d like to remind you again: Gartner Cool Vendor reports are much more comprehensive and pointed reading than my casual overview.
Follow Svetlana on Twitter @Sve_Sic