This is a joint post authored with Merv Adrian.
There were many questions asked after the last quarterly Hadoop webinar, and Nick and I have picked a few that were asked several times to respond to here. Just to be clear, we don’t generally give deep advice in blog posts – our blogs are descriptive, not prescriptive. Gartner clients submit inquiries, and justifiably think that since they are paying for the privilege, we should reserve that for them.
1. What was that document you referred to that listed a lot of Hadoop use cases?
Toolkit: Big Data Business Opportunities From Over 100 Use Cases, available to Gartner clients by following the link.
2. Is Apache Spark replacing Hadoop or complementing existing Hadoop practice?
Both are already happening. Given the uncertainty about “what is Hadoop,” there is no reason to think that solution stacks built on Spark, and not positioned as Hadoop, will stop proliferating as the technology matures. At the same time, Hadoop distributions are all embracing Spark and including it in their offerings.
3. Does Hadoop only work on structured data, where the data schema is strictly defined in detail? Or is Hadoop used widely on unstructured data today?
Far more the latter in its earliest uses: clickstreams, log files, machine readings, and text data sitting in “dark data” archives were among the earliest poster children. Parsing and extracting data from such sources was at the root of the original creators’ objectives. But it doesn’t need to stop there, and it hasn’t.
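For readers who want a concrete picture of the "parsing and extracting" described above, here is a minimal sketch in plain Python, not Hadoop itself. It assumes hypothetical web server log lines in a Common Log Format-like layout. The sample lines, the regular expression, and the function names are illustrative assumptions, not something covered in the webinar. It applies structure at read time, skips lines that do not parse, and then counts responses by status code in a map-then-reduce style.

```python
import re
from collections import Counter

# Hypothetical sample lines (an assumption for illustration only).
SAMPLE_LOG = [
    '10.0.0.1 - - [12/Jan/2015:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [12/Jan/2015:10:00:02 +0000] "GET /missing HTTP/1.1" 404 0',
    '10.0.0.1 - - [12/Jan/2015:10:00:03 +0000] "POST /login HTTP/1.1" 200 128',
    'corrupted line with no recognizable structure',
]

# The "schema" lives in this pattern and is applied when the data is read.
LINE_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+)'
)


def map_line(line):
    """Map step: yield (status, 1) for a parseable line, nothing otherwise."""
    match = LINE_PATTERN.match(line)
    if match:
        yield match.group("status"), 1


def reduce_counts(pairs):
    """Reduce step: sum the counts for each key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def count_statuses(lines):
    """Run the map step over every line, then reduce the emitted pairs."""
    pairs = (pair for line in lines for pair in map_line(line))
    return reduce_counts(pairs)


if __name__ == "__main__":
    print(count_statuses(SAMPLE_LOG))
```

On a real cluster the map and reduce steps would run in parallel across many machines and files. The point of the sketch is only that the raw data needs no predefined schema before it is stored.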
4. “Don’t Microsoft’s products like PowerPivot have a long term advantage over those of Apache?”
That’s a false dichotomy. Microsoft is an enthusiastic participant, with its HDInsight offering and its Hortonworks and Cloudera partnerships, and its tools are already designed to work with data in Hadoop (in the cloud or on-premises, or – and this is exciting – both). Complementary is the word here. More on cloud below.
5. How much Hadoop do we see in the cloud?
A great deal, and we hope to have better metrics soon. As the webinar mentioned, Amazon got there first with a commercial offering of MapReduce, and it has hosted millions of clusters since then, and is likely to expand its partnerships with distribution vendors beyond the already promising one with MapR.
6. How is NoSQL related to Hadoop?
This question comes up frequently. The easiest classification is to think of NoSQL as OLTP-like and Hadoop as OLAP-like. Neither classification is entirely fair to either category, but it can be a helpful way to get started. The two technology categories can be integrated, just as relational DBMSs and Hadoop can be integrated. A number of vendors have announced partnerships along these lines.
7. What is the best way to get some hands-on experience with Hadoop?
First figure out what you want to do – operate a cluster or use a cluster. Then start. If you want to use a cluster, nearly every vendor, if not every vendor, offers something like a preconfigured virtual machine instance. You can simply download and use it. Many of these include tutorials on data loading and processing. If you don’t have data, check out data.gov, data.worldbank.org, or here’s a list of thousands of public data sources. If you want to operate a cluster, download the free version from any vendor and get it working. You can also use cloud options to get started if you want to spin up a cluster and experiment. Additionally, several vendors offer training and certification programs, some of which are free.
8. Do you have any word on Apache Flink?
Nick is planning on a longer blog post on Flink, but so far none of our clients have asked and we’re not aware of any Hadoop vendors supporting it. Flink appears to address many of the same use cases as Spark. As we understand it, and we don’t claim our understanding is correct, Flink takes some inspiration from RDBMSs in how it handles data, allowing for more control over memory use and additional optimizations to iterative processing.
9. There are a lot of Hadoop products coming up or being developed every day. Is there a common site that covers the latest developments and gives accurate information?
We’re not aware of a common site, but we’ve found Hadoop Weekly very helpful at catching things we may have missed.
To watch a replay of the webinar or share with your colleagues, click here: Where Hadoop Has Been and Where It’s Going in 2015.