by Merv Adrian | September 20, 2014 | 2 Comments
For the past few months, I’ve been Gartner’s Vendor Lead for Microsoft. For some 30 vendors, we assign a single analyst to act as a focal point for coordinating across the 1000 analysts we have when research covers that vendor.
In Microsoft’s case, that has proven to be fascinating – we have some 3 dozen Magic Quadrants alone that have been published about their offerings in the last 15 months or so. As Vendor Lead, I’m a mandatory peer reviewer for those and other documents. For my own edification, I decide to map the Magic Quadrants that feature Microsoft onto a quadrant that shows where Microsoft appears in that piece of research. The results are intriguing.
Microsoft has a sizable number of Leader offerings, but many in the Challenger quadrant, and a few that appear in the Niche Player quadrant as well. It’s a bit rarer for them to appear as Visionaries – if they have it figured out, their ability to execute tends to drive them up into the Leader space fairly quickly.
The chart shows places where Microsoft clearly needs to focus, and makes it clear that they play in numerous markets of interest to Gartner’s enterprise IT-focused audience. Many categories do not appear, and a half dozen MQs are currently in process. I’ll keep this up to date for myself, and occasionally will share it here.
Category: Gartner Microsoft Tags: Gartner, Microsoft
by Merv Adrian | July 24, 2014 | 5 Comments
Interest from the leading players continues to drive investment in the Hadoop marketplace. This week Teradata made two acquisitions – Revelytix and Hadapt – that enrich its already sophisticated big data portfolio, while HP made a $50M investment in, and joined the board of, Hortonworks. These moves continue the ongoing effort by leading players. 4 of the top 5 DBMS players (Oracle, Microsoft, IBM, SAP and Teradata) and 3 of the top 7 IT companies (Samsung, Apple, Foxconn, HP, IBM, Hitachi, Microsoft) have now made direct moves into the Hadoop space. Oracle’s recent Big Data Appliance and Big Data SQL, and Microsoft’s HDInsight represent substantial moves to target Hadoop opportunities, and these Teradata and HP moves mean they don’t want to be left behind.
Teradata begins its moves with Revelytix. Andrew White noted in Gartner’s 2013 Cool Vendors in Information Infrastructure and Big Data that Revelytix’ “Loom, which runs in Hadoop, classifies objects in the Hadoop Distributed File System and applies a predefined transformation so that objects become structured and more usable for data scientists.” In our discussions of the Logical Data Warehouse, Gartner has targeted the capabilities Revelytix was designed to provide as being on the critical path to creating a coherent, optimized metadata architecture that will incorporate both traditional Enterprse DWs and Hadoop – a direction or research shows the advanced users are heading in.
In the 2012 edition of Cool Vendors, I described Teradata’s other acquisition, Hadapt, defining its vision as a Postgres-based “RDBMS instance on every node in the cluster in order to improve performance of queries over the structured part of the data, and … data partitioning techniques to eliminate unnecessary data movement.” Admirable as it was, this vision had not generated much business, and the window for additional SQL-on-Hadoop offerings may be closing – but Teradata has acquired technology and engineering talent that it will put to use supplementing its continuing optimization of Teradata SQL and SQL-H across complex logical data fabrics. The Hadapt team joins Teradata, though the brand will disappear.
HP chose to make a direct investment in Hortonworks, which extended its last funding round, closed months ago, to accept an additional $50M. The oddity of these mechanics aside, HP gets significant impact for its money: Martin Fink, its CTO, joins the board. HP will integrate the Hortonworks Data Platform (HDP) into its HAVEn offering, invest resources to certify its Vertica column-store analytic DBMS with HDP, and provide 1st line support. Hortonworks gets access to the global HP channel which could provide a major boost to its sales capabilities. HP was already a reseller, but, HP has been partnering with MapR as well for some time, and this relationship does not end that one. HP gets access to a leader in the continuing development of Apache Hadoop, and it’s likely that the relationship will expand as the two decide what their roadmap will be.
Increasingly, the players are marshaling their forces for global competition, global sales and support, and increased integration with enterprise-class architectures. These moves will hardly close this round of the maneuvering – it will be interesting to see what comes next.
Category: Apache Big Data data warehouse DBMS Gartner Hadapt Hadoop Hortonworks HP IBM MapR Microsoft Oracle RDBMS Revelytix Teradata Uncategorized Tags: Apache, big data, CDH, Cloudera, data warehouse, Hadapt, Hadoop, Hortonworks, HP, IBM, MapR, Microsoft, Oracle, Revelytix, Teradata
by Merv Adrian | July 16, 2014 | 4 Comments
One of the more interesting conversations I had at the Microsoft Worldwide Partners Conference this week concerned an initiative they have launched to help IT understand – and get under control – proliferating ungoverned SaaS applications. Brad Anderson, Corporate VP for Cloud and Mobility, told the 16,000 attendees that enterprises need help. “We ask them how many SaaS apps they have in their environment and they usually tell us 30-40. We audit with the Cloud App Discovery tool and find , on average, over 300.” And are these managed? One can only imagine…
The tool is in preview now, and a link to try it out for free is provided in Microsoft’s blog, It offers more than discovery – it will permit managers to monitor usage, identify users, integrate apps into Azure Active Directory, and more.
This is part of a larger story about governance and optimization in a hybrid cloud- and on-premises world that enterprises will live in for this decade and the next. Anderson also pointed out that 3.1M smartphones were stolen and another 1.4M lost. How many of these had corporate data on them. Would you know if it happened to one of your users? Can you govern access to corporate data in the apps there, and prevent it from being pasted into emails by someone who gets that phone and uses the saved logins to get at it? Some of these challenges can be handled by policy-based tools.
Getting the apps your users want into Azure, managing them there, and linking the on-premises Active Directory used by the overwhelming majority of enterprises to Azure Active Directory offers the possibility of getting corporate data security under better control before you find out how you look in orange. One of my favorite scenarios Microsoft showed its Enterprise Mobility Suite detecting is “impossible logins” – an hour ago you logged in from Australia and now you’re apparently in Chicago. Software can stop that? Yes.
The context here was Microsoft telling its partners about the opportunities for them to sell these capabilities to customers – and it’s hard to imagine them not wanting to, especially with the incentives, certification, training and co-marketing efforts Microsoft is launching. Expect this to be a major theme, leveraging the power of the crown jewel that Active Directory is in the portfolio in many additional ways to come.
Category: Active Directory Industry trends Microsoft mobility SaaS Security Tags: Microsoft, mobility, SaaS, Security
by Merv Adrian | June 28, 2014 | 4 Comments
In February 2012, Gartner published How to Choose The Right Apache Hadoop Distribution (available to clients). At the time, the leading distributors were Cloudera, EMC (now Pivotal), Hortonworks (pre-GA), IBM, and MapR. These players all supported six Apache projects: HDFS, MapReduce, Pig, Hive, HBase, and Zookeeper. Things have changed.
[updated June 29] We included Datastax (a distributor of Apache Cassandra) then, but they did not, and still don’t, consider themselves part of the Hadoop ecosystem. And they are not alone in having a reductive view of the answer to the question What Is Hadoop? Doug Cutting, pioneer in creating it and Chief Architect at Cloudera and former president of the Apache Software Foundation, considers the Hadoop Project to be HDFS, MapReduce and some common utilities. He made that point clear during a panel of luminaries my colleague Nick Heudecker conducted recently – the video is linked to Nick’s blog here. Everything else is “related projects.” Arun Murthy of Hortonworks, who has driven the creation of YARN, prefers to say that HDFS and YARN are “kernel” now, likening the description to the way most of us think of Linux. The Apache page continues to use the older description, including HDFS, MapReduce and YARN. (June 29, 2014)
To users, and especially buyers, the definition is more expansive. Hadoop is what they use to compose a useful stack of software to execute a business process of some sort. And distributors agree: in a little over two years, the set of projects included in all commercial distributions has now reached fifteen – two and a half times as many in just over two years. The list now includes Accumulo, Avro, Cascading, Flume, Mahout, Oozie, Spark, Sqoop, and YARN.
Others are likely to join this stack long before the next two years are up: the candidates include Falcon, Knox, Giraph, Hue, Lucene, Storm, Tez, and others. Hadoop has moved from a coarse-grained blunt instrument for largely ETL-style workloads to an expanding stack for virtually any IT task big data professionals will want to undertake. What is Hadoop now? It’s a candidate to be the alternative universe for data processing, with over 20 components that span a wide array of functions. More money continues to flow into the ecosystems, more companies form, more programmers take up the challenges, and the big players are scrambling to get aboard the train.
What is Hadoop?
It’s what’s next.
Category: Accumulo Apache Apache Yarn Avro Cascading Cloudera Falcon Flume Gartner Giraph Hadoop Hbase HDFS Hive Hortonworks Hue IBM Knox Lucene Mahout MapR MapReduce Oozie Pig Pivotal Spark Sqoop Storm Tez YARN Zookeeper Tags:
by Merv Adrian | March 24, 2014 | 11 Comments
This post was jointly authored by Merv Adrian (@merv) and Nick Heudecker (@nheudecker) and appears on both blogs.
In the early days of Hadoop (versions up through 1.x), the project consisted of two primary components: HDFS and MapReduce. One thing to store the data in an append-only file model, distributed across an arbitrarily large number of inexpensive nodes with disk and processing power; another to process it, in batch, with a relatively small number of available function calls. And some other stuff called Commons to handle bits of the plumbing. But early adopters demanded more functionality, so the Hadoop footprint grew. The result was an identity crisis that grows progressively more challenging for decisionmakers with almost every new announcement.
This expanding footprint included a sizable group of “related projects,” mostly under the Apache Software Foundation. When Gartner published How to Choose the Right Apache Hadoop Distribution in early February 2012 the leading vendors we surveyed (Cloudera, MapR, IBM, Hortonworks, and EMC) all included Pig, Hive, HBase, and Zookeeper. Most were willing to support Flume, Mahout, Oozie, and Sqoop. Several other projects were supported by some, but not all. If you were asked at the time, “What is Hadoop?” this set of ten projects, the commercially supported ones, would have made a good response.
In 2013, Hadoop 2.0 arrived, and with it a radical redefinition. YARN muddied the clear definition of Hadoop by introducing a way for multiple applications to use the cluster resources. You have options. Instead of just MapReduce, you can run Storm (or S4 or Spark Streaming), Giraph, or HBase, among others. The list of projects with abstract names goes on. At least fewer of them are animals now.
During the intervening time, vendors have selected different projects and versions to package and support. To a greater or lesser degree all of these vendors call their products Hadoop – some are clearly attempting to move “beyond” that message. Some vendors are trying to break free from the Hadoop baggage by introducing new, but just as awful, names. We have data lakes, hubs, and no doubt more to come.
But you get the point. The vague names indicate the vendors don’t know what to call these things either. If they don’t know what they’re selling, do you know what you’re buying? If the popular definition of Hadoop has shifted from a small conglomeration of components to a larger, increasingly vendor-specific conglomeration, does the name “Hadoop” really mean anything anymore?
Today the list of projects supported by leading vendors (now Cloudera, Hortonworks, MapR, Pivotal and IBM) numbers 13. Today it’s HDFS, YARN, MapReduce, Pig, Hive, HBase, and Zookeeper, Flume, Mahout, Oozie, Sqoop – and Cascading and HCatalog. Coming up fast are Spark, Storm, Accumulo, Sentry, Falcon, Knox, Whirr… and maybe Lucene and Solr. Numerous others are only supported by their distributor and are likely to remain so, though perhaps MapR’s support for Cloudera Impala will not be the last time we see an Apache-licensed, but not Apache project, break the pattern. All distributions have their own unique value-add. The answer to the question, “What is Hadoop?” and the choice buyers must make will not get easier in the year ahead – it will only become more difficult.
Category: Accumulo Ambari Apache Apache Drill Apache Yarn Big Data BigInsights Cloudera Elastic MapReduce Gartner Giraph Hadoop Hbase HCatalog HDFS Hive Hortonworks IBM Intel Lucene MapR MapReduce Oozie open source OSS Pig Solr Sqoop Storm YARN Zookeeper Tags: Apache, big data, BigInsights, CDH, Cloudera, Datastax, EMC, Flume, Hadoop, Hbase, HDFS, Hive, Hortonworks, IBM, InfoSphere, Isilon, MapR, MapReduce, Oozie, open source, OSS, Pig, Pivotal, Sqoop, zookeeper
by Merv Adrian | February 23, 2014 | 5 Comments
In my post about the BYOH market last October, I noted that increasing numbers of existing players are connecting their offerings to Apache Hadoop, even as upstarts enter their markets with a singular focus. And last month, I pointed out that Nick Heudecker and I detected a surprising lack of concern about security in a recent Hadoop webinar. Clearly, these two topics have an important intersection – both Hadoop specialists (including distribution vendors) and existing security vendors will need to expand their efforts to drive awareness if they are to capture an opportunity that is clearly going begging today. Security for big data will be a key issue in 2014 and beyond.
Other analysts at Gartner have tracked many of these products, and in my own followup I’ve been catching up on the work of Joseph Feiman and Brian Lowans, among others. Their Magic Quadrant for Data Masking, published in December, offers useful discussion of that capability (both static and dynamic) and which existing players have already added Hadoop support. Axis Technology’s DMSuite, Dataguise (who partners with Compuware), IBM InfoSphere Optim Data Privacy and InfoSphere Guardium Data Activity Monitor, Informatica Dynamic Data Masking and Persistent Data Masking, and Voltage SecureData Enterprise are all mentioned in the MQ.
There are other offerings, of course – for example Feiman and Lowans note that masking of big data is available for the Oracle Big Data Appliance with its installed Cloudera distribution, but added that it requires the use of Oracle consulting services, or the services of Oracle’s numerous service partners. Similarly, there are several emerging Hadoop focused firms I’ve mentioned elsewhere and will cover in an upcoming piece of Gartner research I’m doing with Neil MacDonald. With RSA coming up this week (unfortunately, I can’t attend), I expect to see more heat – and perhaps light as well – on the issue ahead.
Category: Apache Big Data Cloudera Dataguise Gartner Hadoop IBM Magic Quadrant Oracle Security Tags: Apache, Axis Technology, big data, Cloudera, Compuware, Dataguise, Hadoop, IBM, Informatica, open source, Oracle, OSS, Security, Voltage
by Merv Adrian | February 19, 2014 | 3 Comments
In the most profound change of leadership in Microsoft’s history, Satya Nadella, who was head of the Cloud and Enterprise division, has taken the helm, succeeding Steve Ballmer. Nadella’s “insider” understanding of Microsoft’s culture and his effectiveness in cross-team communication and collaboration could help him reshape Microsoft for the digital era — which will be key for the company to attain the visionary technical leadership to which it aspires.
Nadella’s main challenge consists of evolving Microsoft’s existing businesses (including its enterprise offerings, which represent half of its current revenue) while reinventing Microsoft to make it relevant in mobile and cloud-centric markets.
Unlike his sales-focused predecessor, Steve Ballmer, Nadella has an engineering background; thus, his selection has reinstated the model of a technically minded CEO driving the company’s technical vision. But Nadella also must overcome several challenges.
He lacks direct experience in the mobile market. His insider status raises the risk of his being overly respectful of existing businesses, and hanging back from tough decisions that potentially threaten them but are critical to generating innovation. He will also need to shake up what is widely viewed as a culturally dysfunctional management structure.
Nadella must quickly demonstrate that he is not backing a “business as usual” strategy, and that he recognizes that design is front and center in client computing for both consumers and enterprise users and that a mobilized environment has replaced the desktop. The next six months will show how well Nadella and Gates collaborate to determine Microsoft’s technical direction.
We expect Microsoft’s trajectory will be clear by year-end 2014. Do not expect radical changes in the company’s overall “devices and services” strategy; instead watch for organizational shifts, product design changes and updated product road maps to address a mobile- and cloud-dominant world.
- Establish a vision of itself as an innovative, disruptive force in IT. Concentrating on mobile technology and leveraging lessons learned from gaming can help Microsoft appeal to the next generation.
- Emphasize design to enhance ease of use for consumers, and apply these lessons to its considerable assets in IT infrastructure to change its image of a legacy enterprise vendor competing in a consumerized market.
- Enable entrepreneurs and developers to develop new business value atop a common Windows client environment with unified, cross-platform services. Microsoft must enable a complete, compelling set of apps that attracts developers and can compete with and within iOS and Android environments.
- Acknowledge its customers’ heterogeneity by supporting Google and Apple client environments, the Linux/Java environment on servers, and cloud-based services in general.
- Deliver compelling experiences and solutions to both IT and to non-IT buyers.
For clients, my research note “Nadella Must Disrupt Microsoft Models to Establish Market Leadership,” co-authored with David Cearley, can be found on the Gartner website at http://www.gartner.com/doc/2664432.
Category: Gartner Microsoft Tags: Microsoft
by Merv Adrian | January 21, 2014 | 19 Comments
“Not looking” at security and privacy seems to be the posture of people implementing Hadoop, based on recent data Gartner has collected. This is troubling, and paradoxical. In an era when the privacy of data, from government surveillance to medical record-keeping to “creepy” marketing initiatives and password breaches, has been in the news regularly, it is hard to understand why professionals implementing Hadoop are not paying attention.
The data here comes from a recent webinar I conducted with my colleague Nick Heudecker. We had over 600 attendees, and during the discussion we offered several polling questions. One had to do with barriers to Hadoop adoption. We had 213 responses to that question.
You can see the results below and two things leap out: only 2% of the respondents see lack of robust security as a barrier, and half of the respondents feel that they do not have a sufficiently defined value proposition. More on the latter in another post.
For me, the nearly non-existent response to the security issue is shocking. Can it be that people believe Hadoop is secure? Because it certainly is not. At every layer of the stack, vulnerabilities exist, and at the level of the data itself there numerous concerns. These include the use of external unveiled data and of data in file systems that lack any protection, and the separation of Hadoop initiatives in most organizations from IT governance. Add to that the kinds of use cases Hadoop is being pointed at: sensitive health care information personal data in retail systems; telephone usage; social media connection and sentiment analytics – all of them give us pause.
I’ve pointed to security as a key issue facing the Hadoop community in 2014 for some time now. The fact that awareness of the problem is not getting attention only reinforces my belief that we will see major problems as Hadoop goes mainstream.
Category: Big Data Gartner Hadoop Security Tags: big data, data security, Hadoop, Security
by Merv Adrian | January 17, 2014 | 3 Comments
In the Hadoop community there is a great deal of talk of late about its positioning as an Enterprise Data Hub. My description of this is “aspirational marketing;” it addresses the ambition its advocates have for how Hadoop will be used, when it realizes the vision of capabilities currently in early development. There’s nothing wrong with this, but it does need to be kept in perspective. It’s a long way off.
Start here: A Gartner Research Circle survey revealed that in 2013, big data projects went into production in less than 8% of enterprises – and Hadoop was by no means the only technology included in their count. In 2014, many more enterprises will go into production with their first Hadoop project or two, perhaps even pushing deployed Hadoop shops into low double digit percentages.
In those same shops, there are thousands of significant database instances, and tens of thousands of applications – and those are conservative numbers. So the first few Hadoop applications will represent a toehold in their information infrastructure. It will be a significant beachhead, and it will grow as long as the community of vendors and open source committers deliver on the exciting promise of added functionality we see described in the budding Hadoop 2.0 era, adding to its early successes in some analytics and data integration workloads.
So “Enterprise Data Hub?” Not yet. At best in 2014, Hadoop will begin to build a role as part of an Enterprise Data Spoke in some shops. Aspirations are good, but perspective helps. Don’t confuse vision with strategy. The enterprise data warehouse has a long life ahead of it; the synergistic addition of Hadoop to its evolution into the logical data warehouse is just beginning, and Hadoop’s role in operational workloads and event processing has yet to launch.
Category: Apache Big Data data warehouse DBMS Gartner Hadoop RDBMS Tags: Apache, big data, data warehouse, Hadoop
by Merv Adrian | January 13, 2014 | 11 Comments
Talk to security folks, especially network ones, and AAA will likely come up. It stands for authentication, authorization and accounting (sometimes audit). There are even protocols such as Radius (Remote Authentication Dial In User Service, much evolved from its first uses) and Diameter, its significantly expanded (and punnily named) newer cousin, implemented in commercial and open source versions, included in hardware for networks and storage. AAA is and will remain a key foundation of security in the big data era, but as a longtime information management person, I believe it’s time to acknowledge that it’s not enough, and we need a new A – anonymization.
I realize I’m speaking out of turn here. I’m not a security guy myself, and I don’t pretend to be deep in the disciplines that decide whether you are who claim to be (authentication) and govern whether you can get to the network. Nor do I know the detailed nuances, spread across many different resources, that grant me permission to do what I will be allowed to do with those resources when I get there (authorization.) I don’t understand the various protections that assure breaches do not/have not occurred, which depend on the audit capabilities (the latter, as accounting, also provides the mechanism to report on all of the above.
What I do spend some time on is what happens within the resource that holds the data, when an authenticated, authorized person who is appropriately audited gets to it. For example, we need to distinguish what DBAs can see from what an analyst can – financial types call that “separation of concerns,” and it’s typically managed by a DBMS, which has mechanisms to interact with authorization capabilities to implement policy. It can be coarse- or quite fine-grained, and it’s one of the reasons we analysts always like to remind people that we talk about database management systems, not just databases.
But here’s the problem: in the big data era, much of the data we work with is not in DBMSs – and more and more of it will not be, as file-based systems like Hadoop gain broader and broader use. File systems don’t provide that granular control, so intervening layers will be required. They too can be coarse – we can encrypt/decrypt everything, for example. Or they too can be fine-grained, offering selective, policy-based decryption – in memory, after the bits come off the disk, before handing to the requester.
Personally, I hope people who model disease vectors, or even purchase behaviors, can build effective predictive models that describe what happens to people with certain characteristics. I just don’t want that process to result in my name being “on their list.” If they can intuit and classify what I am by my behavior and assign me to a category in some separate process, that is a different issue.
One approach that matters a great deal is obfuscation, which replaces a field like name or SSN with valid characters, but not the original data. Its value is that if properly implemented, it maintains mathematical cohesion and permits statistical analysis, aggregation, model building, etc to proceed without individuation of the records they are performed over. This is a privacy concern. Redaction – the familiar “blacking out” of content, is also used – but in some policy scenarios, being able to peer into the redacted data might subsequently be of value, and redaction typically doesn’t permit this.
Both approaches, however, can be classified as anonymization (or data deidentification, but I prefer to add another A to AAA for consistency!), and in an era where big data will increasingly be used to track human behaviors for medical, commercial and security reasons, I believe it’s time for anonymization to join the other 3 As. Perhaps it’s time to talk of authentication, authorization, anonymization and audit as the true foundation for data security.
Thanks to my esteemed Gartner colleague Neil MacDonald for commenting on and improving this post.
Category: Big Data data integration data warehouse DBMS Gartner Hadoop HDFS Industry trends Security Tags: big data, data integration, data security, data warehouse, Hadoop, HDFS, Security