
Symposium Notes – Day Four Returns to Data Security, and to Hadoop

By Merv Adrian | October 26, 2016

Tags: Security, Gartner, data lake, Data Integration, Apache Spark, Apache HBase, Apache Hadoop, Data and Analytics Strategies, Technology and Emerging Trends

Thursday, the final day, reinforced a theme of the week: data security is heating up, and organizations are not ready for it. It came up in half of my final 10 meetings today.

“Is my data more secure, or less, in the cloud?”

“Does using open source software for data management compromise how well I can protect it?”

“I’m a public utility – can I put meter data in the cloud safely? What if it is used to drive actions at the edge?”

“I’m using drones for mapping and the data is in the cloud – am I exposed?”

And the final conversation of the day revolved around competing roles: who has authority for securing this new data, and to whom do they report? And does “authority” equal “responsibility,” “accountability,” or both? Organizations that have not really thought about this must work through the overlaps, conflicts, ambitions and procedures carefully. A key principle for me: you can’t outsource responsibility.

Additional Hadoop questions came up as well.

  • An investor asked about “braintrust” departures from a leading Hadoop distribution vendor. This one was not hard to answer: it was only one or two people, and new opportunities and vesting schedules often have more to do with such moves than the health of the existing firm.
  • Several clients are wrestling with running multiple workloads on the same on-premises infrastructure: “I want to use a dozen of the nodes for test and dev, some for Spark jobs and some for HBase work (which is variable and needs to scale up and down at different times). It turns out that is really hard for us to manage.”
  • One client raised organizational questions that blended with the technical ones: accommodating different groups’ requirements, and balancing resources across them, without building a cluster for each. “Can’t I build a shared service that provides each of them with what they need, when they need it?” Not easily, not yet, is the answer.
  • An early adopter with a sizable cluster is trying to decide how to deal with scale that is literally straining physical capacity; that alone may drive her to the cloud.
  • For another, ramping ingest up and then back down (it is not continuous, and it arrives from multiple sources at unpredictable times) is proving maddeningly detailed and fragile.
  • An excellent economic question about the value of cloud-based scaling opened like this: “I just paid my legacy DBMS vendor over $3M to do exactly what I’m already doing, but faster. If I’d done it in the cloud, it likely would have cost less (if I could scale compute when I need to – but they can’t do that yet). How long will I have to wait?”

These are the questions of organizations trying to get past their first pilots, and they may help us understand why, according to Gartner research, big data deployments have stalled on the way to production.

For me, this Symposium was a call to a significant research agenda for 2017. No event does that better, although our Data and Analytics Summits, taking place in the spring, come very close.
