Big Content: The Unstructured Side of Big Data

By Darin Stewart | May 01, 2013 | 1 Comment

Search, open data, Enterprise Content Management, Big Content

The age of information overload is slowly drawing to a close. The enterprise is finally getting comfortable with managing massive amounts of data, content and information. The pace of information creation continues to accelerate, but the ability of infrastructure and information management to keep pace is coming within sight. Big data is now considered a blessing rather than a curse. Even so, managing information is not the same as fully exploiting information. While ‘Big Data’ technologies and techniques are unlocking secrets previously hidden in enterprise data, the largest source of potential insight remains largely untapped. Unstructured content represents as much as eighty percent of an organization’s total information assets. Although Big Data technologies and techniques are well suited to exploring unstructured information, this ‘Big Content’ remains grossly underutilized and its potential largely unexplored.

Gartner defines unstructured data as content that does not conform to a specific, pre-defined data model. It tends to be the human-generated and people-oriented content that does not fit neatly into database tables. Within the enterprise, unstructured content takes many forms, chief amongst which are business documents (reports, presentations, spreadsheets and the like), email and web content. Each of these content sources is supported by a mature discipline. Business documents are shepherded through their lifecycle by ECM platforms. Email is managed, monitored and archived along with other text-based communication channels. Ever more sophisticated web content is matched by equally sophisticated Web Content Management tools. Each of these platforms is focused on management and retention rather than analysis and exploration. They are not intended to provide advanced analytical and exploration capabilities for the content they manage, nor are they capable of doing so. They can, however, provide a robust foundation supporting a Big Content infrastructure.

Enterprise owned and operated information is only part of the Big Content equation. The potential for insight and intelligence expands dramatically when enterprise information is augmented and enhanced with public information. Content from the social stream can be a direct line into the hearts and minds of customers. Blogs, tweets, comments and ratings are a reflection of the current state of public sentiment at any given point in time. More traditional web content such as news articles, product information and simple corporate informational web pages become an extension of internal research when tamed. More formal data sources are emerging in the public realm in the form of smart disclosure information from various areas of government in the US and Linked Open Data across the globe. All of these unstructured (and semi-structured) information sources become valuable extensions to enterprise information resources when approached in a Big Content manner.

Gartner is embarking on a new look at how Big Data technologies and techniques can be applied to unstructured information resources.  We are calling this Big Content.  I will be exploring this topic in a series of three documents that will appear over the course of the next few months.

  • Big Content: Unlocking the Unstructured Side of Big Data
  • Using Search to Discover Big Data
  • Building a Content Command Center

This is an exciting and rapidly evolving part of the Big Data landscape. Big Data, ECM, Search and Semantics are converging to open up new possibilities emerging from our ever-growing content stores. I’m looking forward to examining and discussing this new topic with the Gartner community.

The Gartner Blog Network provides an opportunity for Gartner analysts to test ideas and move research forward. Because the content posted by Gartner analysts on this site does not undergo our standard editorial review, all comments or opinions expressed hereunder are those of the individual contributors and do not represent the views of Gartner, Inc. or its management.

1 Comment

  • Jason Beasley-Hahn says:

    Hi Darin, good summation of the issue of big data and in particular the challenges that face the unstructured data world. The value of course comes from being able to see the augmented value of unstructured data when aligned with other unstructured data and connected to the structured data world.

    Just as a set of numbers has no value out of context, neither does, say, a clause in a purchasing contract if it is not read alongside the remainder of the document or, importantly, the reason that the document exists in the first place.

    In their day-to-day operations, organisations rely on massive amounts of unstructured data wrapped around structured data to give it context and therefore a place from which actions and decisions can be made. We have made great strides forward in the ability to index this unstructured data, and things like word clouds provide us visual clues to some aspects of the data. We need to be able to apply that type of technology across unrelated data to surface the intent and valid connections.

    Add a level of sensitivity analysis to the process and an explicit asset / sustainability value to a document or its content and we can better focus the decision-making process, but people will still make ‘interesting’ decisions regardless of the value of the information.