Gartner Blog Network

Big Content: The Unstructured Side of Big Data

by Darin Stewart  |  May 1, 2013  |  4 Comments

The age of information overload is slowly drawing to a close. The enterprise is finally getting comfortable with managing massive amounts of data, content and information. The pace of information creation continues to accelerate, but infrastructure and information management are coming within sight of keeping up with it. Big data is now considered a blessing rather than a curse. Even so, managing information is not the same as fully exploiting it. While ‘Big Data’ technologies and techniques are unlocking secrets previously hidden in enterprise data, the largest source of potential insight remains largely untapped. Unstructured content represents as much as eighty percent of an organization’s total information assets. Although Big Data technologies and techniques are well suited to exploring unstructured information, this ‘Big Content’ remains grossly underutilized and its potential largely unexplored.

Gartner defines unstructured data as content that does not conform to a specific, pre-defined data model. It tends to be the human-generated and people-oriented content that does not fit neatly into database tables. Within the enterprise, unstructured content takes many forms, chief among which are business documents (reports, presentations, spreadsheets and the like), email and web content. Each of these content sources is supported by a mature discipline. Business documents are shepherded through their lifecycle by ECM platforms. Email is managed, monitored and archived along with other text-based communication channels. Ever more sophisticated web content is matched by equally sophisticated Web Content Management tools. Each of these platforms is focused on management and retention rather than analysis and exploration. They are not intended to provide advanced analytical and exploration capabilities for the content they manage, nor are they capable of doing so. They can, however, provide a robust foundation for a Big Content infrastructure.

Enterprise-owned and enterprise-operated information is only part of the Big Content equation. The potential for insight and intelligence expands dramatically when enterprise information is augmented and enhanced with public information. Content from the social stream can be a direct line into the hearts and minds of customers. Blogs, tweets, comments and ratings reflect the state of public sentiment at any given point in time. More traditional web content, such as news articles, product information and simple corporate informational web pages, becomes an extension of internal research when tamed. More formal data sources are emerging in the public realm in the form of smart disclosure information from various areas of government in the US and Linked Open Data across the globe. All of these unstructured (and semi-structured) information sources become valuable extensions to enterprise information resources when approached in a Big Content manner.

Gartner is embarking on a new look at how Big Data technologies and techniques can be applied to unstructured information resources.  We are calling this Big Content.  I will be exploring this topic in a series of three documents that will appear over the course of the next few months.

  • Big Content: Unlocking the Unstructured Side of Big Data
  • Using Search to Discover Big Data
  • Building a Content Command Center

This is an exciting and rapidly evolving part of the Big Data landscape. Big Data, ECM, Search and Semantics are converging to open up new possibilities emerging from our ever-growing content stores. I’m looking forward to examining and discussing this new topic with the Gartner community.

Category: big-content  enterprise-content-managment  open-data  search  

Tags: big-content  big-data  ecm  search  

Darin Stewart
Research Vice President
6 years with Gartner
21 years IT industry

Darin Stewart is a research vice president for Gartner in the Collaboration and Content Strategies service. He covers search, knowledge management, semantic technologies and enterprise content management.

Thoughts on Big Content: The Unstructured Side of Big Data

  1. Jason Beasley-Hahn says:

    Hi Darin, good summation of the issue of big data and in particular the challenges that face the unstructured data world. The value of course comes from being able to see the augmented value of unstructured data when aligned with other unstructured data and connected to the structured data world.

    Just as a set of numbers has no value out of context, neither does, say, a clause in a purchasing contract if it is not read alongside the remainder of the document or, importantly, the reason the document exists in the first place.

    In their day-to-day operations, organisations rely on massive amounts of unstructured data wrapped around structured data to give it context and therefore a place from which actions and decisions can be made. We have made great strides forward in the ability to index this unstructured data, and things like word clouds give us visual clues to some aspects of the data. We need to be able to apply that type of technology across unrelated data to surface the intent and valid connections.

    Add a level of sensitivity analysis to the process and an explicit asset / sustainability value to a document or its content and we can better focus the decision-making process, but people will still make ‘interesting’ decisions regardless of the value of the information.

  2. […] Big Content: The Unstructured Side of Big Data ( […]

  3. […] true in the case of incorporating unstructured content into the Big Data world (or as I called it in a previous post Big Content). Search is one area of technology where the order of precedence between structured and […]

Comments or opinions expressed on this blog are those of the individual contributors only, and do not necessarily represent the views of Gartner, Inc. or its management. Readers may copy and redistribute blog postings on other blogs, or otherwise for private, non-commercial or journalistic purposes, with attribution to Gartner. This content may not be used for any other purposes in any other formats or media. The content on this blog is provided on an "as-is" basis. Gartner shall not be liable for any damages whatsoever arising out of the content or use of this blog.