The age of information overload is slowly drawing to a close. The enterprise is finally getting comfortable with managing massive amounts of data, content and information. The pace of information creation continues to accelerate, but infrastructure and information management practices are now within sight of keeping up with it. Big Data is now considered a blessing rather than a curse. Even so, managing information is not the same as fully exploiting it. While ‘Big Data’ technologies and techniques are unlocking secrets previously hidden in enterprise data, the largest source of potential insight remains largely untapped. Unstructured content represents as much as eighty percent of an organization’s total information assets. Although Big Data technologies and techniques are well suited to exploring unstructured information, this ‘Big Content’ remains grossly underutilized and its potential largely unexplored.
Gartner defines unstructured data as content that does not conform to a specific, pre-defined data model. It tends to be human-generated, people-oriented content that does not fit neatly into database tables. Within the enterprise, unstructured content takes many forms, chief among them business documents (reports, presentations, spreadsheets and the like), email and web content. Each of these content sources is supported by mature disciplines. Business documents are shepherded through their lifecycle by ECM platforms. Email is managed, monitored and archived along with other text-based communication channels. Ever more sophisticated web content is matched by equally sophisticated Web Content Management tools. Each of these platforms focuses on management and retention rather than analysis and exploration. None is designed to provide advanced analytical and exploration capabilities for the content it manages, nor is any capable of doing so. They can, however, provide a robust foundation for a Big Content infrastructure.
Enterprise-owned and -operated information is only part of the Big Content equation. The potential for insight and intelligence expands dramatically when enterprise information is augmented with public information. Content from the social stream can be a direct line into the hearts and minds of customers. Blogs, tweets, comments and ratings reflect the state of public sentiment at any given point in time. Once tamed, more traditional web content, such as news articles, product information and simple corporate informational web pages, becomes an extension of internal research. More formal data sources are emerging in the public realm, in the form of smart disclosure information from various areas of the US government and Linked Open Data across the globe. All of these unstructured (and semi-structured) information sources become valuable extensions to enterprise information resources when approached in a Big Content manner.
Gartner is embarking on a new look at how Big Data technologies and techniques can be applied to unstructured information resources. We are calling this Big Content. I will be exploring this topic in a series of three documents that will appear over the course of the next few months.
Big Content: Unlocking the Unstructured Side of Big Data
Using Search to Discover Big Data
Building a Content Command Center
This is an exciting and rapidly evolving part of the Big Data landscape. Big Data, ECM, Search and Semantics are converging to open up new possibilities within our ever-growing content stores. I’m looking forward to examining and discussing this new topic with the Gartner community.