The interest in Big Data at our recent Catalyst conference shows that enterprises have recognized the need for a new approach to exploiting massive and rapidly changing data streams. When will that same interest coalesce for Big Content?
Big Content is a term that highlights the subset of Big Data that deals with less-structured information. Big Content isn’t new or different from Big Data. Rather, it focuses attention on applying Big Data to unstructured information, for the kind of people who think the Library of Congress is filled with “content,” not “data.”
After all, Big Data has much to offer people who are turned off by the word “data.” They may pay more attention to its potential value if a subset of its techniques is framed as Big Content. Just as Big Data uses Apache Hadoop (with MapReduce) to go beyond traditional BI, Big Content combines technologies to go beyond traditional search. These include text analytics, sentiment analysis, video analysis, semantic web technologies, and attention management.
The Big Data story is now well known. You might be analyzing real-time point-of-sale information from grocery stores, reading traffic sensors on every downtown corner, or tracking temperature and flow speed at myriad points in the ocean. In each case, numerical data is flowing in faster than online transaction processing (OLTP) systems can handle it. This is where Big Data comes in, and it has reinvigorated the people who have traditionally relied on BI and OLAP.
But what about the audience that cares about less-structured information? Unstructured content such as social media postings, audio, and video is growing at a fast clip. In practice, structured versus unstructured is not a binary choice. Numbers in a database are clearly structured and a freeform Word document is unstructured, but there are many shades of gray in between. Examples include web logs, XML-based content (with varying levels of specificity in its schema), text or documents stored in database fields, and structured Word documents. Blends of structured data and unstructured content can yield interesting hybrid analytics use cases.
Out of this mass of content, enterprises increasingly want answers to questions such as:
- What are my customers saying about my product in social media? Are the reactions generally favorable or not?
- How often have epidemiological studies shown a certain protein to be an inhibitor?
- How many articles about the deficit mention healthcare entitlements?
- How can I spot important trends in my field of expertise that a Google search won’t surface?
Full-text search is not the answer. The search results may contain too much noise to be useful. The desired output may be semantic linkages or sentiment ratings rather than a list of links. And the text to be searched may be neither accessible to a public search engine like Google nor contained entirely within the firewall, where an enterprise search engine could reach it.
For industries that care more about what people are saying than about what meters are measuring, Big Content will become a big deal.
Comments or opinions expressed on this blog are those of the individual contributors only, and do not necessarily represent the views of Gartner, Inc. or its management. Readers may copy and redistribute blog postings on other blogs, or otherwise for private, non-commercial or journalistic purposes, with attribution to Gartner. This content may not be used for any other purposes in any other formats or media. The content on this blog is provided on an "as-is" basis. Gartner shall not be liable for any damages whatsoever arising out of the content or use of this blog.