Darin Stewart

A member of the Gartner Blog Network

Darin Stewart
Research Director
1 year with Gartner
15 years IT industry

Darin Stewart is a research director for Gartner in the Collaboration and Content Strategies service. He covers a broad range of technologies that together comprise enterprise content management. Read Full Bio

Coverage Areas:

Big Content Needs More Metadata

by Darin Stewart  |  May 15, 2013  |  1 Comment

In recent posts I’ve introduced the notion of Big Content as shorthand for incorporating unstructured content into the Big Data world in a systematic and strategic way. Big Data changes the way we think about content and how we manage it. One of the most important areas requiring a fresh look is metadata. Big Content expands the definition of metadata beyond the traditional tombstone information normally associated with documents (title, author, creation date, archive date, etc.). While these elements are necessary and remain foundational to both effective content management and Big Data, more is required. Big Content metadata encompasses any additional illuminating information that can be extracted from or applied to the source content to facilitate its integration and analysis with other information from across the enterprise. This expanded definition results in a three-tiered metadata architecture for Big Content.

Metadata Stack

At the bottom level of the architecture, a core enterprise metadata framework provides a small set of metadata elements that are applicable to the majority of enterprise information assets under management. These elements are often drawn from a well known standard set of elements such as the Dublin Core but can include whatever common elements are useful to the enterprise. This common framework provides the unifying thread that will facilitate locating content from across the enterprise, making an initial assessment of its relevance and of submitting it to the content ingestion pipeline.

The second layer of the Big Content metadata architecture consists of domain specific elements that are not necessarily applicable to all enterprise content, but are useful to a particular area such as a brand, product or department. At this level, common metadata often exists under different labels depending on where it is created and which department owns it. This increases its value and utility to that department but makes it more difficult to leverage for content integration and analysis. To reconcile domain metadata it is often necessary to create a metadata map that resolves naming and semantic conflicts.

The top layer of the metadata architecture consists of application specific metadata. This is additional information about content that is only relevant to the use-case at hand and the application facilitating its execution. As such it is not created or stored in the content management systems hosting the source content. It is created solely for the purpose of structuring and augmenting the content to be utilized within a vertical application in the Big Content environment.

Throughout the entire Big Content lifecycle ensuring metadata quality and integrity is of the highest importance. Quality measures must go beyond simply reconciling field names. It is important that the steps taken to enrich and refine content are applied consistently. If some dates are not normalized, entity extraction is incomplete, or terminology is not reconciled, the accuracy of the data behind the insights comes into doubt. As a result any analysis and its findings become questionable. Metadata represents a significant upfront investment and ongoing requirement when large amounts of content are involved. Never the less, it is a critical factor in effective content management and the key enabler of the Big Content ecosystem.

1 Comment »

Category: Big Content Enterprise Content Managment metadata     Tags: , , ,

1 response so far ↓

  • 1 Righteous Mind @ Social   May 16, 2013 at 2:43 am

    Darin according to me big content need not too much metadata. but its need quality meta data so that it will help to understand.