Darin Stewart

A member of the Gartner Blog Network

Darin Stewart
Research Director
1 year with Gartner
15 years IT industry

Darin Stewart is a research director for Gartner in the Collaboration and Content Strategies service. He covers a broad range of technologies that together comprise enterprise content management.

When does “Semantic” really mean Semantic?

by Darin Stewart  |  March 11, 2011  |  4 Comments

It is a great irony of the Semantic Web, which is predicated on the notion of explicit and unambiguous meaning, that no one can quite agree on what we mean by “semantic.” The fallback position is simply to point at the technology stack defined by the W3C and declare that anything taking advantage of those tools is “the semantic web,” or at least part of it. While this is valid as far as it goes, it also misses the point.

I prefer to draw a distinction between “semantic technologies” and the “semantic web.” Narrowly defined, semantic technologies are a family of W3C-sanctioned standards and tools that play nicely together to create meaningful relationships between disparate online resources (data, people, anything of use) rather than just documents. They do so in a manner that both machines and people can ingest and interpret without too much confusion. More broadly defined, a semantic technology is anything that makes meaning and relationships explicit. This could be a taxonomy or thesaurus, advanced metadata, automatic classification, entity extraction; the list goes on. Any of these technologies can be used behind the firewall, in isolation from the broader web, and still bring value to the enterprise. This is not, however, the semantic web.
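To make the broad definition concrete, here is a minimal sketch of a semantic technology in that wider sense: a toy thesaurus whose broader-term relationships are explicit enough for software to traverse. All of the terms and the `BROADER` table are hypothetical illustrations, not any real vocabulary.

```python
# Toy thesaurus: explicit broader-term links (hypothetical data).
# A "semantic technology" in the broad sense simply makes relationships
# like these explicit so that software can follow them.
BROADER = {
    "champagne": "sparkling wine",
    "sparkling wine": "wine",
    "wine": "beverage",
}

def broader_chain(term):
    """Walk up the broader-term hierarchy, returning each ancestor in order."""
    chain = []
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain

print(broader_chain("champagne"))  # ['sparkling wine', 'wine', 'beverage']
```

Even this trivial structure lets an application answer questions (e.g. “is champagne a kind of beverage?”) that flat text cannot.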

The semantic web augments and extends the world wide web and so must be a part of that greater web of information and resources. The secret sauce here is the underlying information consumed by semantic technologies. Without access to properly structured and documented (read: lots and lots of metadata) public information, the smartest applications we can build will be little more than idiot savants, very good in their own domain but unable to function in the world at large. It is these smart applications, well fed with a diverse diet of palatable information, that constitute the true semantic web. The particular technologies employed are an implementation issue rather than a fundamental property. They are a means to an end rather than the end in itself.

So how do we get there? Fortunately, we are well on our way by means of three concurrent and complementary movements: open data, linked data, and the semantic web proper.

The Open Data Movement posits that certain (if not most) data and information should be freely available.  Much of this is an outgrowth of requirements for publicly funded research.  If the people paid for it, they should have access to it.  As a result, many researchers must publish their data sets in public repositories as a condition of receiving federal dollars.  This practice is starting to move beyond the academy as private enterprises realize that by sharing data, they can benefit from the creativity and insight of people not on their payroll.  In essence, they are saying “Here’s a bunch of data.  Let’s see you do something cool with it.”  The problem is that there is little agreement on how the data should be shared.  Standards may be followed within a particular community of practice, but true innovation happens when someone from outside the domain brings their expertise to bear.  The lack of standardization often presents too high a barrier for this to happen.  This is where linked data comes in.

Linked data takes open data a step (actually four steps) further by articulating four fundamental principles for publishing data. In short: (1 & 2) name things with HTTP URIs, which provides a well-understood mechanism for uniquely identifying resources in a way that lets them be easily located; (3) when someone does look up a resource, provide useful information in a standardized way, in other words, use RDF as a common data model and representation; and (4) link your resources to other resources so your users, be they human or otherwise, can find related things. As of November of last year, the State of the LOD Cloud report documented nearly 27 billion triples and nearly 400 million RDF links that meet these criteria. Compared to the size of the general web this may seem tiny, but considering that it has only emerged over the past couple of years, its rate of growth is impressive. If this growth continues, and there is every expectation that it will, indeed that it is likely to accelerate, the substrate of the semantic web is well on its way.
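All four principles fit in a few lines of Turtle, RDF’s text syntax. The URIs below are illustrative examples, not real endpoints; only the FOAF and OWL vocabulary terms are genuine:

```turtle
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .

# (1 & 2) The thing is named with a dereferenceable HTTP URI.
<http://example.org/people/alice>
    a foaf:Person ;          # (3) useful, standardized RDF when looked up
    foaf:name "Alice" ;
    # (4) links to other resources, here an equivalent identity elsewhere
    owl:sameAs <http://other.example.net/id/alice-smith> ;
    foaf:knows <http://example.org/people/bob> .
```

Anyone (or any program) that dereferences the first URI gets back statements it can merge with data published elsewhere, which is the whole point of the exercise.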

Which brings us full circle to the semantic web proper. The most extensive, highly linked, well-structured data set is useless if there is nothing to consume it. It is the community of smart applications that utilize the web of linked data that truly comprises the semantic web. By adopting a common data model (RDF) and adhering to the standards, it becomes possible to create applications that can utilize resources (not just documents) across the entire web of data and interact with each other in a consistent and intelligent manner. Further, because of the inference capabilities fostered by these standards and the adoption of well-crafted ontologies, it becomes possible for these applications to act on information that does not explicitly exist anywhere on the web. Just as it’s the people, not the plumbing, that make a community, it’s the applications, not the data, that constitute the semantic web.
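A small sketch of what “acting on information that does not explicitly exist” means in practice: a naive forward-chaining pass that applies two RDFS-style rules to hand-written triples until no new facts appear. The class names and the `veuve1998` individual are hypothetical, and a real reasoner would be far more sophisticated; this only illustrates the principle.

```python
# Tiny forward-chaining inference over (subject, predicate, object) triples.
# Applies two RDFS-style rules until a fixed point is reached:
#   (A subClassOf B) and (B subClassOf C)  =>  (A subClassOf C)
#   (x type A)       and (A subClassOf B)  =>  (x type B)
triples = {
    ("Champagne", "subClassOf", "SparklingWine"),
    ("SparklingWine", "subClassOf", "Wine"),
    ("veuve1998", "type", "Champagne"),
}

def infer(facts):
    facts = set(facts)
    while True:
        new = set()
        sub = {(s, o) for s, p, o in facts if p == "subClassOf"}
        # Rule 1: transitivity of subClassOf.
        for a, b in sub:
            for c, d in sub:
                if b == c:
                    new.add((a, "subClassOf", d))
        # Rule 2: type propagation up the class hierarchy.
        for s, p, o in facts:
            if p == "type":
                for a, b in sub:
                    if o == a:
                        new.add((s, "type", b))
        if new <= facts:        # nothing new derived: fixed point
            return facts
        facts |= new

closed = infer(triples)
# Nowhere did we state that veuve1998 is a Wine, yet:
print(("veuve1998", "type", "Wine") in closed)  # True
```

The derived fact was implied by the ontology but written down nowhere, which is exactly the kind of leverage well-crafted ontologies give smart applications.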

We are in the early days of each of these three initiatives, but as I said, they are growing and the pace of growth is accelerating. We may not get to Sir Tim Berners-Lee’s original vision of semi-sentient agents roaming the web freeing us from the mundane chores of daily life in the information age for some time, but we are seeing the practical benefits of linked open data today. 

Category: Semantic Web

4 responses so far ↓

  • 1 Vidar Langberget   March 13, 2011 at 8:57 pm

    Good post!

    You make a distinction between semantic technologies and the semantic web, but it leaves you with a problem in defining what semantic technologies are: are they the semantic standards from the W3C, or the broader definition involving “anything that makes meaning and relationships explicit”?

    We prefer to divide it up into three separate terms:
    Semantic Web – web of data that is understandable by machines.
    Semantic Web technologies – the standards underpinning the semantic web
    Semantic technologies – “anything that makes meaning and relationships explicit”

    We also see a huge growth in interest about the semantic web and semantic technologies.

  • 2 Darin Stewart   March 13, 2011 at 9:50 pm

    I think of it as capital “S” Semantics – the formal W3C language / technology stack and lower case “s” semantics – the broader category of meaning-based technologies. They are definitely overlapping, one supporting the other.

  • 3 Bart Gajderowicz   March 15, 2011 at 4:47 am

    Great post, Darin.

    Much of the confusion about the Semantic Web stems from the very fact that so many technologies are placed in that stack. Each technology is meant to fulfill a specific need, and only some of the technologies are meant to work together.

    Also, I’m glad you brought up the Open Data movement because it is too often overlooked when discussing semantics. I personally think semantics play a larger role in Open Data than simply linking different definitions and resources via linked data. Semantics can be used to help analyze open datasets housed in repositories like buzzdata.com. Enhancing datasets with semantic annotations would help make complex structures more accessible to people who may have a unique understanding of a domain and can apply different data-mining techniques.

    By adding the flexibility of triples, the structures of flat or relational datasets could be adopted more easily by a variety of novice or expert analysts. A set of slides by Toby Segaran on using triples in this way can be found here. http://kiwitobes.com/presentations/SegaranCSHALS2011.pdf

    Thanks

  • 4 Jakob   March 16, 2011 at 7:58 am

    So structured data is the new semantic and usage is the new meaning. I’d add that automatic is the new intelligent. I don’t see the complex technology stack as the main reason for the confusion around “semantic,” but rather the AI folks who spoiled the terminology instead of building upon established terms from the database community. It’s all about data and its use, just as it was 50 years ago – the rest is mainly bullshit to sell old ideas as new.