It is a great irony of the Semantic Web, which is predicated on the notion of explicit and unambiguous meaning, that no one can quite agree on what we mean by “semantic.” The fallback position is to simply point at the technology stack defined by W3C and say that anything taking advantage of those tools is “the semantic web” or at least a part of it. While this may be valid, as far as it goes, it also misses the point.
I prefer to draw a distinction between “semantic technologies” and the “semantic web”. Narrowly defined, semantic technologies are a family of W3C-sanctioned standards and tools that play nicely together to create meaningful relationships between disparate online resources (data, people, anything of use) rather than just documents. They do so in a manner that both machines and people can ingest and interpret without too much confusion. More broadly defined, a semantic technology is anything that makes meaning and relationships explicit. This could be a taxonomy or thesaurus, advanced metadata, automatic classification, entity extraction; the list goes on. Any of these technologies can be used behind the firewall in isolation from the broader web and still bring value to the enterprise. This is not, however, the semantic web.
The semantic web augments and extends the world wide web and so must be a part of that greater web of information and resources. The secret sauce here is the underlying information consumed by semantic technologies. Without access to properly structured and documented (read: lots and lots of metadata) public information, the smartest applications we can build will be little more than idiot savants, very good in their own domain but unable to function in the world at large. It is these smart applications, well-fed with a diverse diet of palatable information, that constitute the true semantic web. The particular technologies employed are more an implementation issue than a fundamental property. They are a means to an end rather than the end in itself.
So how do we get there? Fortunately, we are well on our way by means of three concurrent and complementary movements: open data, linked data, and the semantic web proper.
The Open Data Movement posits that certain (if not most) data and information should be freely available. Much of this is an outgrowth of requirements for publicly funded research. If the people paid for it, they should have access to it. As a result, many researchers must publish their data sets in public repositories as a condition of receiving federal dollars. This practice is starting to move beyond the academy as private enterprises realize that by sharing data, they can benefit from the creativity and insight of people not on their payroll. In essence, they are saying “Here’s a bunch of data. Let’s see you do something cool with it.” The problem is that there is little agreement on how the data should be shared. Standards may be followed within a particular community of practice, but true innovation happens when someone from outside the domain brings their expertise to bear. The lack of standardization often presents too high a barrier for this to happen. This is where linked data comes in.
Linked data takes open data a step (actually four steps) further by articulating four fundamental principles for publishing data. In short, (1 & 2) name things with HTTP URIs. This provides a well-understood mechanism for uniquely identifying resources in a manner that can be easily located. (3) When someone does look up that resource, provide useful information in a standardized way. In other words, use RDF to provide a common data model and representation. Finally, (4) link your resources to other resources so your users, be they human or otherwise, can find related things. As of November of last year, the State of the LOD Cloud report documented nearly 27 billion triples and nearly 400 million RDF links that meet these criteria. When compared to the size of the general web, this may seem tiny, but considering this has only emerged over the past couple of years, its rate of growth is impressive. If this growth continues, and there is every expectation that it will (indeed, that it is likely to accelerate), the substrate of the semantic web is well on its way.
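To make the four principles concrete, here is a minimal sketch in plain Python, using tuples as stand-in RDF triples. The `example.org` URIs are hypothetical, and the lookup function is only a toy stand-in for HTTP dereferencing; only the FOAF property URIs are real vocabulary terms.

```python
# (1 & 2) Things are named with HTTP URIs. The example.org URIs below
# are hypothetical; foaf:knows and foaf:name are real FOAF terms.
ALICE = "http://example.org/person/alice"
BOB = "http://example.org/person/bob"
KNOWS = "http://xmlns.com/foaf/0.1/knows"
NAME = "http://xmlns.com/foaf/0.1/name"

# A tiny triple store: each statement is (subject, predicate, object).
triples = [
    (ALICE, NAME, "Alice"),
    (ALICE, KNOWS, BOB),  # (4) a link out to another resource
    (BOB, NAME, "Bob"),
]

def describe(uri):
    """(3) 'Looking up' a URI yields the useful statements about it.
    On the real web of data this would be an HTTP GET returning RDF."""
    return [t for t in triples if t[0] == uri]

for subject, predicate, obj in describe(ALICE):
    print(predicate, "->", obj)
```

Dereferencing Alice's URI yields both her name and her link to Bob, whose own URI can then be looked up in turn; that chain of lookups is what puts the "linked" in linked data.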
Which brings us full circle to the semantic web proper. The most extensive, highly linked, well-structured data set is useless if there is nothing to consume it. It is the community of smart applications that utilize the web of linked data that truly comprises the semantic web. By adopting a common data model (RDF) and adhering to the standards, it becomes possible to create applications that can utilize resources (not just documents) across the entire web of data and interact with each other in a consistent and intelligent manner. Further, because of the inference capabilities fostered by these standards and the adoption of well-crafted ontologies, it becomes possible for these applications to act on information that does not explicitly exist anywhere on the web. Just as it’s the people, not the plumbing, that make a community, it’s the applications, not the data, that constitute the semantic web.
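The inference point deserves a small illustration. Below is a toy sketch, not a real reasoner: it assumes a hypothetical transitive `locatedIn` property and computes its closure by a simple fixed-point loop, deriving a statement that appears nowhere in the original data.

```python
# A toy illustration of deriving statements that are nowhere explicit.
# The locatedIn property and all place URIs are hypothetical examples.
LOCATED_IN = "http://example.org/vocab/locatedIn"
LOUVRE = "http://example.org/place/louvre"
PARIS = "http://example.org/place/paris"
FRANCE = "http://example.org/place/france"

triples = {
    (LOUVRE, LOCATED_IN, PARIS),
    (PARIS, LOCATED_IN, FRANCE),
}

def infer_transitive(triples, prop):
    """Repeatedly add (a, prop, c) whenever (a, prop, b) and
    (b, prop, c) both hold, until no new statements appear."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for (a, p1, b) in list(inferred):
            if p1 != prop:
                continue
            for (b2, p2, c) in list(inferred):
                if p2 == prop and b2 == b and (a, prop, c) not in inferred:
                    inferred.add((a, prop, c))
                    changed = True
    return inferred

closure = infer_transitive(triples, LOCATED_IN)
# The Louvre is now known to be in France, though no one ever said so.
print((LOUVRE, LOCATED_IN, FRANCE) in closure)
```

A production reasoner working over OWL or RDFS ontologies does far more than this, of course, but the principle is the same: explicit semantics plus standard rules yield knowledge that was never written down.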
We are in the early days of each of these three initiatives, but as I said, they are growing and the pace of growth is accelerating. We may not get to Sir Tim Berners-Lee’s original vision of semi-sentient agents roaming the web freeing us from the mundane chores of daily life in the information age for some time, but we are seeing the practical benefits of linked open data today.