Darin Stewart

A member of the Gartner Blog Network

Darin Stewart
Research Director
1 year with Gartner
15 years IT industry

Darin Stewart is a research director for Gartner in the Collaboration and Content Strategies service. He covers a broad range of technologies that together comprise enterprise content management.

Schema.org: Webmaster One-Stop or Linked Data Land Grab?

by Darin Stewart  |  June 4, 2011  |  6 Comments

Yesterday, Google, Microsoft and Yahoo! jointly announced schema.org, a new service intended to “create and support a common vocabulary for structured data markup on web pages.” The idea is to provide a library of vocabularies that can be used in conjunction with the W3C HTML Microdata format to embed machine-readable data into webpages in a manner that can be fully exploited across search engines. This is being pitched as a breakthrough among the big search engines, namely Google, Bing and Yahoo! A shared vocabulary should make life simpler for everyone. Developers now have to deal with only one flavor of markup and should have the foundation for richer search functionality in the future. Search engines know what to expect and how to leverage it. Users get a slicker and more meaningful result set when they search. My initial reaction: don’t drink the Kool-Aid.
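To make the idea concrete, here is a minimal sketch of what “machine-readable data embedded in a webpage” means in practice. The page fragment, the person in it, and the `MicrodataParser` and `extract_item` names are all made up for illustration; this is not how any search engine actually reads microdata, and it only handles a single flat item with text-valued properties.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: HTML Microdata attributes (itemscope, itemtype,
# itemprop) annotating ordinary markup with a schema.org type.
PAGE = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Jane Doe</span>
  <span itemprop="jobTitle">Research Director</span>
</div>
"""


class MicrodataParser(HTMLParser):
    """Collects itemprop name/value pairs from one flat, non-nested item."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._current_prop = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # itemscope is a boolean attribute, so only its presence matters.
        if "itemscope" in attrs:
            self.item_type = attrs.get("itemtype")
        if "itemprop" in attrs:
            self._current_prop = attrs["itemprop"]

    def handle_data(self, data):
        # The first non-blank text after an itemprop tag is taken as its value.
        if self._current_prop and data.strip():
            self.properties[self._current_prop] = data.strip()
            self._current_prop = None


def extract_item(html):
    """Return (item type URL, {property: value}) for the fragment."""
    parser = MicrodataParser()
    parser.feed(html)
    return parser.item_type, parser.properties


item_type, props = extract_item(PAGE)
print(item_type)
print(props)
```

The point of the sketch is that the vocabulary does the real work: the parser is trivial, but it is only useful if everyone agrees on what `name` and `jobTitle` mean.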

My concern is the announcement’s claim that “the site aims to be a one stop resource for webmasters looking to add markup to their pages.” This may simplify some coding, but it also locks you into a system that is not under the direction or control of the web community as a whole. Rather, the vocabularies are driven and controlled by the objectives of a small group of corporations. This has rarely proven to be a good thing for the web. It is also unnecessary.

All of the capabilities promised by schema.org are already fully supported in a richer, more scalable manner in the form of RDFa and the Linked Data approach to the Open Web. As I discussed in an earlier post (When does “Semantic” really mean Semantic?), Linked Data leverages four fundamental principles to provide consistent, machine-readable access to structured (and semi-structured) content on the web. Schema.org appears to be Linked Data Lite, with extremely limited support for vocabularies outside of the service. It may be more comfortable for webmasters, as the microdata approach keeps things squarely in the HTML world (or at least the HTML5 world). However, that familiarity may come at the cost of flexibility and functionality. At first brush, schema.org seems like little more than semantic search engine optimization. I may be proven wrong. It’s happened before.

Google makes an attempt at being “Big Tent” with the RDFa and microformat camps. They indicate that RDFa will still work if your markup is “currently supported by rich snippets.” The implication seems to be that they’ll let you use what’s in place, but if you want to extend it you’re on your own. There is a subtle air of intimidation throughout the schema.org announcements and documentation. While not stated overtly, the implication is that if you adopt the microdata approach you will be well treated by their search algorithms. Those who stick to RDFa and microformats are likely to get lost in the crowd or even pushed to the bottom. Again, I could just be paranoid, but this is Microsoft and Google we’re talking about. Whatever happened to “don’t be evil”?

Then there is the concern about competing formats. Does the schema.org Person definition (“http://schema.org/person”) compete with the RDF Friend of a Friend (“http://xmlns.com/foaf”)? Support of multiple vocabularies is baked into Linked Data and alternate definitions are nicely accommodated. Schema.org appears to be of a more jealous sort, demanding exclusivity with your markup. Again, there appear to be some limited provisions for extending their vocabulary and continuing with your current markup, but it seems to be more for pacification purposes than true integration and interoperability.
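For readers who have not seen vocabularies mixed in practice, the rough sketch below shows the mechanism: RDFa-style prefixed names are expanded into full URIs, so one element can carry terms from two vocabularies at once. The subject URI, the person, and the helper names `expand_curie` and `triples` are hypothetical, and the code models only the prefix expansion, not a real RDFa processor. The prefix URIs used are the schema.org namespace and the published FOAF namespace, `http://xmlns.com/foaf/0.1/`.

```python
# Predicate used in RDF to state the type of a resource.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Prefix declarations, as an RDFa document might make with a prefix attribute.
PREFIXES = {
    "schema": "http://schema.org/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand_curie(curie, prefixes=PREFIXES):
    """Expand a prefixed name such as 'foaf:name' into a full URI."""
    prefix, _, local = curie.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    return prefixes[prefix] + local


def triples(subject, typeof, properties):
    """Build (subject, predicate, object) triples.

    typeof and each property key may list several space-separated prefixed
    names, mirroring how one RDFa attribute can name terms from more than
    one vocabulary.
    """
    out = []
    for curie in typeof.split():
        out.append((subject, RDF_TYPE, expand_curie(curie)))
    for prop_attr, value in properties:
        for curie in prop_attr.split():
            out.append((subject, expand_curie(curie), value))
    return out


# Hypothetical resource described with both vocabularies side by side.
result = triples(
    "http://example.org/people/jane#me",
    "schema:Person foaf:Person",
    [("schema:name foaf:name", "Jane Doe")],
)
for triple in result:
    print(triple)
```

Nothing in this model forces a choice between the two definitions; a consumer that understands only one vocabulary simply ignores the triples from the other.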

This is all a first (admittedly knee-jerk) response to what could potentially be a boon to webmasters and search users. Microsoft, Google and Yahoo! could have only the best interests of the web in mind and all could turn out just swell. Things seem to have worked out okay with sitemaps, after all. Here’s hoping. It will be interesting to see how the semantic web and search communities react to this development. Next week, I will be attending the Semantic Technologies 2011 conference in San Francisco, where I’m sure Schema.org will be one of the hot topics of conversation. Tune in next week for reports from the front lines at SemTech2011.

Category: Semantic Web

6 responses so far ↓

  • 1 Michael Hausenblas   June 4, 2011 at 9:04 am

    FYI: we’ve released a canonical mapping from Schema.org terms to RDF at http://schema.rdfs.org

    Cheers,
    Michael

  • 2 Rob   June 4, 2011 at 11:42 am

    I agree with your criticism, Darin — this looks like linked data lite, with vocabulary control in the hands of the big three.

    But, on the other side of this argument, how might a vocabulary be otherwise controlled and efficiently used by the search engines? I suspect ontology du jour won’t cut it.

    And, is it possible that, with the camel’s nose under the tent in this fashion, something truly inclusive and expansive might be ‘seeded’?

    rjk

  • 3 Kerstin Forsberg   June 4, 2011 at 2:51 pm

    Looking forward to your next blog on Linked Data and Semantic Web. Nice to see blog postings from Gartner on these important topics for the corporate world.

  • 4 Dominique Guardiola   June 6, 2011 at 8:28 am

    Rob, support of various vocabularies is not about using the “ontologie du jour”, it’s about letting each community make its own ontology about the domain they master.
    Not everything is about tunes, friends, soda and “likes”, no, we need an adaptive and open framework for the upcoming web of data.

  • 5 Chris Graham   June 6, 2011 at 1:27 pm

    We’re very happy with it at ocPortal CMS (http://ocportal.com/site/news/view/new-releases/ocportal-71-beta1.htm), having just released an implementation.

    Your points may be valid on an intellectual level, Darin, but perfect is the enemy of good, and the reality is RDF is just far too complex and bloated to be integrated without a serious engineering effort. Therefore it has never reached a critical mass.

    Ideally this would be done through the W3C, and maybe they’ll pick it up at some point, but at least it’s done by a group of competing companies that are incentivised to move the web forward. And these aren’t just any companies, they are serious stakeholders capable of giving this momentum and incentivising webmasters. It is similar to how WHATWG took over HTML development, before it was passed over to the W3C as HTML5 – the browser manufacturers had to inject realism in the process, and then standardisation followed. The same is true for JavaScript (which became ECMAScript).

    Both WHATWG’s actions, and HTML5 Microdata, concerned me when I heard about them at the time, and I posted public comments condemning them – but I’ve been won over on both accounts.

    In summary, simplicity is important in standards that have to be implemented by the masses, and extra use cases like neutrality and decentralisation tend to add so much complexity, and remove so much authority and clarity, that things don’t pick up as they need to.

  • 6 Schema.org: Webmaster One-Stop or Linked Data Land Grab? | whiteblogger.co.cc   June 7, 2011 at 2:04 am

    [...] Schema.org: Webmaster One-Stop or Linked Data Land Grab? This entry was posted in Uncategorized, webmaster and tagged darin, darin-stewart, [...]