Yesterday, Google, Microsoft and Yahoo! jointly announced schema.org, a new service intended to “create and support a common vocabulary for structured data markup on web pages.” The idea is to provide a library of vocabularies that can be used in conjunction with the W3C HTML Microdata format to embed machine-readable data into webpages in a manner that can be fully exploited across search engines. This is being pitched as a breakthrough among the big search engines, namely Google, Bing and Yahoo! A shared vocabulary should make life simpler for everyone. Developers now only have to deal with one flavor of markup and should have the foundation for richer search functionality in the future. Search engines know what to expect and how to leverage it. Users get a slicker and more meaningful result set when they search. My initial reaction: don’t drink the Kool-Aid.
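For readers who have not seen microdata, here is a rough sketch of what "embedding machine-readable data" looks like and how a consumer might read it back. The page fragment, the person, and the values are invented for illustration, and the MicrodataSketch class and extract function are hypothetical helpers built on Python's standard html.parser. It handles only flat, non-nested properties and is nowhere near the full microdata extraction algorithm or anything a search engine actually runs.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; the person and values are invented.
SNIPPET = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Jane Doe</span>
  <span itemprop="jobTitle">Analyst</span>
</div>
"""


class MicrodataSketch(HTMLParser):
    """Collects the itemtype and flat itemprop text values (no nesting)."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.props = {}
        self._current = None  # itemprop whose text we are waiting for

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemtype" in attrs:
            self.item_type = attrs["itemtype"]
        if "itemprop" in attrs:
            self._current = attrs["itemprop"]

    def handle_data(self, data):
        # Record the first non-blank text that follows an itemprop tag.
        if self._current and data.strip():
            self.props[self._current] = data.strip()

    def handle_endtag(self, tag):
        self._current = None


def extract(html):
    """Return (itemtype, {itemprop: text}) for a simple fragment."""
    parser = MicrodataSketch()
    parser.feed(html)
    parser.close()
    return parser.item_type, parser.props


if __name__ == "__main__":
    print(extract(SNIPPET))
```

The point is only that the type URL and the property names come from the shared vocabulary, which is what lets different consumers agree on what the markup means.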
My concern is the announcement’s claim that “the site aims to be a one stop resource for webmasters looking to add markup to their pages.” This may simplify some coding, but it also locks you into a system that is not under the direction or control of the web community as a whole. Rather, the vocabularies are driven and controlled by the interests and objectives of a small group of corporations. This has rarely proven to be a good thing for the web. It is also unnecessary.
All of the capabilities promised by schema.org are already fully supported in a richer, more scalable manner in the form of RDFa and the Linked Data approach to the Open Web. As I discussed in an earlier post (When does “Semantic” really mean Semantic?), Linked Data leverages four fundamental principles to provide consistent, machine-readable access to structured (and semi-structured) content on the web. Schema.org appears to be Linked Data Lite with extremely limited support for vocabularies outside of the service. It may be more comfortable for webmasters, as the microdata approach keeps things squarely in the HTML world (or at least the HTML5 world). However, that familiarity may come at the cost of flexibility and functionality. At first blush, schema.org seems like little more than semantic search engine optimization. I may be proven wrong. It’s happened before.
Google makes an attempt at being “Big Tent” with the RDFa and microformat camps. They indicate that RDFa will still work if your markup is “currently supported by rich snippets.” The implication seems to be that they’ll let you use what’s in place, but if you want to extend it you’re on your own. There is a subtle air of intimidation throughout the schema.org announcements and documentation. While not stated overtly, the implication is that if you adopt the microdata approach you will be well treated by their search algorithms. Those who stick to RDFa and microformats are likely to get lost in the crowd or even pushed to the bottom. Again, I could just be paranoid, but this is Microsoft and Google we’re talking about. Whatever happened to “don’t be evil”?
Then there is the concern about competing formats. Does the schema.org Person definition (“http://schema.org/person”) compete with the RDF Friend of a Friend vocabulary (“http://xmlns.com/foaf”)? Support of multiple vocabularies is baked into Linked Data and alternate definitions are nicely accommodated. Schema.org appears to be of a more jealous sort, demanding exclusivity with your markup. Again, there appear to be some limited provisions for extending their vocabulary and continuing with your current markup, but it seems to be more for pacification purposes than true integration and interoperability.
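To make the multiple-vocabulary point concrete, here is a minimal sketch of how Linked Data lets one resource carry descriptions from both vocabularies at once. It models RDF statements as plain Python tuples rather than using any RDF library. The identifier ME, the example.org address, and the values are invented for illustration, and types_of is a hypothetical helper. The FOAF namespace used is http://xmlns.com/foaf/0.1/, which is the full namespace rather than the shortened address quoted above.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
SCHEMA = "http://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical identifier for the person being described.
ME = "http://example.org/people/jane#me"

# Each statement is a (subject, predicate, object) triple. The same
# subject is typed and described with terms from both vocabularies.
triples = {
    (ME, RDF_TYPE, FOAF + "Person"),
    (ME, RDF_TYPE, SCHEMA + "Person"),
    (ME, FOAF + "name", "Jane Doe"),
    (ME, SCHEMA + "jobTitle", "Analyst"),
}


def types_of(subject, graph):
    """Return every type asserted for a subject, sorted for stable output."""
    return sorted(o for s, p, o in graph if s == subject and p == RDF_TYPE)


if __name__ == "__main__":
    for type_url in types_of(ME, triples):
        print(type_url)
```

Nothing in the data model forces a choice between the two Person definitions: a consumer that understands only FOAF reads the FOAF statements and ignores the rest, and likewise for schema.org.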
This is all a first (admittedly knee-jerk) response to what could potentially be a boon to webmasters and search users. Microsoft, Google and Yahoo! could have only the best interests of the web in mind and all could turn out just swell. Things seem to have worked out okay with sitemaps after all. Here’s hoping. It will be interesting to see how the semantic web and search communities react to this development. Next week, I will be attending the Semantic Technologies 2011 conference in San Francisco where I’m sure Schema.org will be one of the hot topics of conversation. Tune in next week for reports from the front lines at SemTech2011.