Gartner Blog Network

Does XML Schema Earn its Keep?

by Wes Rishel  |  December 28, 2011  |  6 Comments

Keith Boone continues his campaign to make V3 comprehensible with an excellent post on ordering in XML schema and an idea that could overcome one of the fundamental flaws in the “extensible markup language” — the requirement for all parties to switch simultaneously to a new schema version in order to extend the schema. I hope HL7 as a group will consider his idea because the ability to do unsynchronized upgrades is critical to the roll-out of any standard at large scale.

BTW, we had that problem knocked in 1987 with HL7 version 2 but we lost ground going to XML Schema in V3.

Keith’s post made me wonder: does HL7 (or any application standards effort) really get enough bang for the buck to justify using XML Schema at all? While XML Schema does nail down some structural issues, it has not proven effective as the sole method of validating HL7 message content. It has to be supplemented by XPath-based rules to begin to validate the content.

Switching to simply using well-formed XML with XPath-based validation would (a) make extensibility easier and (b) open up the possibility of evolving to JSON.
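To make point (a) concrete, here is a minimal sketch of path-based validation over well-formed XML, using Python's standard `xml.etree.ElementTree` (which supports only a small XPath subset). The rule list, the `validate` helper and the `nickname` extension element are all hypothetical, and the message is the illustrative person record, not HL7 syntax. The receiver checks only the paths it cares about, so a sender that has already added a new element does not break it. A schema with a closed content model would reject that same message.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: (path expression, description). Each path must match
# at least one node for the message to pass that rule.
RULES = [
    ("./firstName", "person has a firstName"),
    ("./lastName", "person has a lastName"),
    ("./phoneNumber[@type='home']", "person has a home phone number"),
]

def validate(xml_text):
    """Return the descriptions of the rules that the message fails."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    return [desc for path, desc in RULES if root.find(path) is None]

ORIGINAL = """<person>
  <firstName>John</firstName>
  <lastName>Smith</lastName>
  <phoneNumber type="home">212 555-1234</phoneNumber>
</person>"""

# A sender that upgraded early adds an element this receiver has never seen.
EXTENDED = ORIGINAL.replace("</person>", "  <nickname>Johnny</nickname>\n</person>")

# A message that is well-formed but missing required content.
INCOMPLETE = "<person><firstName>John</firstName></person>"

if __name__ == "__main__":
    for name, text in [("original", ORIGINAL), ("extended", EXTENDED),
                       ("incomplete", INCOMPLETE)]:
        print(name, "->", validate(text) or "ok")
```

The original and the extended message both pass, while the incomplete one fails two rules. Real HL7 rules would need a fuller XPath engine than this subset.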

The XML v. JSON example below shows the difference. (It is not HL7 XML syntax, just illustrative.)

In theory, JSON is about as powerful as well-formed XML. Some differences between JSON and XML are:

  1. No concept of attributes as distinct from elements, which has always been a world-class, time-wasting distinction without a difference in XML.
  2. No use of unneeded and repetitious tags to close elements. Closing tags were of vestigial value when people created SGML by hand, but do nothing but take up space in XML.

Does the concise nature of JSON add value? You can pretty much get into a fight in any HL7 biker bar by raising the issue of concision.

Those who are indifferent to concision say that XML gets you the value of the XML Schema language, which is only as important as you think Schema is valuable. They also argue that the extra characters don’t matter much in these days of cheap disk storage, high network bandwidth and enough processor speed to fuel the extra parsing overhead.

The pro-concision folks argue primarily that it makes a difference to people who ultimately have to look at instances in order to develop and debug code. It also matters when working with subject matter experts, who stubbornly want to look at instance examples to understand what they are being asked. Secondarily, they add that when you look at the pragmatic issues around using V3 for high-volume messaging, the overhead in XML (and V3’s use of XML) is a pragmatic problem in the short term, which is the term in which people actually build and use interfaces.
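For a rough sense of the overhead both camps are arguing about, the sketch below serializes the same illustrative person record as compact JSON and as element-style XML, then counts characters. It assumes Python's standard `json` and `xml.etree.ElementTree` modules. The `to_element` helper is hypothetical, and the phone numbers are left out to keep the converter short. A character count is only one measure: it says nothing about parsing cost or about readability for a subject matter expert.

```python
import json
import xml.etree.ElementTree as ET

# The illustrative record (not HL7 content), without the phone numbers.
record = {
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021",
    },
}

def to_element(tag, value):
    """Build an element-style XML tree from nested dicts and scalars."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(to_element(key, child))
    else:
        elem.text = str(value)
    return elem

# Both serializations are written with no optional white space.
json_text = json.dumps({"person": record}, separators=(",", ":"))
xml_text = ET.tostring(to_element("person", record), encoding="unicode")

if __name__ == "__main__":
    print(len(json_text), "characters of JSON")
    print(len(xml_text), "characters of XML")
```

Each field name appears once in the JSON and twice in the XML, which is where most of the difference comes from.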

I had a religious conversion in the mid-90s and went from indifferent to concision to pro-concision. It came when I was working with SMEs in the insurance industry.

JSON (white space outside of quotes is completely ad lib).

{"person" :
  {"firstName": "John",
   "lastName" : "Smith",
   "age"      : 25,
   "address"  :
     {"streetAddress": "21 2nd Street",
      "city"         : "New York",
      "state"        : "NY",
      "postalCode"   : "10021"},
   "phoneNumber":
     [ {"type"  : "home",
        "number": "212 555-1234"},
       {"type"  : "fax",
        "number": "646 555-4567"} ]
  }
}

Two Styles of XML: the second example makes more use of attributes and is therefore less extensible. White space outside of quotes is not entirely ad lib.
<person>
  <firstName>John</firstName>  <lastName>Smith</lastName>  <age>25</age>
  <address>
    <streetAddress>21 2nd Street</streetAddress>
    <city>New York</city>
    <state>NY</state>  <postalCode>10021</postalCode>
  </address>
  <phoneNumber type="home">212 555-1234</phoneNumber>
  <phoneNumber type="fax">646 555-4567</phoneNumber>
</person>

<person firstName="John" lastName="Smith" age="25">
  <address streetAddress="21 2nd Street" city="New York" state="NY" postalCode="10021" />
  <phoneNumber type="home" number="212 555-1234"/>
  <phoneNumber type="fax"  number="646 555-4567"/>
</person>
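One way to see why the attribute style is less extensible: an attribute can appear only once on an element and can hold only a flat string, while an element can be repeated or given children later. The sketch below checks this with Python's standard `xml.etree.ElementTree`. The repeated `firstName` and the `houseNumber`/`streetName` sub-elements are hypothetical extensions, not part of the examples above.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Hypothetical extension 1: the person gains a second given name.
ELEMENTS = "<person><firstName>John</firstName><firstName>Jack</firstName></person>"
ATTRIBUTES = '<person firstName="John" firstName="Jack"/>'

# Hypothetical extension 2: the street address gains structured sub-parts,
# which is only possible because streetAddress is an element.
NESTED = ("<person><address><streetAddress>"
          "<houseNumber>21</houseNumber><streetName>2nd Street</streetName>"
          "</streetAddress></address></person>")

if __name__ == "__main__":
    print("repeated element well-formed:  ", is_well_formed(ELEMENTS))
    print("repeated attribute well-formed:", is_well_formed(ATTRIBUTES))
    names = [e.text for e in ET.fromstring(ELEMENTS).findall("firstName")]
    print("given names:", names)
    print("house number:", ET.fromstring(NESTED).findtext(
        "address/streetAddress/houseNumber"))
```

The repeated attribute is rejected by the parser before any schema or rule is consulted, so an attribute-style message has to be restructured to carry either extension.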

Updated 12/29 to correct omission of JSON example in original post.

Thoughts on Does XML Schema Earn its Keep?

  1. Grahame Grieve says:

    Hi Wes

    You have raised several different issues here – schema or not, XML or not, and attributes or not. Generally there is a bigger problem with XML around why it is being used. I’ve made my own comment here:

    There is a large subset of HL7 implementers who depend on schema. If we didn’t provide schema, they’d do their best not to use it at all. In fact, many adopters only ever use the Schema.

    I think that what this shows is that schemas should be optimized for code generation, not validation. Note that Schematron is also quite limited in what it can validate, though not as limited as schema.

    JSON is interesting, but it’s so close to XML. It seems to me that its real attraction is that it’s the devil we don’t know yet.

  2. You mention here that you will have an XML versus JSON example, but you instead have an XML-elements versus XML-attributes example?

    Something not quite right here. :-)

  3. Wes Rishel says:

    Well, duh. Thanks for the quick catch. I just corrected it.

  4. Wes Rishel says:

    Thanks, Grahame. I don’t really understand how someone only uses schema, but I suppose it is to create some derivation on the intellectual property in V3 standards.

    I would argue that JSON has a manifest advantage over XML in concision; that it is inherently more extensible for not having attributes; and that the closeness to XML is largely an advantage in that they are both recursive notations that support composite elements.

    It is appropriate to classify JSON as the “devil we don’t know” in that there may be hidden implications of using it. We should be careful to underline the “we” in “the devil we don’t know” because it does seem to be gaining ground in other communities.

  5. Grahame Grieve says:

    Right, JSON holds some real attraction. It’s especially wonderful in JavaScript with jQuery in play too. But JSON schema is a curio. What does that mean? Is only one of my communities (ref link above) using it (I rather think so). Does using JSON disenfranchise the few happy campers we have left? Note that I did a draft JSON ITS for v2 and v3, but no one seemed interested.

    Why is HL7 to blame for what is clearly an industry problem? ;-)

  6. Thomas Beale says:


    XML Schema is a horrible, badly designed formalism, and there is a huge mismatch between its semantics and object models and data. Specifically, inheritance (if you can call it that) is completely broken, which renders it useless for expressing ‘models’. See this article by James Clarke – .

    I just went through the JSON spec and implemented it for a serialiser for the ADL archetype language. It is woefully under-designed and cannot in its current form be used for realistic data. This is because it carries no typing information, along with some other problems. YAML is probably the formalism that you really want. It’s ugly, but seems powerful enough to represent realistic object data.

    See this page for a fun comparison of some of these formats.

    We won’t escape XSD for some years yet, and while we are stuck with it, it should be used not for authoring any kind of model, but generated by tools from proper modelling stacks, and only used for one purpose: validating data. That means it doesn’t matter how ugly or non-normalised the schema is, as long as it a) conforms to the originating model and b) validates the data correctly and maybe c) defines space-efficient data. In openEHR we have our own formalism that represents object data beautifully, but of course it will never catch on ;-)


Comments or opinions expressed on this blog are those of the individual contributors only, and do not necessarily represent the views of Gartner, Inc. or its management. Readers may copy and redistribute blog postings on other blogs, or otherwise for private, non-commercial or journalistic purposes, with attribution to Gartner. This content may not be used for any other purposes in any other formats or media. The content on this blog is provided on an "as-is" basis. Gartner shall not be liable for any damages whatsoever arising out of the content or use of this blog.