In PCAST Opportunity: Documents vs. “Atomic Data Elements” I speculated that the following language from the PCAST report suggested a need for more fine-grained (atomic or molecular) definitions of standard clinical data.
“We think that a universal exchange language must facilitate the exchange of metadata tagged elements at a more atomic and disaggregated level [than CDA documents] ….”
Many others interpreted it as a general rejection of the notion of documents as an important point in the clinical workflow, where a specific physician’s view of the case provided a collection of relevant information and the context to interpret them.
A Revised View
In the public hearings of the PCAST Work Group on 15-16 February we learned that the PCAST perceives the UEL in a manner that is quite different than I had imagined. Its view is more far-reaching. It embraces documents but does not conclude that they are the only way in which clinical information will be communicated or used. With this new view, I like what I see.
In order to explain why, I need to make some observations about our progress in healthcare IT standards and the advance of technology.
In my 30 years of working on health IT standards I usually had the assumption that if we once got clinical standards right the industry would coalesce around them and semantic interoperability would arise. Standards work would settle down to a maintenance activity.
In 30 years we have gotten a lot of things headed in the right direction. The HL7 RIM is a step forward in understanding underlying similarities in data that can simplify designs. The RIM demonstrated its flexibility by being able to absorb genomics without a redesign. However its abstraction makes it difficult to use in many practical situations.
SNOMED-CT has gone from an important idea that was inaccessible for fears of economic exploitation to a widely embraced standard. IHTSDO is now reportedly working on improvements that are needed, such as the ability to compare two IHTSDO expressions and decide whether they mean the same thing. Although these improvements may not invalidate existing codes, they may lead to significant changes in the working set of codes that are used to encode information.
The industry has come to recognize that code sets and data structures are not independent concepts. One can’t create an arbitrary XML syntax and “plug in” SNOMED. One must craft the syntax concept by concept to the inherent structure in SNOMED. HL7 and a half-dozen other groups are working on ways of matching the structure of clinical data to SNOMED and other code sets. Each group other than HL7 sees that a primary challenge is having a short-hand expression that enables efficient interaction with clinical subject matter experts on the combinations of codes and values that make sense clinically.
But I have to admit we are not nearly done yet. In fact I doubt that we will ever be “done,” although the progress I describe is closing in on providing a basis for willing systems to interoperate when it is in their interests to do so.
Those same 30 years have seen changes in technology occurring much more rapidly than health IT standards progress. Some technologies are common now that were esoteric when we started HL7 or weren’t imagined. These include Ethernet, the Internet, the Web, widespread use of graphics, search engines, imaging as a commodity, a free Web service that identifies photos of your friends and family by facial recognition, and smart phone services that let me search the web by speaking into the phone or taking a picture.
One EHR vendor offers its clients a “semantic search” capability that enables sophisticated search over their structured and text data. Because this service is offered using cloud technology, it could search across its multiple clients’ data if appropriate policies and governance were in place.
With recent advances in image recognition, it is easy to believe that future searches on coronary ischemia might include a patient’s record because the record contains an image of an EKG with inverted t-waves. (I am not offering that this is a medically appropriate example, just that having a search engine make inferences about images that seem to contain EKGs with abnormalities seems quite possible.)
The last two examples, semantic search including natural language processing and semantic search based on EKG images, exemplify a principle that turns the view of most standards work upside down: the semantic specificity with which information is offered is not the upper bound on how well the information may be interpreted. Users of the information (such as search engines) may actually add value by advanced interpretation of the information.
It is always better to offer information with as much semantic precision as possible, but it would be a mistake to prohibit offering information because its semantic precision — or its method of expressing its semantics — was not consistent with the standards used by the receivers.
The Realities of Standardization
I now perceive these ideas as realities:
- Standardization is a never-ending process.
- New standards will come into being and old ones will evolve. We already use three families of HL7, along with DICOM, NCPDP and X12 in US healthcare. Each of these standards has multiple releases.
- New versions of each standard will come into being as long as the standard has not reached perfection (i.e., forever).
- Each new standard will be interpreted differently, creating “subspecies” for specific countries, specialties or other iconoclastic groupings.
- For many years plain text (and new multimedia formats not specific to healthcare) will continue to convey important healthcare data.
- The ability of source systems to produce standard output will evolve over time. The source systems will revise their information models and functions to produce standard data. We can hope that certification and more experience interoperating on a day-to-day basis will diminish the number of idiosyncratic interpretations of standards that systems produce, but it would be unrealistic to expect to drive all error out of the ultra-large system that is networked health IT.
- All versions of all subspecies of all standards (and all idiosyncratic interpretations) will live forever in data and documents that have been encoded at some point in time and are available to a data element access service (DEAS).
- The ability of systems that index such data and those that use the data will also evolve over time. Often the user systems may be technological generations ahead of the systems that first produced the data.
- In terms of indexing and reuse, no data is bad data, although data that is structured and coded is clearly more useful than the same information as a blob of text or an audio recording of spoken words.
- These days, much of the data that remains important after five or 10 years is still being created in text (e.g. op-notes, discharge summaries, pathologist’s reports). It would be a shame to say all data created in anything less than a perfect structured format is lost to indexing and reuse.
Reinterpreting the PCAST Report
It appears that, recognizing these realities, the PCAST is seeking a UEL that meets the data where it is by providing a minimal container that makes the data available for indexing and reuse across generations of standards, technologies and the functional capabilities of source systems. It creates no barriers for smart indexing programs to index HL7 CDA documents or, for that matter, plain text or PDFs of images of EKG strips. It also makes no presupposition that only certain ways of structuring data will be useful.
Metadata: Like Email and MIME Headers
Think of an instance expressed in the UEL as being like an email message. The e-mail wrapper describes messages with metadata that contains the important information to convey the message electronically but says little else. Where MIME is used inside the e-mail it supports a wide variety of data formats but gives no support in decoding them other than identifying the format used in a section.
These formats for metadata in email and MIME headers are standard and correspond to the UEL.
The metadata that a UEL instance would carry to support the DEAS includes:
- Patient identity (where applicable; not in the form of a national ID within the US)
- Patient consent declarations (supported by a specific cryptographic approach)
- Provenance (where available), to answer the question “if this data was copied, computed or abstracted from other data, where is the source data?”
- Perhaps, like MIME, a UEL instance would carry some broad hint about the data format. More likely it would simply use MIME for this purpose.
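To make the email analogy concrete, here is a minimal sketch in Python of a MIME-style envelope that carries the metadata items listed above alongside an opaque payload. It is only an illustration of the idea, not anything the PCAST report specifies: the header names `X-UEL-Patient-Id`, `X-UEL-Consent` and `X-UEL-Provenance`, the function names and the sample values are all hypothetical, and the consent value is a placeholder rather than the cryptographic approach the report has in mind.

```python
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default


def wrap_in_envelope(payload: bytes, maintype: str, subtype: str,
                     patient_id: str, consent: str, provenance: str) -> bytes:
    """Wrap an arbitrary payload in a MIME message with illustrative metadata.

    The X-UEL-* header names are assumptions made up for this sketch.
    """
    msg = EmailMessage()
    msg["X-UEL-Patient-Id"] = patient_id   # locally scoped id, not a national ID
    msg["X-UEL-Consent"] = consent         # placeholder for a consent declaration
    msg["X-UEL-Provenance"] = provenance   # pointer back to the source data
    # The Content-Type header is the "broad hint" about the payload format.
    msg.set_content(payload, maintype=maintype, subtype=subtype)
    return msg.as_bytes()


def read_envelope(raw: bytes) -> dict:
    """Read the metadata and payload back without interpreting the payload."""
    msg = message_from_bytes(raw, policy=default)
    return {
        "patient_id": str(msg["X-UEL-Patient-Id"]),
        "consent": str(msg["X-UEL-Consent"]),
        "provenance": str(msg["X-UEL-Provenance"]),
        "format": msg.get_content_type(),
        "payload": msg.get_content(),
    }


if __name__ == "__main__":
    raw = wrap_in_envelope(b"<ClinicalDocument/>", "application", "xml",
                           "local-123", "consent-token", "urn:example:source:42")
    info = read_envelope(raw)
    print(info["format"], len(info["payload"]))
```

The envelope code never looks inside the payload. The same two functions would carry a CDA document, an HL7 v2 message or a PDF of an EKG strip, which is the sense in which the wrapper says little beyond what is needed to find and route the data.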
Atom or Bucket?
If I am right, this interpretation is hard to reconcile with the use of the word “atomic” in the cited language. A UEL element seems more like a bucket, a container that can hold anything: documents, HL7 v2 messages, structured data retrieved from a database, text snippets, images. The list goes on and will expand with technology.
Documents are Important, Too
If my interpretation is correct, the UEL is about enabling access across the widest variety of formats. Those who find UEL-packaged documents get more context information than those who get other kinds of data. For example, it may be possible for the computer or a physician to recognize that flattened t-waves do not actually indicate ischemia because of the patient’s medications.
But let’s face it, pulling data out of context for interpretation is a standard feature of clinical information processing. EHRs often extract individual items of data from incoming reports (such as problems or meds) and deal with them specially. For example they may produce flow-sheet-like tables on selected data or graph trends in numerical observations. In a well-crafted UI the user should be no more than one click away from seeing the document so they can make judgments about context. In UEL terms the provenance link enables such a quick reference.
In analogous situations using data packaged in UEL buckets, the provenance affords a capability similar to clicking on trend information in an EHR: tracing back to the information source to support cognition around context.
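As a rough illustration of that “one click back to the source” idea, the sketch below follows provenance links from an extracted data item back to the document it came from. Everything in it is an assumption made for the example: the `Bucket` class, its field names, the in-memory `store` dictionary and the sample identifiers are hypothetical and are not drawn from the PCAST report.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Bucket:
    """A hypothetical UEL-style container: metadata plus an opaque payload."""
    bucket_id: str
    media_type: str
    payload: str
    provenance: Optional[str] = None  # id of the bucket this one was derived from


def trace_to_source(store: Dict[str, Bucket], bucket_id: str) -> List[Bucket]:
    """Follow provenance links from a bucket back toward its original source.

    Returns the chain starting with the requested bucket and ending with the
    last bucket that can be reached. Stops on a missing link or a cycle.
    """
    chain: List[Bucket] = []
    seen = set()
    current: Optional[str] = bucket_id
    while current is not None and current not in seen:
        seen.add(current)
        bucket = store.get(current)
        if bucket is None:
            break  # the source is not available to this index
        chain.append(bucket)
        current = bucket.provenance
    return chain


if __name__ == "__main__":
    store = {
        "doc-1": Bucket("doc-1", "text/plain", "Discharge summary ..."),
        "obs-7": Bucket("obs-7", "text/plain", "EKG: flattened t-waves",
                        provenance="doc-1"),
    }
    for bucket in trace_to_source(store, "obs-7"):
        print(bucket.bucket_id, bucket.media_type)
```

A user interface could show the extracted item first and offer the last element of the chain, the source document, as the context for interpreting it.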
This post is not meant to endorse the PCAST Report in its entirety. There are many other parts of it that are difficult to understand and I have substantial questions about how the privacy approach should be implemented.
At the same time I did write the post to praise the concept of the UEL (as I now think I understand it). The UEL takes into account the continuously evolving heterogeneity of systems, technology and semantics and constitutes a thin data layer to assist in basic operations in the proud tradition of MIME and e-mail headers. I honestly don’t think many people will get this from reading the report; I have had the benefit of a couple of presentations from the authors.
It seems that the statement that the CDA is not a suitable candidate for the UEL is now obvious and is not particularly a criticism of the CDA. The CDA serves an entirely different purpose: increasing the semantic specificity of documents.
In fact nothing in the notion of the UEL argues against working full-tilt to nail down semantics and increase the number of source systems that are consistently applying common semantic coding. It does very much argue against making that effort a prerequisite for interoperable use of healthcare information.
Note: this was updated approximately 2330 GMT-5 on 19 Feb 2011. The prior version was a draft not intended for publication.