In a previous post I raised, among other things, the issue of the authenticity and quality of open government data. Yesterday, this came up in an interview published on O’Reilly Radar with Raymond Mosley, Director of the Office of the Federal Register (OFR), and Michael L. Wash, CIO of the Government Printing Office (GPO).
The actual news was the announcement of a freely available Federal Register in XML format. For non-US readers, the Federal Register is “a description of the Executive branch’s doings, including 150 daily policy decisions of President and Federal agencies, such as proposed and enacted changes to federal regulations”. This is clearly an important step in the open data journey that the Obama administration is pursuing.
While most of the interview focuses on the significance of this and how the conversion from SGML to XML was achieved, there is one particular passage that caught my attention:
Question: There’s a lot of concern about authenticity, particularly from groups like law librarians. Mike, can you talk about digital signatures and other things you have in place to make sure you’re looking at the real deal when you see an official journal? What happens when copies of this stuff get made …. is there anyway to see that you’re not looking at a Bogus Register?
Mike Wash: The XML is not digitally signed. The Office of the Federal Register is working with Data.gov to enhance the language on Data.gov to clearly indicate that the XML is not signed. New language is being added to the Federal Register pages on Data.gov that will read as follows:
“The current XML data set is not yet an official format of the Federal Register. Only the PDF and Text versions have legal status as parts of the official online format of the Federal Register. The XML-structured files are derived from SGML-tagged data and printing codes, which may produce anomalies in display. In addition, the XML data does not yet include image files. Users who require a higher level of assurance may wish to consult the official version of the Federal Register on FDsys.gov. The FDsys data set includes digitally signed Federal Register PDF files, which may be relied upon as evidence in a court of law. [See: http://www.fdsys.gov/fdsys/browse/collection.action?collectionCode=FR ]”
Our XML user guide explains that we may digitally sign XML files in the future, but for now we are still concentrating on enhancing the display and content of XML files. We require complete assurance that the XML product is a true rendition of the FR official legal record before proceeding with digital signatures. As the official publisher, data integrity is paramount. For us, the equation is: digital signature = authentic official edition.
This is an interesting approach. In order for the data to be really open, it is made available without a signature but with a disclaimer. Those who use the data directly know that it may not be entirely accurate, but what about those at the next step of the “mash-up chain”? Will they care, or even be able to detect, whether all Federal Register data used in a particular mash-up is digitally signed? As government data gets aggregated and mixed with external data, will people check all data sources, especially in a world where we have come to trust the advice of people we don’t even know (through product and service ratings or reviews)?
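To make the “mash-up chain” question concrete, here is a minimal sketch in Python of what detecting unsigned inputs could look like. Everything in it is hypothetical: the `Source` and `MashUp` classes, the `signed` flag and the example names are illustrations, not part of any Data.gov or FDsys interface. It also assumes that every masher records honestly where its inputs came from, which is precisely what cannot be taken for granted.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Source:
    """A raw data set. Hypothetical model, not an official API."""
    name: str
    signed: bool  # True only if the publisher digitally signed this copy


@dataclass
class MashUp:
    """A derived product built from sources and/or other mash-ups."""
    name: str
    inputs: list = field(default_factory=list)

    def unsigned_sources(self):
        """Return the names of all unsigned sources anywhere in the chain."""
        found = []
        for item in self.inputs:
            if isinstance(item, MashUp):
                # Walk down into the mash-up this one was built on.
                found.extend(item.unsigned_sources())
            elif not item.signed:
                found.append(item.name)
        return found


# Illustrative inputs (assumed names and flags).
fr_pdf = Source("Federal Register PDF (FDsys)", signed=True)
fr_xml = Source("Federal Register XML (Data.gov)", signed=False)
ratings = Source("third-party ratings feed", signed=False)

first = MashUp("regulation tracker", [fr_xml, fr_pdf])
second = MashUp("consumer dashboard", [first, ratings])

print(second.unsigned_sources())
```

The check only works if provenance survives every step of the chain. Once a single masher drops or misstates it, a consumer further down has nothing left to inspect.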
Although automated tools will help detect some cases, social monitoring will probably remain the most powerful tool to effectively manage mashers’ reputation and defend consumers from inaccurate or fraudulent representation and integration of information. This is the same mechanism that is used today to flag inappropriate content on social media (e.g. by clicking on the “Report abuse” button).
As open government data will be used and mashed up on a multitude of web sites and social media, government organizations will need to establish or coordinate (and actively socialize) abuse reporting resources that will expose data about mashers’ reputation as well as trigger investigations in case of criminal behavior.