On May 9, after a longer-than-expected preparation, the Open Data Policy announced as part of the US Digital Government Strategy was issued, together with an executive order signed by President Obama on Making Open and Machine Readable the New Default for Government Information.
As one reads the order, browses through the first few pages of the policy, or watches the short video that CIO Steve VanRoekel and CTO Todd Park released to explain the policy, the first impression is that this is just a reinforcement of prior open government policies. The order is quite explicit in saying that (emphasis is mine):
The default state of new and modernized Government information resources shall be open and machine readable. Government information shall be managed as an asset throughout its life cycle to promote interoperability and openness, and, wherever possible and legally permissible, to ensure that data are released to the public in ways that make the data easy to find, accessible, and usable. In making this the new default state, executive departments and agencies (agencies) shall ensure that they safeguard individual privacy, confidentiality, and national security.
Looking at the definition of open data in the policy itself, the first attribute of open data is being public, followed by accessible, fully described, reusable, complete, timely, and managed post-release. Therefore one might think that this policy is mostly about encouraging agencies to pursue what was started four years ago with the Open Government Directive and to build on the success of the many initiatives that Todd Park has relentlessly pushed since he became US CTO.
Even if this were the only focus of the policy, it would be a great accomplishment. The policy provides clarity on issues like the so-called “mosaic effect” (i.e. the risk that combining individual datasets may lead to identifying individuals), the need to prioritize data releases by engaging customers, the need to enforce privacy and confidentiality, and more. The policy also announces the establishment of a new resource called Project Open Data, an online repository of tools, best practices, and schema to help agencies.
But there is more, and this is where the policy gets really interesting. As the Scope section says,
The requirements in part III, sections 1 and 2 of this Memorandum apply to all new information collection, creation, and system development efforts as well as major modernization projects that update or re-design existing information systems
Section 1 is about collecting or creating information in a way that supports downstream processing and dissemination activities, while section 2 is about building information systems that support interoperability and information accessibility. In the former, the policy asks agencies to “use machine readable and open formats for information as it is collected or created”. The latter states that “the system design must be scalable, flexible, and facilitate extraction of data in multiple formats and for a range of uses as internal and external needs change, including potential uses not accounted for in the original design”. Also in section 1, one can read: “Agencies must apply open licenses, in consultation with the best practices found in Project Open Data, to information as it is collected or created so that if data are made public there are no restrictions on copying, publishing, distributing, transmitting, adapting, or otherwise using the information for non-commercial or for commercial purposes”.
The scope section also says that
The requirements in part III, section 3 apply to management of all datasets used in an agency’s information systems
Section 3 is about strengthening data management and release practices. It says that “agency data assets are managed and maintained throughout their life cycle” and that “agencies must adopt effective data asset portfolio management approaches”. Agencies must develop an enterprise data inventory that accounts for the datasets used in the agency’s information systems: “The inventory will indicate, as appropriate, if the agency has determined that the individual datasets may be made publicly available”.
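To make the inventory idea concrete, here is a minimal sketch of what a single inventory entry might look like, loosely modeled on the metadata schema published through Project Open Data (the `accessLevel` field, with values such as “public” or “non-public”, follows that schema; the dataset itself and the other field values are hypothetical):

```python
import json

# Hypothetical entry in an agency's enterprise data inventory, loosely
# following the Project Open Data metadata schema. The policy asks agencies
# to account for *all* datasets in their information systems and to indicate
# whether each may be made publicly available -- here via "accessLevel".
inventory_entry = {
    "title": "Example Permit Applications",     # hypothetical dataset
    "description": "Permit applications received by the agency.",
    "accessLevel": "public",                    # or "restricted public" / "non-public"
    "format": "text/csv",                       # machine-readable, open format
    "license": "http://creativecommons.org/publicdomain/zero/1.0/",
}

print(json.dumps(inventory_entry, indent=2))
```

The point of the sketch is simply that the same record structure describes public and non-public datasets alike; only the value of the access flag differs.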
Now, let’s forget the first attribute of open data for a moment and look at how this applies to any data, even non-public data. Most of what is said above still holds: the enterprise data inventory covers all data, machine-readable and open formats apply to all data, interoperability and information accessibility apply to all data. Some data, maybe most for some agencies, will be public, but other data will not, and yet the same fundamental principles, which treat data as the most fundamental asset, still apply.
A while ago I wrote about the concept of basic data that the Danish government had come up with, and more recently I have written a research note about the importance of data-centricity in government transformation (subscription required). This policy seems to go in the same direction.
While its packaging and external focus are mostly about open public data, and in this respect it further develops policies we have seen over the past few years, its most disruptive implication is that the concept of “open by default” applies to any data.
It would have been beneficial to make a clear distinction between “open data” and “open public data”, but I understand that the constituencies pushing for transparency and openness would not welcome the distinction, assuming it would give the government the ability to decide at leisure what data to share and what to hide.
Nonetheless, the policy can be read and used as a means to initiate a tidal shift in how data is used across government. Section 5 of the policy is about incorporating new interoperability and openness requirements into core agency processes. Information Resource Management (IRM) strategic plans must align with agencies’ strategic plans and “provide a description of how IRM activities help accomplish agency missions”.
Finally, the implementation section puts the CIO at the very center of this change, without calling – at least explicitly – for any new role (such as a Chief Data Officer), and stresses that cost savings are expected and that “potential upfront investments should be considered in the context of their future benefits and be funded through the agency’s capital planning and budget processes”. Which is to say that openness is not a nice-to-have, for which additional financial support should be expected, but is at the core of how agencies should operate to be more effective and efficient.
As I am a cynical analyst, I can’t just be complimentary of an otherwise brilliant policy without flagging one minor point where it might have been more explicit. In section 3.d.i the policy assigns responsibility for “communicating the strategic value of open (public) data to internal stakeholders and the public”. This is great, as selling open public data internally is absolutely fundamental to getting support and making openness a sustainable practice. However, I would have loved an explicit mention of the need for agencies to use and leverage each other’s open public data, rather than the suggestion that the only target is “entrepreneurs and innovators in the private and nonprofit sector”.
Let’s be clear: there is nothing in the policy that would either prevent or discourage internal use of open public data. But as the policy gets implemented, the balance and collaboration between the CTO Todd Park – who will most likely continue pursuing the external impact of open public data – and the CIO Steve VanRoekel – who chairs the CIO Council and will be mostly concerned with the internal use of information – will be crucial to make sure that openness by default becomes the new mantra.