Open data is the 21st century’s most useful and least used resource. It can break down information gaps across industries and help governments benchmark themselves against each other; it can help farmers predict crop yield and weather patterns, help citizens fight corruption, propel innovation, grow industry, and help an eighth grader with their homework. Virtually every new business can leverage open data to gain a competitive advantage in the marketplace. It is a natural resource born out of, and fundamentally important to, the digital age. Ultimately, examples of its usefulness can’t possibly provide an adequate picture of its potential. Like the internet, the limitless uses of open data will only be fully understood when everyone has the opportunity to get their hands on it.
A Digital Divide
There is a canyon between those who have data and those who want it. While those who have data continue to push it into the gap, only the most intrepid consumer will risk the climb down to retrieve it. Using publicly available data has become a skill that belongs to a minority, which is directly contrary to its purpose, and to the principles of transparency on which the open data movement is based.
It is not the role of either data providers or consumers to build a bridge across the divide. Providers are too easily bogged down by the technical details involved in doing so, and consumers, since they can’t be sure that the product is worth the effort, aren’t eager to commit the time or money necessary to get it. The effect is that these two groups stay on their respective sides of the canyon and theorize about the advantages that one’s access to the other could provide.
That this has happened to the most connected and well-informed generation in history is remarkable, but not entirely surprising. 2.5 exabytes of data are created daily — a truly huge amount — and actively curating and dispersing a resource that’s being produced at such a rate did not occur to anyone until recently. By the time its value became apparent, the canyon between production and use had already been carved into the digital landscape. Because of this, the difficulties in releasing and ingesting data effectively have become entrenched, which in turn undermines the success of the open data movement as a whole.
It’s predicted that open data will disrupt virtually every industry in the next decade — from business and government to food production and medical research — but these predictions are routinely based on the hypothesis that open data is, or will be, fundamentally easy to access. The reality is that this isn’t currently the case, and it won’t be until a bridge is built between those who have data and those who can use it. Put simply, open data cannot live up to its potential until available data becomes accessible information.
The Myth of Usability
A lot of ink has been spilled on the value, both moral and financial, of open data, but any confidence in this value has at its heart the somewhat flawed idea that the data being released is ready to be consumed. The truth is that public data, like any natural resource, is only useful insofar as it can be refined into something usable.
For example, federal government procurement data is, in theory, incredibly valuable. Every day companies win contracts with the government (the largest, most established buyer in the market), and this data could be used by every financial institution to make more efficient decisions when approving small business loans, fine-tuning risk ratings, and setting their position on a public company. But it isn’t being used because the data, as it’s released, isn’t consistent with the way in which financial institutions need to use it. In order to inject procurement data into their existing models, banks require a real-time feed of the data provided in standard formats; they need to normalize the data, and they need to map companies to parent organizations in order to link them to a database or stock market. Without this level of curation, procurement data is about as useful to financial institutions as a barrel of crude oil would be to someone whose car was running out of gas.
Government, however, cannot be tasked with layering all the above prerequisites onto procurement data before it is released, and neither should financial institutions have to develop the infrastructure to do so themselves; in both cases, the time, energy, and financial commitment required are prohibitive.
The work that’s necessary to make open data compatible with existing products and processes should not be considered the job of the provider or the consumer. It should be seen as an opportunity for anyone who is trying to build the bridge between these two parties and unlock new value in previously unavailable information.
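To make the curation steps described above for procurement data more concrete, here is a minimal sketch in Python of what normalizing a record and mapping a vendor to its parent organization might look like. Everything in it is an assumption made for illustration: the field names ("vendor", "amount", "date"), the PARENT_BY_VENDOR lookup table, the company names, and the two date formats are hypothetical and do not describe any real procurement feed.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Dict, Optional

# Hypothetical lookup from a normalized vendor name to (parent organization, ticker).
# In practice this mapping would come from a curated reference database.
PARENT_BY_VENDOR = {
    "acme robotics": ("Acme Holdings", "ACME"),
    "acme logistics": ("Acme Holdings", "ACME"),
}

# Legal suffixes dropped so that spelling variants of one vendor compare equal.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co"}


@dataclass(frozen=True)
class Award:
    vendor: str
    parent: Optional[str]
    ticker: Optional[str]
    amount_usd: float
    awarded_on: date


def normalize_vendor(raw: str) -> str:
    """Lowercase the name, strip commas and periods, and drop legal suffixes."""
    words = raw.lower().replace(",", " ").replace(".", " ").split()
    return " ".join(w for w in words if w not in LEGAL_SUFFIXES)


def parse_amount(raw: str) -> float:
    """Turn a string such as '$1,250,000.00' into a float."""
    return float(raw.replace("$", "").replace(",", ""))


def parse_date(raw: str) -> date:
    """Accept the two (assumed) date formats and return a date object."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(raw, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")


def curate(record: Dict[str, str]) -> Award:
    """Normalize one raw record and attach its parent organization, if known."""
    vendor = normalize_vendor(record["vendor"])
    parent, ticker = PARENT_BY_VENDOR.get(vendor, (None, None))
    return Award(
        vendor=vendor,
        parent=parent,
        ticker=ticker,
        amount_usd=parse_amount(record["amount"]),
        awarded_on=parse_date(record["date"]),
    )
```

Calling curate on a record such as {"vendor": "ACME Robotics, Inc.", "amount": "$1,250,000.00", "date": "03/15/2016"} would yield an Award linked to the hypothetical parent "Acme Holdings". The hard part in practice is not this code but building and maintaining the reference data behind the lookup, which is the kind of work the bridge-builder would take on.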
The Role of Government
Although governments are increasingly eager to be transparent, they’re rarely coached on how to best open up their records, and while they may be told that their internal data is valuable, they cannot know what data sets represent the highest value or how exactly that value will be made manifest. The upshot is that governments are making individual decisions about what’s being released, how much of it is released, and how it should look, which makes standardization across governments nearly impossible. If, for example, San Francisco releases the cost of building a bridge, that data set is only really valuable if it can be benchmarked against similar projects in cities like New York and Des Moines — if it cannot, the data loses the majority of its value.
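The benchmarking problem described above can be sketched in a few lines of Python. The sketch assumes that each city publishes the same facts under different field names and units, and that someone maintains a mapping into a common schema. The field names, units, and the FIELD_MAPS table are hypothetical and are not taken from any real open data portal.

```python
# Hypothetical per-city field names and units for a bridge-project data set.
FIELD_MAPS = {
    "San Francisco": {"cost": "total_cost_usd", "length": "span_ft", "length_unit": "ft"},
    "New York": {"cost": "ProjectCost", "length": "LengthMeters", "length_unit": "m"},
}

FEET_PER_METER = 3.28084


def to_common(city, record):
    """Map one city's raw record into a shared schema (cost in USD, length in meters)."""
    fields = FIELD_MAPS[city]
    length = float(record[fields["length"]])
    if fields["length_unit"] == "ft":
        length /= FEET_PER_METER  # convert feet to meters
    return {
        "city": city,
        "cost_usd": float(record[fields["cost"]]),
        "length_m": length,
    }


def cost_per_meter(row):
    """A simple benchmark that is comparable across cities once rows share a schema."""
    return row["cost_usd"] / row["length_m"]
```

Only after both records pass through to_common does a comparison such as cost per meter become meaningful. A city that is missing from FIELD_MAPS, as Des Moines is here, cannot be benchmarked at all, which is the loss of value the paragraph above describes.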
The problem lies in the particular nature of government bodies and the vast number of ways in which open data can be used. Due to the volume of open data case studies that have been published in the past five years alone, municipal, county, and state governments generally find themselves in the position of having to make a series of choices as to what kind of data they want to publish. Because of a demand for financial transparency, a government might decide to focus entirely on releasing its budget and spending records. Alternatively, a rash of recent criminal activity might lead a government to release police data. Cities with aging infrastructure might tend towards releasing 311 data while cities that rely heavily on tourism will release information pertaining to their restaurants and historic sites. The above examples are all excellent use cases for open data, but each of them aims at a single end goal. Such goals serve the immediate uses of open data but miss its broader purpose, which is to create a new ecosystem of useful, readily available information.
Instead of thinking about the individual opportunities that public data provides, it is better to picture open data as a global puzzle to which every municipality in every county in every country has a piece to contribute.
The fact of the matter is that the size, structure, and internal idiosyncrasies of government bodies make them poor candidates for the kind of overarching standardization and aggregation that’s necessary in order for open data to enter the mainstream. It’s a predictable problem. Governments act independently from one another because, as a rule, it is important for them to customize their initiatives in order to better serve the citizens to whom they are obligated. But in the case of open data, this customization has reduced the efficacy of the entire movement; it isolates one government from another, which means that only a fraction of the problems that open data can address are being solved.
While governments will undoubtedly play a major role in the development and adoption of open data globally, it is too much to ask that they act not only as data provider, but also as curator, catalyst, and policymaker. Since governments possess the greatest proportion of valuable data, their role (the one that requires the least effort and will have the greatest impact) should be to release as much of it, in as good condition, as possible.
This may seem contrary to the narrative that’s starting to develop around open data. Increasingly, governments at every level are adopting open data policies and launching open data portals. This, it should be said, is critically important. The problem with the practice is that too little is being released because too much is expected. Ultimately, governments cannot possibly know in advance how their data is going to be used, so the attention that’s paid to the details — visualizations, splitting large data sets into manageable pieces — actually undermines the usability of the raw data they possess. The irony is that by trying to make data consumable, governments are limiting its use.
Bridging the Divide
The issue, then, is not that there isn’t an appetite for open data, or that governments aren’t willing to release it. The problem is that there is a chasm between what one wants and what the other is providing. The good news is that this problem doesn’t indicate an inherent flaw in the open data movement, but exposes an opportunity for those who want to enable people to use data effectively.
In 2013, McKinsey valued open data at $3–5 trillion per year. Since then, numerous publications and pundits have repeated the claim, but it remains a valuation that, due to its size, is difficult to grasp. Perhaps a more telling (or more approachable) prediction is the one made two years later by Gartner, a US-based technology research firm, which estimated that by 2017, 80% of organizations would be consuming open data. This prediction indicates not only that open data will be responsible for a seismic shift in the way businesses operate, but also that it will become a necessary commodity for any business that wants to stay relevant.
The momentum behind the open data movement is only growing stronger, but there are still fundamental issues facing its acceptance into the mainstream. Government’s role in the movement is nebulous, which complicates the release of open data; business is wary of the problems inherent in accessing open data, and citizens are unsure of its usefulness. But the fact remains that the complete picture of open data — rather than the glimpses seen through specific use cases and theoretical advantages — has the potential to revolutionize, shape, and improve the way the world works.