Interest in big data and open data is understandably growing all over the world. The combination of several technology innovations, in areas like social media, cloud computing and analytics, offers scenarios that we could hardly imagine in the past. And the trend toward greater transparency and openness that is being championed by many governments and NGOs is almost creating a “perfect storm” around the ability to extract wealth from the growing masses of data that are freely available over the Internet.
It is not just about data that was previously kept behind boundaries and that governments are liberating through their various “data.gov” initiatives. It is also about new data that is generated through idea contests, online dialogues, social games, photo and video sharing sites, webcasts and webcams, and the like.
The theory says that the more we can make sense of these masses of data, the more we can meaningfully link data to each other, and the more we can deploy powerful analysis tools, the greater the value we will be able to extract from data. This will create new opportunities for growth and wealth, and will allow us to tackle and possibly solve intractable problems.
This is not going to happen overnight and there will be good and less good experiences, as Alex Howard points out in a recent post, but it is important to continue to nurture the open data movement. More data means more transparency, more freedom, less centralized control of information, and ultimately more democracy.
On the other hand, no human being can make sense of such a mass of data, so we do need tools, intermediaries and agents that make it digestible to us. We see that already today. We use Google to search for relevant information, but we are not really in control of what appears first on its pages or why. We follow our friends on Facebook, but we do not have any fine-grained control over the feeds we see on the page. We follow Twitter trends, but we do not know whether and how they are edited.
Is there any reason why it should be any better with big, open data? Of course there will be more players, at least in this initial phase, but it is quite likely that we will see big data behemoths emerge after a consolidation period. Some of them might be familiar names, some might be new. What will we know about how they use open data?
There will also be plenty of other opportunities for more local, smaller scale uses of open data. But whatever value or conclusion is extracted from such data, it will be thanks to the engagement of organized groups, be they companies, government agencies or advocacy groups. What about their agendas?
Think about a beautiful application that mashes up data from multiple sources: what would its users really know about how data is chosen, what algorithms are applied, what patterns are implemented? Openness supporters will claim that open sourcing these applications would solve most of the problems, but for open source to have such an impact one needs a sufficiently vibrant and diverse community of individuals who share an interest in maintaining and developing that application. With thousands of applications for multiple devices, how many will command the attention of enough people to even form a community?
Some people may say that worrying about this is like putting the cart before the horse. Indeed, open data initiatives have another, more immediate problem: their relevance. In fact, for all the theory above, only a fraction of the many open data initiatives are delivering significant results.
The good news is that the same approach that can make open data initiatives more impactful can also reduce the risk that they drift toward something that is less democratic and transparent.
The solution is focus.
Rather than just letting technologists and open data experts come up with ideas that can be mapped to problems, let’s put them to work on specific problems that are universally (or widely enough) considered a top priority. It could be as challenging as fighting breast cancer, or as specific as developing a city budget in tough economic times, or getting Italians to pay their taxes, but there has to be a specific challenge that people work on.
Creating communities around problems can be a better start than pulling together existing communities hoping they’ll find some common ground. It allows transparent governance mechanisms to be built in from the outset, preventing partisan interests from derailing the common effort.
This is not an either-or. The yin and yang of innovation will require a balance of bottom-up, technology-pushed initiatives and top-down, problem-pulled initiatives. All I am saying is that the latter have been given less attention than the former so far, and this should change.
Striking the balance will be key for us to be able to leverage and exploit open data, rather than the other way around.
The Gartner Blog Network provides an opportunity for Gartner analysts to test ideas and move research forward. Because the content posted by Gartner analysts on this site does not undergo our standard editorial review, all comments or opinions expressed hereunder are those of the individual contributors and do not represent the views of Gartner, Inc. or its management.