Through the Panama Papers, released at the beginning of last year, the offshore tax haven activity of national elites was exposed to citizens in the full light of day.
The reverse engineering of the internal database of one of the world’s leading offshore money specialists, Mossack Fonseca, stands as one of the strongest examples of data journalism to date.
Built on information leaked to the International Consortium of Investigative Journalists (ICIJ), a network of independent reporting teams around the world, the Panama Papers dataset runs to 2.6 terabytes and 11.5 million documents. Observers agree it dwarfs anything WikiLeaks came up with in size and impact.
Lessons for Business
What is the significance of all this for the business community, however? Is this something only reporters should be excited about?
Far from it. What’s ultimately interesting about the Panama Papers, apart from the social impact of the investigation, is that it shows a radically new way of working with complex financial datasets: technology that can process a large volume of highly connected data quickly, easily and efficiently.
The technology, it’s important to note, had to be usable by non-experts too, in this case non-technical journalists in 80 countries. And finally, all this data is now available to the public, allowing citizens to explore what their elites have been up to in their own, local ways.
The Amazing Graph Database
The technology at work here is a graph database, a way of working with information that means non-specialists are able to discern patterns and spot trends that weren’t visible before. Graph database technology is no less than, in the words of the ICIJ, “A revolutionary discovery tool that’s transformed our investigative process.”
That’s because graph technology outperforms other ways of working with data at elaborating relationships. That matters to investigators of whatever stripe, be they reporters or business analysts, as relationships are all-important in telling you where connections lie, who works with whom, and so on.
Graphs better reflect our grasp of the world.
How is all this possible? What is it graphs are doing here? It’s the way they use data, a way that’s closer to the way humans think about problems and find significance in things.
Rather than breaking up data artificially the way a relational database does, graphs use a notational structure that echoes the intuitive way we think about and work with information.
Once that data model is coded in a scalable architecture, a graph database is peerless at analyzing connections in huge and complex datasets. That allows any business user to spot trends and uncover commercial secrets in ways they have never been able to before.
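To make the idea concrete, here is a minimal sketch in Python of data held as a network rather than as tables. It is not how the ICIJ or any particular graph database works internally; the names, relationship labels and the `shortest_path` helper are all hypothetical and chosen purely for illustration. It shows the kind of question graphs make natural: how are two people connected?

```python
from collections import deque

# Hypothetical data for illustration only; not taken from the leak.
# Each edge is (source, relationship, target).
EDGES = [
    ("Person A", "OFFICER_OF", "Shell Co 1"),
    ("Shell Co 1", "REGISTERED_BY", "Intermediary X"),
    ("Person B", "SHAREHOLDER_OF", "Shell Co 2"),
    ("Shell Co 2", "REGISTERED_BY", "Intermediary X"),
    ("Person C", "OFFICER_OF", "Shell Co 3"),
]


def build_adjacency(edges):
    """Index edges by node in both directions so links can be followed either way."""
    adjacency = {}
    for source, relationship, target in edges:
        adjacency.setdefault(source, []).append((relationship, target))
        adjacency.setdefault(target, []).append((relationship, source))
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search: return the nodes on a shortest path, or None."""
    if start not in adjacency or goal not in adjacency:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for _relationship, neighbor in adjacency[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


if __name__ == "__main__":
    graph = build_adjacency(EDGES)
    print(shortest_path(graph, "Person A", "Person B"))
```

In this toy data, Person A and Person B never appear in the same record, yet following the relationships shows that both of their companies were registered by the same intermediary. A production graph database answers the same kind of question with its own query language and storage engine rather than with hand-written search code.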
Graph Databases = Deep Insights
While it’s true that simple relationships in data can be investigated with a relational database, it is not an especially satisfactory fit: it represents data as tables, not networks, and such queries strain a data structure not designed to map connections. Making those queries run in real time is not easy either, with performance faltering as the total dataset size increases.
Clearly, it’s not just investigative reporters who can benefit from being able to work with complex data, but any organization trying to address large-scale connected data.
For example, all the social web giants, Google, Facebook and LinkedIn among them, have been using graph databases to derive value from large-scale connected datasets for years. Google’s famed PageRank algorithm is a graph application, as are Facebook’s and LinkedIn’s proprietary tools for mapping social networks.
These pioneers started off by building graph databases in-house. What has put the technology in the hands of teams like the one behind the Panama Papers, and of any IT team, is the way graph databases have matured and gone mainstream, in both commercial and open source forms.
Gartner predicts that 70 percent of leading companies will pilot a significant graph database project by 2018, and we can expect new graph names and products from database leaders to emerge in the near future as they recognize the enormous market potential here.
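As a rough illustration of what a graph application looks like, the sketch below implements the simplified textbook form of PageRank by repeated iteration over a tiny link graph. It is not Google’s production system; the function name, the damping factor of 0.85, the iteration count and the three-page example are assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank. `links` maps each page to the pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    if n == 0:
        return {}

    # Start with rank spread evenly across all pages.
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split equally among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical three-page web: a and b both link to c, and c links to a.
    example = {"a": ["c"], "b": ["c"], "c": ["a"]}
    for page, score in sorted(pagerank(example).items()):
        print(page, round(score, 3))
```

The point is that a page’s score depends entirely on its connections: page c ranks highest because two pages link to it, while page b, which nothing links to, ranks lowest.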
In any context where large, complex datasets need to be mined, graphs are increasingly a serious choice.
In retail, hospitality and financial services, firms are using graphs to offer sophisticated personalization and product and service recommendations, for example.
In financial services, fraud detection is an important graph application area, while in the media, graphs are being used to map complex data structures.
The list continues, with plentiful examples coming through of how graph technology is making strong contributions.
Mining Connected Data
Graph databases aren’t helpful for every business problem; there are transactional and analytical processing needs for which a relational database will be the better option (think systems of record like financial, HR or ERP). What’s more, there are NoSQL (Non SQL) alternatives that handle other non-traditional data problems well, especially in the Big Data context (e.g. Hadoop).
But a graph database makes sense for any organization seeking to mine its connected data. That’s because in our super-connected, data-driven world, it’s not only reporters who need to follow the data trail.
We all need to.
Emil Eifrem is CEO and co-founder of Neo Technology. Previously CTO of Sweden’s Windh AB, where he headed up the development of highly complex information architectures for Enterprise Content Management Systems, Emil famously sketched out what today is known as the property graph model on a flight to Mumbai in 2000.