This is a pretty laborious process, though. According to a BBC report, the International Consortium of Investigative Journalists (ICIJ), which has a member list of nearly 100 media partners in 67 countries, has been investigating more than 785,000 offshore companies implicated in the Panama Papers, the Offshore Leaks, the Bahamas Leaks, and the Paradise Papers investigations. It has taken the ICIJ five years to tease out the connections between 290,000 of those companies (a little over a third of the total) and records in other databases.
At ThinkData Works, we have been working on a similar problem, and the result is a smart record linkage engine (RLE). Typically, when a lending institution such as a bank tries to paint a full picture of a company in order to gauge its credibility, its methods resemble a journalist’s: analysts manually cross-reference different databases (import/export, procurement, public company data, and so on) and piece together matches based on company names, registration addresses, stakeholder and CEO identities, and similar fields. As you can imagine, this process can be time-consuming. Consider the concessions you would have to make for different languages and formats, for starters, or the time inevitably spent accounting for typos, abbreviations, and other variations in the data.
"LEVEL 2,1 WESTFERRY CIRCUS", "PKF LITTLEJOHN 1 WESTFERRY CIRCUS", and "℅ PKF LITTLEJOHN 2ND FLOOR, 1 WESTFERRY CIRCUS" probably refer to the same address.
Based on similarity scores in individual fields (in this case company name, address, town, county, country, and CEO), the RLE’s AI component kicks in and decides that these companies are closely related. Since the RLE has been built to scale gracefully by leveraging big data tools, the whole process of finding companies related to Cambridge Analytica takes less than a second. This is very good news for anyone interested in connecting entities across numerous, often huge, databases.
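To make the idea of field-level scoring concrete, here is a minimal sketch in Python. It is not the RLE's implementation, whose similarity measures and AI component are not described here. As stand-ins, it assumes a token-overlap score for addresses, a character-level score from the standard library's difflib for the other fields, and a weighted average with a fixed threshold in place of the learned decision. The function names, weights, and threshold are all hypothetical.

```python
import re
from difflib import SequenceMatcher

# Hypothetical weights (they sum to 1) and threshold, chosen only for illustration.
FIELD_WEIGHTS = {
    "name": 0.30, "address": 0.25, "town": 0.10,
    "county": 0.05, "country": 0.05, "ceo": 0.25,
}
MATCH_THRESHOLD = 0.75


def tokens(text: str) -> set:
    """Uppercase the text and split it into alphanumeric tokens.

    Punctuation and symbols such as the care-of sign are dropped.
    """
    return set(re.findall(r"[A-Z0-9]+", text.upper()))


def address_similarity(a: str, b: str) -> float:
    """Overlap coefficient: shared tokens divided by the smaller token set."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / min(len(ta), len(tb))


def text_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], ignoring case and outer whitespace."""
    a, b = a.strip().upper(), b.strip().upper()
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def link_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted average of the per-field similarity scores."""
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        compare = address_similarity if field == "address" else text_similarity
        total += weight * compare(rec_a.get(field, ""), rec_b.get(field, ""))
    return total


def is_related(rec_a: dict, rec_b: dict) -> bool:
    """Stand-in for the decision step: a fixed threshold on the combined score."""
    return link_score(rec_a, rec_b) >= MATCH_THRESHOLD


addr_a = "LEVEL 2,1 WESTFERRY CIRCUS"
addr_b = "PKF LITTLEJOHN 1 WESTFERRY CIRCUS"
addr_c = "℅ PKF LITTLEJOHN 2ND FLOOR, 1 WESTFERRY CIRCUS"
print(address_similarity(addr_a, addr_b))  # 0.6
print(address_similarity(addr_b, addr_c))  # 1.0
```

In this sketch the second and third address variants score 1.0 because every token of the shorter one appears in the longer one, while the first variant shares only "1 WESTFERRY CIRCUS" with the others and scores 0.6. A production system would likely replace the fixed weights and threshold with a trained model.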
None of this is intended to replace a journalist’s intuition, of course. The RLE is built to work with structured databases, which will give human experts time to dig into the individuals and companies behind the connections, or to digest unstructured data such as blogs, Twitter, and insider tips. Compared to Siegelman’s graph, the above list is incomplete, but it effectively automates the repetitive, time-consuming, and (one can only imagine) aggravating needle-in-a-haystack process, which gives experts looking into the data the time they need to focus on more valuable inputs.
Spreading money around isn’t a new phenomenon. There are many reasons (not all of them nefarious) why a company or individual might choose to distribute their assets across many holdings. There’s a problem, however, when the shell game played by corporations is designed to obfuscate in order to avoid detection. Whether it’s financial institutions trying to improve their anti-money-laundering efforts or reporters uncovering the relationships that drive investigative journalism, staying one step ahead of the curve is necessary. Data-driven AI tech can help make the connections that cut through the obscurity and paint a full picture of who’s really involved in a corporation, and why they might not want to be found.