ON FEBRUARY 2, 2018

Entity resolution with Namara and Thomson Reuters PermID

Leading organizations are starting to recognize the value of previously overlooked external data to derive new insights that can be directly fed into their predictive models, helping them gain a competitive edge in the marketplace. But as this new data flows into internal models, businesses are faced with a dilemma:
What is the most scalable and effective way to link internal and external data?

The Problem

Organizations can’t afford to ignore the potential of external data. This is a resource that goes much further than the open data released on government portals; this is data from anywhere – from Twitter to the World Bank – and it can be used for anything. At ThinkData, we work with businesses that are using external data in ways that couldn’t have been imagined ten years ago, whether they’re analyzing government bids to adjust the estimate of a company valuation, tapping into historical weather data to anticipate customer behaviour, or processing satellite images of the parking lots of popular chains to predict the direction of their stock prices.

[Image: a partially occupied parking lot. Is it half full or half empty?]
These predictive models rely on a lot of data (and outside-the-box thinking) to function, and although they are all elegant in theory, they are not easy to execute. We’ve written elsewhere about the difficulty of connecting to public data, but in the past few years we’ve tackled another problem:
How can any company ingest new feeds of data generated outside its environment while maintaining the integrity of the reference data set created inside its environment?
What happens if some of these sources overlap and contain duplicate or near-duplicate records? Worse, what happens when these duplicates have different values for the same attribute? How should you consolidate these records?

[Image: actors who have played James Bond. Bonds? James Bonds?]

Mapping such data points is known as an entity resolution or entity mapping problem, and it is not a trivial one. Organizations usually resort to internal master data management (MDM) strategies to address it. However, these strategies are highly context-specific and can be laborious to put in place. There is also a trade-off between exact matching, which can miss a good number of true matches, and fuzzy matching, which can produce a high number of false positives.
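To make that trade-off concrete, here is a minimal Python sketch using the standard library’s difflib. This is a generic string-similarity measure chosen for illustration, not what any MDM product or PermID uses, and the 0.8 threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher

def exact_match(a: str, b: str) -> bool:
    """Exact matching: strict, so any difference in spelling or case is a miss."""
    return a == b

def fuzzy_score(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; ratio() is 2*M/T (M matched chars, T total chars)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def fuzzy_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Fuzzy matching: tolerant of variation, but similar-looking names can collide."""
    return fuzzy_score(a, b) >= threshold

for left, right in [("Shopify Commerce Inc.", "SHOPIFY INC"), ("Shopify Inc", "Spotify Inc")]:
    print(f"{left!r} vs {right!r}: exact={exact_match(left, right)}, "
          f"score={fuzzy_score(left, right):.2f}, fuzzy={fuzzy_match(left, right)}")
```

With these definitions, exact matching rejects “Shopify Commerce Inc.” vs “SHOPIFY INC” outright, and their fuzzy score of about 0.69 also falls below the threshold, while the unrelated “Shopify Inc” and “Spotify Inc” score about 0.82 and are accepted. Lowering the threshold to catch the first pair would only let through more pairs like the second.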
On the bright side, there are a number of third-party solutions for different domains. Our partnership with Thomson Reuters gives us access to their PermID system, so today we’re going to look at how to run data from Namara through the PermID APIs to resolve and enrich business entities.

Integrating Data into a CRM

Let’s consider the following scenario: an organization wants to use Namara to integrate a few premium data sets with its internal CRM. It is particularly interested in Canadian Company Capabilities, a Corporate Directory, and a TMX feed.

Let’s look at each data set in turn, starting with the internal data.

1. Internal Data

For this example, the internal data is a pretty typical CRM with an internal identifier, company name, company category, location, and, of course, contact information (see the breakdown of the attributes by category below each data set).

2. Canadian Company Capabilities

Canadian Company Capabilities is an Industry Canada database of local businesses aimed at opening up export opportunities, helping companies search for prospective partners, and analyzing the competition. As a result, it contains a lot of information on each company’s classification (activities, services provided, goods manufactured), its financial situation, and its representatives’ contact information (as many as 60 for a single company!).

3. Corporate Directory

This Corporate Directory contains the information that Canadian corporations are required to file, including standard identifiers such as business and corporation numbers, corporation names, corporation type, governing legislation, whether the corporation is active or not (and why), location information, and lists of activities and directors.

4. TMX Feed

The TMX feed consists of the information pertaining to a traded security, including its symbol, CUSIP, corresponding company names, nature of business, number of outstanding shares, dividend factor, etc.


To summarize:
The data sets have almost nothing in common.
Even though each of these data sets has unique identifiers, those identifiers are internal to each source and cannot be used as keys for a join. One possible workaround would be to join on company name. However, a quick examination shows that records like “Shopify Commerce Inc.”, “SHOPIFY INC” and “847871746RC0001 Inc.” would not be easily disambiguated.
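Here is a minimal sketch of that naive name-based join. The normalization rules and the LEGAL_SUFFIXES set are hypothetical and far from complete; the point is that even after casefolding, stripping punctuation, and dropping “Inc”, the three example names still produce three different keys.

```python
import re

# Hypothetical, deliberately short list of legal-form suffixes to drop.
LEGAL_SUFFIXES = {"inc", "incorporated", "ltd", "limited", "corp", "corporation"}

def name_key(name: str) -> str:
    """Naive join key: casefold, keep only ASCII letter/digit runs, drop legal suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.casefold())
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)

def join_on_name(left: dict[str, str], right: dict[str, str]) -> list[tuple[str, str]]:
    """Join two {record_id: company_name} maps on the normalized name key."""
    index: dict[str, list[str]] = {}
    for rid, name in right.items():
        index.setdefault(name_key(name), []).append(rid)
    return [(lid, rid)
            for lid, name in left.items()
            for rid in index.get(name_key(name), [])]
```

For example, join_on_name({"crm-1": "Shopify Commerce Inc."}, {"dir-7": "SHOPIFY INC", "dir-9": "847871746RC0001 Inc."}) returns an empty list (the record IDs are made up): the keys come out as “shopify commerce”, “shopify” and “847871746rc0001”, and nothing in the names themselves says which of them belong together.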

Linking the Data

For the current scenario, we’re using Thomson Reuters PermID, a unique identifier assigned to a variety of business entities (such as organizations, persons, instruments, and quotes) in Thomson Reuters’ internal universe of linked data.
Open PermID comes with the following set of APIs for entity querying and retrieval:

  • Record Matching
  • Entity Search
  • Tagging
For the current example, we’re only focusing on the first. The Record Matching API lets you run a set of business entities against the Thomson Reuters database and returns a ranked list of the best possible matches. To match an organization, you need to specify its name and, optionally, any of the following:

  • standard identifiers (ticker or RIC)
  • street
  • city
  • postal code
  • state
  • website
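One way to prepare a batch for record matching is to serialize each entity as a row of a delimited file, with a local ID so that matches can be traced back to the source record. The sketch below assumes CSV input with column names modelled on the arguments listed above; LocalID and the exact header spellings are assumptions, so check the Open PermID Record Matching documentation for the real input template. It only builds the payload and does not call the API.

```python
import csv
import io

# Assumed column names, modelled on the optional arguments listed above.
MATCH_COLUMNS = ["LocalID", "Standard Identifier", "Name", "Street", "City",
                 "PostalCode", "State", "Website"]

def to_match_batch(records: list[dict[str, str]]) -> str:
    """Serialize records as one CSV batch: Name is required, other fields default to blank."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=MATCH_COLUMNS, restval="", lineterminator="\n")
    writer.writeheader()
    for rec in records:
        if not rec.get("Name"):
            raise ValueError(f"record {rec.get('LocalID')!r} has no Name")
        # Keep only known columns so unexpected source fields don't break the writer.
        writer.writerow({k: rec.get(k, "") for k in MATCH_COLUMNS})
    return buf.getvalue()
```

For instance, to_match_batch([{"LocalID": "crm-1", "Name": "Shopify Commerce Inc.", "City": "Ottawa"}]) produces the header row followed by crm-1,,Shopify Commerce Inc.,,Ottawa,,, with the unspecified optional fields left blank.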

You can see a sample response below:

Accessing Data on Namara

The next step is running a Namara data set through the Thomson Reuters API to get unique PermIDs. (This is about to get a little technical, so feel free to hop down to The Result section to see how it all pans out.)
To access an individual data set, use an endpoint that looks like this:

https://api.namara.io/v0/data_sets/<data_set_id>/data/<data_set_version>?organization_id=<your_organization_id>&api_key=<your_api_key>

You can find the current version of the data set, along with its ID, on the data set’s API info tab:

You can find your organization ID by navigating to your organization on Namara and copying it from the page URL, which will look something like this:

https://app.namara.io/#/organizations/<your org id>/dashboard
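If you are scripting this, the organization ID can be pulled out of that URL with the standard library. A minimal sketch, assuming the URL always follows the #/organizations/<id>/... pattern shown above (org_id_from_dashboard_url is a hypothetical helper name):

```python
from urllib.parse import urlparse

def org_id_from_dashboard_url(url: str) -> str:
    """Extract <your org id> from https://app.namara.io/#/organizations/<id>/dashboard."""
    fragment = urlparse(url).fragment  # everything after '#', e.g. "/organizations/<id>/dashboard"
    parts = [p for p in fragment.split("/") if p]
    if len(parts) < 2 or parts[0] != "organizations":
        raise ValueError(f"not an organization URL: {url}")
    return parts[1]
```

Because the ID lives in the URL fragment (after the #), it has to be read from urlparse(...).fragment rather than from the path.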

Finally, you can grab your API key by navigating to account settings in the user menu and selecting API key:

If you’d prefer a filtered data set, you can apply column selects and where params to the data set itself, then grab the filtered result from the API tab under API Filtered Query. (To learn more about how to use the Namara API, check out our API Docs or reach out.)
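Putting the pieces together, the endpoint template above can be filled in programmatically. A minimal sketch (data_set_url is a hypothetical helper, and the IDs in the example are placeholders); it only covers the unfiltered endpoint, so for a filtered query you would copy the URL from the API tab instead.

```python
from urllib.parse import quote, urlencode

NAMARA_API = "https://api.namara.io/v0"

def data_set_url(data_set_id: str, version: str, organization_id: str, api_key: str) -> str:
    """Fill in the data set endpoint template, URL-encoding each component."""
    path = f"{NAMARA_API}/data_sets/{quote(data_set_id, safe='')}/data/{quote(version, safe='')}"
    query = urlencode({"organization_id": organization_id, "api_key": api_key})
    return f"{path}?{query}"
```

For example, data_set_url("abc123", "en-0", "org42", "KEY") yields https://api.namara.io/v0/data_sets/abc123/data/en-0?organization_id=org42&api_key=KEY. In practice you would read the API key from an environment variable rather than hard-coding it, and fetch the URL with any HTTP client, such as urllib.request.urlopen.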

The Result

By running all the data sets through the Thomson Reuters API, we can assign permanent identifiers to every company that scored above the cut-off point (for this example, we set the cut-off at a match score of 85%). Once each record has a unique key, we can accurately join the data sets, which means we can seamlessly integrate the data with the internal CRM. From there, we can continue to enrich the data sets, deriving more attributes while guaranteeing overall data quality.
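The cut-off and join steps can be sketched as follows. This assumes each record’s match results have already been reduced to (PermID, score) pairs on a 0 to 100 scale; the pair format, the helper names, and the “first source wins” rule for conflicting attributes are all assumptions for illustration, not how Namara or PermID actually consolidate records.

```python
from typing import Optional

CUTOFF = 85.0  # the match-score cut-off used in this example (85%)

def assign_permid(candidates: list[tuple[str, float]], cutoff: float = CUTOFF) -> Optional[str]:
    """Return the best candidate's PermID if its score clears the cut-off, else None."""
    if not candidates:
        return None
    permid, score = max(candidates, key=lambda c: c[1])
    return permid if score >= cutoff else None

def join_on_permid(*sources: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Merge per-source attribute dicts keyed by PermID.

    Assumed survivorship rule: when sources disagree on an attribute,
    the value from the source passed first is kept.
    """
    merged: dict[str, dict[str, str]] = {}
    for source in sources:
        for permid, attrs in source.items():
            row = merged.setdefault(permid, {})
            for key, value in attrs.items():
                row.setdefault(key, value)  # only fills attributes not already set
    return merged
```

Records whose best score falls below the cut-off get None and are left out of the join rather than being forced onto a weak match. Passing the CRM first to join_on_permid keeps its values authoritative, while the other sources only add attributes it lacks.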

This is how, starting from a basic business directory with 20 attributes, we developed a holistic representation of both private and public businesses with almost 150 different attributes.

Master Data Management with Namara

As companies continue to capitalize on the new and increasing availability of external data, it will be critical for all organizations to develop a standard, scalable solution to entity resolution problems. Whether you’re a vast enterprise with a tried and tested MDM or a small business plugging new leads into your CRM, the injection of external data into your internal environment is sorely needed; the headache of managing it is not.
Namara not only acts as the central clearinghouse and refinery for this new data, but also integrates seamlessly with whichever entity resolution product you use, meaning you get to spend more time building your business. It’s a new way of getting more out of external data, no matter where the data is coming from.


Thomson Reuters is one of our data partners. Follow us on Twitter to stay up to date with the amazing work they are doing with us. Don’t hesitate to contact us if you have any questions. And if you liked this post, make sure to check out our case study on using Unity for joining and enriching geospatial data sets.
