Organizations can’t afford to ignore the potential of external data. This resource goes much further than the open data released on government portals; it is data from anywhere – from Twitter to the World Bank – and it can be used for anything. At ThinkData, we work with businesses that are using external data in ways that couldn’t have been imagined ten years ago, whether they’re analyzing government bids to adjust a company’s estimated valuation, tapping into historical weather data to anticipate customer behaviour, or processing satellite images of popular chains’ parking lots to predict the direction of their stock.
Let’s consider the following scenario: an organization wants to use Namara to integrate a few premium data sets with its internal CRM. The organization is particularly interested in:
Let’s break it down by each data set.
For this example, the internal data is a fairly typical CRM containing an internal identifier, company name, company category, location, and, of course, contact information (see the breakdown of the attributes by category below each data set).
Canadian Company Capabilities is an Industry Canada database of local businesses that is aimed at opening up export opportunities, facilitating the search for prospective partners, and analyzing the competition. Because of this, it holds a lot of information about each company’s activities, services provided, and goods manufactured, its financial situation, and its representatives’ contact information (up to 60 contacts for one of them!).
The Corporate directory contains the information that Canadian corporations are required to file, including standard identifiers such as business and corporation numbers, the corporation’s name, its type, its governing legislation, whether it is active or not (and why), its location, and lists of its activities and directors.
The TMX feed consists of the information pertaining to a traded security, including its symbol, CUSIP, corresponding company names, nature of business, number of outstanding shares, dividend factor, etc.
For the current scenario we’re using Thomson Reuters PermID. A PermID is a unique identifier assigned to a variety of business entities (i.e., organizations, persons, instruments, quotes) in Thomson Reuters’ internal universe of linked data.
Open PermID comes with the following set of APIs for entity querying and retrieval:
You can see a sample response below:
The next step is running a Namara data set through the Thomson Reuters API to get unique PermIDs. (This is about to get a little technical, so feel free to hop down to The Result section to see how it all pans out.)
To access an individual data set, use an endpoint that looks like this:
You can find the current version of the data set, along with its ID, on the data set’s API info tab:
https://app.namara.io/#/organizations/<your org id>/dashboard
Finally, you can grab your API key by navigating to account settings in the user menu and selecting API key:
By running all the data sets through the Thomson Reuters API, we can assign permanent identifiers to every company that scored above the cut-off point (for this example we set a match score of 85%). Once each data point has a unique key, we’ll be able to accurately join values, meaning that we can seamlessly integrate the data with the internal CRM. From there we can continue to enrich the data sets, deriving more attributes while guaranteeing overall data quality.
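The cut-off and join described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `MatchResult` structure, its field names, the source labels, and the sample records and identifiers (`P-001`, `P-002`) are all hypothetical and do not reflect the actual shape of a Thomson Reuters or Namara response. The sketch also treats a score of exactly 85% as passing.

```python
from dataclasses import dataclass

MATCH_THRESHOLD = 0.85  # the 85% cut-off used in this example


@dataclass
class MatchResult:
    """One hypothetical row produced by an entity-matching step."""
    source: str     # which data set the record came from
    record_id: str  # the record's identifier within that data set
    name: str       # company name as it appears in the source
    perm_id: str    # identifier proposed by the matching service
    score: float    # match confidence between 0.0 and 1.0


def accepted_matches(results, threshold=MATCH_THRESHOLD):
    """Keep only the matches whose score meets the cut-off."""
    return [r for r in results if r.score >= threshold]


def join_on_perm_id(results):
    """Group records from every source under their shared identifier."""
    joined = {}
    for r in results:
        joined.setdefault(r.perm_id, {})[r.source] = r.record_id
    return joined


# Made-up sample records, for illustration only.
results = [
    MatchResult("crm", "crm-17", "Acme Widgets Inc.", "P-001", 0.97),
    MatchResult("corporate_directory", "cd-204", "ACME WIDGETS INCORPORATED", "P-001", 0.91),
    MatchResult("tmx", "tmx-9", "Acme Wdgts", "P-001", 0.62),  # below the cut-off
    MatchResult("crm", "crm-18", "Borealis Foods Ltd.", "P-002", 0.88),
]

joined = join_on_perm_id(accepted_matches(results))
for perm_id, records in joined.items():
    print(perm_id, records)
```

Records that fall below the cut-off are simply left out of the join here; in practice they would more likely be queued for manual review than discarded.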
As companies continue to capitalize on the new and increasing availability of external data, it will be critical for all organizations to develop a standard, scalable solution to entity resolution problems. Whether you’re a vast enterprise with a tried and tested MDM or a small business plugging new leads into your CRM, the injection of external data into your internal environment is sorely needed; the headache of managing it is not.
Namara not only acts as the central clearinghouse and refinery for this new data but also integrates seamlessly with whichever entity resolution product you use, meaning you get to spend more time building your business. It’s a new way of getting more out of external data, no matter where the data is coming from.