We analyse the potential and pitfalls of expanding your data set with outside sources.
Many companies today are aiming to build their capabilities in Data Management and Data Analytics.
This can be seen, for example, in listings of the most sought-after recruits in the IT business, where Data Analysts, Data Scientists and Data Engineers rank high year after year.
But all these capabilities rely on having correct, quality data to function.
Although the amount of data is rarely a problem, there are good reasons why companies can benefit from adding third-party data sources to their own data repositories.
Like the overall amount of data, the third-party data marketplace is growing rapidly. Along with commercial data providers, there are more and more free databases from governmental and educational organisations. These free databases are especially useful for analysing, for example, demographic information in different countries. Finland has long been a frontrunner in publishing geographical data for public use.
What are the benefits of using third-party databases?
The data collected by a company itself is always limited to the scope of influence of that company, namely its customers and suppliers, in addition to the operative and financial data that the company generates. Certainly, this first-party data is the most important data source a company has, but its limited scope restricts visibility to the factors and stakeholders that the company is in direct contact with. Second-party data (which refers to company data received from other companies) can broaden the view, but is still often limited to customers and suppliers.
Thus, to be able to analyse factors outside of this scope of influence, companies need to look for data in external data sources.
Analytics efforts often fail due to a lack of data or its poor quality. One benefit of external data sources is that data service providers often take the time to verify that their databases contain correct information and that the data offered is up to date. At the very least, data service providers need to catalogue their data, something which is not a given in many companies. This reduces the risks related to data quality and data availability.
Compared to more traditional databases, third-party databases are also often designed to be user-friendly and easy to access, making them easier to use and to integrate into the existing infrastructure even without dedicated Data Engineers.
How to use third-party data: A research example
In a research article published in the August 2018 issue of the International Journal of Logistics Research and Applications, I used the Orbis company database, provided by Bureau van Dijk, to analyse how the business cycle affects the use of trade credit (lending from suppliers and customers) in different tiers of the value chain.
Orbis is a reputable third-party database that lists data on over 300 million private companies across the globe. In addition to listing company data, Orbis also makes sure that the data is up to date and comparable between countries. It also provides a visual user interface for visualising, filtering and downloading data.
Using Orbis, I downloaded financial data on over 50 000 SMEs in Europe from 2006 to 2015 and calculated the differences between their days-payable-outstanding and days-sales-outstanding ratios, which were used to indicate the use of trade credit. I found that these companies are more likely to borrow from their suppliers and customers during economic downturns than during upswings. This was especially the case with manufacturing companies.
This type of analysis would not have been possible without a pre-collected database such as Orbis.
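To make the trade credit indicator more concrete, here is a minimal Python sketch of the calculation. It assumes the common textbook definitions of the two ratios (accounts payable divided by cost of goods sold, and accounts receivable divided by revenue, each multiplied by 365); the exact definitions used in the article may differ. The FirmYear record, its field names and all the figures are hypothetical and are not taken from Orbis.

```python
from dataclasses import dataclass

DAYS_IN_YEAR = 365


@dataclass
class FirmYear:
    """One company's figures for one financial year (hypothetical layout)."""
    name: str
    year: int
    accounts_payable: float
    accounts_receivable: float
    cost_of_goods_sold: float
    revenue: float


def days_payable_outstanding(f: FirmYear) -> float:
    # Assumed textbook definition: average days taken to pay suppliers.
    return f.accounts_payable / f.cost_of_goods_sold * DAYS_IN_YEAR


def days_sales_outstanding(f: FirmYear) -> float:
    # Assumed textbook definition: average days taken to collect from customers.
    return f.accounts_receivable / f.revenue * DAYS_IN_YEAR


def net_trade_credit_days(f: FirmYear) -> float:
    # Difference between the two ratios, used here as the trade credit indicator.
    return days_payable_outstanding(f) - days_sales_outstanding(f)


# Made-up figures for illustration only.
sample = [
    FirmYear("Hypothetical Manufacturer", 2009, 200.0, 150.0, 1000.0, 1500.0),
    FirmYear("Hypothetical Manufacturer", 2014, 120.0, 150.0, 1000.0, 1500.0),
]

for firm_year in sample:
    print(
        f"{firm_year.name} {firm_year.year}: "
        f"DPO {days_payable_outstanding(firm_year):.1f}, "
        f"DSO {days_sales_outstanding(firm_year):.1f}, "
        f"difference {net_trade_credit_days(firm_year):.1f} days"
    )
```

Under these assumptions, a larger positive difference suggests that a company takes longer to pay its suppliers than it takes to collect from its customers, which is one way to read heavier reliance on trade credit. Comparing the difference between downturn and upswing years across many companies is the kind of question a pre-collected panel makes practical.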
Tangible benefits and even quick wins!
Using third-party databases can bring tangible benefits to your company’s data analytics efforts, especially when you analyse factors outside your own business, as in market analysis or end-user analysis.
Third-party data sources can also bring you quick wins, or get you going, if for example your analytics efforts are bogged down by issues in data quality. It is also worth comparing the cost of buying data with the cost of cleaning it yourself.
About the author:
Perttu Hautala is a Junior Advisor at Sofigate working in the area of Data Leadership.
An M.Sc. in Economics and an academic author, Perttu is interested in Data Sciences and in driving business development through the use of data.