Ignore Data Quality Improvement at your Own Peril

22 Oct 2015

Data is the oil that runs business. If data is adulterated, a company’s growth engine will cough and sputter, and ultimately grind to a halt. Most businesses have a data quality problem: their databases hold incomplete, incorrect, duplicate, outdated or non-contextual data. These issues cause real-world problems that range from the annoying, like a customer receiving an inflated mobile bill, to the life-threatening, like a patient receiving the wrong medication. Dirty data also has financial implications. It is estimated to cost the US economy $3 trillion per year, to add $314 billion in overhead to health care costs and to impact 10-30% of organizational revenue each year. So if the problem is so dire, why does it persist?

The scourge of bad data

At an intuitive level, no one doubts that bad data is a business liability. Everyone is familiar with the “Garbage In, Garbage Out” dictum. But the devil is in the details, and there are a few reasons why companies struggle with data quality.

Lack of budget and demonstrable ROI

According to Saul Judah, Research Director at Gartner, “… organizations continue to struggle with securing resources to improve the quality of their critical data. Often this is because they are unable to effectively communicate what it is that is actually broken, to the people who should care and are able to help them.” With budgets stretched thin, data governance initiatives can get off to a rocky start, because it is difficult to put a dollar figure on the opportunity cost of bad data. It is also hard to quantify in advance the impact of bad data on a particular business process. Often it is only in hindsight, after something like a lawsuit or a recall, that the true cost of bad data becomes apparent.

Faulty data gathering at the source

In most business processes, data comes from multiple sources. An e-commerce website, for instance, draws on product details, order information, shipping addresses and more. If data quality is not maintained from the initial stages, the result is chaos, such as A’s order being delivered to B’s address. Unfortunately, many enterprises face varying levels of this chaos because data is collected badly right at the source. Much of the problem can be traced to legacy processes: many companies move historical data from old systems to new ones without cleaning it or following proper ETL (Extract, Transform, Load) processes. Then there are flawed data collection processes: think of poorly designed forms that do not check whether a phone number or a postal code is valid.
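
Checking values at the point of entry is usually the cheapest fix. As a minimal sketch, assuming US-style inputs (a 5-digit ZIP code or ZIP+4, and a 10-digit North American phone number), a form handler in Python might reject bad values before they ever reach the database. The field names `phone` and `postal_code` and the helper functions are hypothetical; a real form would need rules for every country it serves.

```python
import re
from typing import Dict, List, Optional

# Hypothetical rules assuming US-style inputs.
ZIP_RE = re.compile(r"\d{5}(-\d{4})?")   # 12345 or 12345-6789
PHONE_RE = re.compile(r"\+?1?(\d{10})")  # optional +1 country code, then 10 digits


def normalize_phone(raw: str) -> Optional[str]:
    """Strip common separators and return the 10-digit number, or None if invalid."""
    cleaned = re.sub(r"[\s().-]", "", raw)
    match = PHONE_RE.fullmatch(cleaned)
    return match.group(1) if match else None


def validate_form(form: Dict[str, str]) -> List[str]:
    """Return a list of validation errors; an empty list means the form can be saved."""
    errors = []
    if normalize_phone(form.get("phone", "")) is None:
        errors.append("phone: expected a 10-digit number")
    if not ZIP_RE.fullmatch(form.get("postal_code", "").strip()):
        errors.append("postal_code: expected 12345 or 12345-6789")
    return errors


print(validate_form({"phone": "(555) 123-4567", "postal_code": "94105"}))  # []
# Both fields below are rejected, so two errors are returned.
print(validate_form({"phone": "555-1234", "postal_code": "9410"}))
```

Normalizing the phone number before storing it also means the same number entered as "(555) 123-4567" and "555.123.4567" ends up identical in the database, which avoids a source of duplicates later on.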

A band-aid culture

In most companies, the people most directly affected by poor data quality are way down the organizational totem pole. They are customer service reps, sales reps and entry- to mid-level employees who have to spend a significant amount of time cleaning up data just so they can do their jobs. In the absence of formal processes, any fixes they make to the database are confined to a particular dataset or have to be repeated from scratch. These shadow functions waste valuable employee hours and can result in a demoralized workforce. Also, because business data is spread across multiple databases, inconsistencies creep in when there is no data governance policy. Employees then have to waste time reconciling multiple versions of the same dataset, which is a productivity killer.
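
Reconciliation like this is usually done by hand. A first step toward automating it is simply to detect where two copies of the same records disagree. The sketch below assumes two hypothetical exports (a CRM table and a billing table) already loaded as Python dictionaries keyed by a shared customer ID; the source names, field names and the light normalization rule are illustrative assumptions, not a prescribed method.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def find_conflicts(
    source_a: Dict[str, Record], source_b: Dict[str, Record]
) -> List[Tuple[str, str, str, str]]:
    """Return (key, field, value_a, value_b) for every field that differs between sources."""
    conflicts = []
    # Only records and fields present in both sources can be compared.
    for key in sorted(source_a.keys() & source_b.keys()):
        rec_a, rec_b = source_a[key], source_b[key]
        for field in sorted(rec_a.keys() & rec_b.keys()):
            # Ignore trivial differences in case and surrounding whitespace.
            if rec_a[field].strip().lower() != rec_b[field].strip().lower():
                conflicts.append((key, field, rec_a[field], rec_b[field]))
    return conflicts


def unmatched_keys(
    source_a: Dict[str, Record], source_b: Dict[str, Record]
) -> Tuple[List[str], List[str]]:
    """Return the keys found only in source_a and only in source_b."""
    return sorted(source_a.keys() - source_b.keys()), sorted(source_b.keys() - source_a.keys())


# Hypothetical sample data: the same customers as seen by two systems.
crm = {
    "C001": {"email": "ana@example.com", "city": "Austin"},
    "C002": {"email": "raj@example.com", "city": "Boston"},
}
billing = {
    "C001": {"email": "Ana@Example.com ", "city": "Dallas"},
    "C003": {"email": "li@example.com", "city": "Denver"},
}

print(find_conflicts(crm, billing))  # [('C001', 'city', 'Austin', 'Dallas')]
print(unmatched_keys(crm, billing))  # (['C002'], ['C003'])
```

The functions only surface disagreements; deciding which source wins for each field still needs a governance rule, which is exactly what is missing in a band-aid culture.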

Setting up a data cleansing process

Before a company starts projects around Big Data or predictive analytics, it needs to deal with bad data. To do this, there has to be clarity on what good data looks like. This calls for a data quality assurance program that defines acceptable data against metrics such as integrity, completeness, consistency, validity and timeliness. Such a program helps keep bad data from polluting the system in the future. But a company will still have legacy databases full of low-quality data. The first step is to identify the bad data so that decision makers can exclude it when making business decisions. Bad data then has to be handled on a case-by-case basis. Some types of bad data are easily fixed with software or manual effort, and if there is a business need to fix them, the organization should be ready to pay for it. However, some bad data cannot be fixed. In that case, business owners should identify the systems responsible for generating it and focus on fixing those processes.
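
To make these metrics concrete, here is a minimal sketch of record-level checks for a hypothetical customer table. It assumes each row is a Python dictionary with `customer_id`, `email`, `postal_code` and `last_updated` (a `datetime.date`) fields, and it maps some of the metrics above to deliberately simple rules: required fields present (completeness), a loose email shape check (validity), a one-year freshness threshold (timeliness) and unique customer IDs (integrity). The field names, rules and thresholds are all assumptions for illustration. Rows are flagged rather than deleted, so decision makers can exclude them while they are being fixed.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Any, Dict, List

# Hypothetical rules for a customer table; real thresholds come from the business.
REQUIRED_FIELDS = ("customer_id", "email", "postal_code", "last_updated")
MAX_AGE = timedelta(days=365)  # rows not updated within a year count as stale


def record_issues(rec: Dict[str, Any], today: date) -> List[str]:
    """Return the data quality problems found in a single row."""
    issues = []
    # Completeness: every required field is present and non-empty.
    missing = [f for f in REQUIRED_FIELDS if not rec.get(f)]
    if missing:
        issues.append("incomplete: " + ", ".join(missing))
    # Validity: a deliberately loose email check, not a full RFC 5322 parser.
    email = rec.get("email")
    if email and "@" not in str(email):
        issues.append("invalid: email")
    # Timeliness: last_updated (assumed to be a datetime.date) is recent enough.
    updated = rec.get("last_updated")
    if isinstance(updated, date) and today - updated > MAX_AGE:
        issues.append("stale: last_updated")
    return issues


def quality_report(records: List[Dict[str, Any]], today: date) -> Dict[str, Any]:
    """Flag problem rows by index and report the share of clean rows."""
    # Integrity: a customer_id should identify exactly one row.
    id_counts = Counter(r["customer_id"] for r in records if r.get("customer_id"))
    flagged: Dict[int, List[str]] = {}
    for i, rec in enumerate(records):
        issues = record_issues(rec, today)
        if id_counts.get(rec.get("customer_id"), 0) > 1:
            issues.append("duplicate: customer_id")
        if issues:
            flagged[i] = issues
    total = len(records)
    clean_share = (total - len(flagged)) / total if total else 1.0
    return {"rows": total, "flagged": flagged, "clean_share": clean_share}


today = date(2015, 10, 22)
rows = [
    {"customer_id": "C001", "email": "ana@example.com", "postal_code": "78701", "last_updated": date(2015, 9, 1)},
    {"customer_id": "C002", "email": "raj.example.com", "postal_code": "", "last_updated": date(2013, 5, 4)},
    {"customer_id": "C001", "email": "ana@example.com", "postal_code": "78701", "last_updated": date(2015, 6, 30)},
    {"customer_id": "C003", "email": "li@example.com", "postal_code": "80202", "last_updated": date(2015, 10, 1)},
]
report = quality_report(rows, today)
print(report["flagged"][1])  # ['incomplete: postal_code', 'invalid: email', 'stale: last_updated']
print(report["clean_share"])  # 0.25
```

Keeping the original rows untouched and reporting problems by row index leaves the case-by-case decision (fix, exclude or trace back to the generating system) with the business owners.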

Conclusion

Companies struggle to improve data quality because such changes affect virtually every employee in the organization, and the bigger the organization, the stronger the pull of business as usual. The bad data cycle is tough to break and takes time, but if a proper data management process is followed, the organization will see both quick and sustainable wins.
