Today, data and the insights it offers are the lifeblood of any modern organization. Companies have a wealth of insights trapped inside the massive amounts of data residing across their business, and that data needs to be mined securely and trusted before it can be leveraged across various functions.
“Data-based decisions are being driven across industries,” Prasant Kaddi, Partner, Deloitte India, said. “This necessitates data to be reliable to avoid the GIGO (Garbage in Garbage out) phenomenon. It costs enterprises in revenue and profitability; various studies have estimated value erosion of 10-30% due to poor data quality. The classic example is being unable to reach a customer for verifying a transaction because of bad contact data. Organizations need to make Data Quality and governance programs an executive concern, covering business, not just IT. Measuring data quality and correcting data while also putting in place robust data governance processes to ensure future data quality is critical.”
A step-by-step approach is thus required to best clean the data.
Organizations need to ensure they have good business terms in place for the top Critical Data Elements (CDEs), with good descriptive names, short and long descriptions, and abbreviations. Apart from indicating what a data element contains, this helps in understanding where critical and sensitive data (such as PII) resides so that it can be managed appropriately. Also, once a business term is assigned, applying the right business rules becomes possible.
Create automation rules based on business term assignments. These can automatically bind or create quality rules, which helps in creating a consistent view of the data and in dealing with its volume.
Examine data for duplicates, irrelevant data, structural issues, outliers, and missing data. This will help cleanse the data and lay the foundation for turning data into trusted information.
Examine the data quality dimensions, enable or disable them as needed, and create and install custom dimensions. This helps to maintain an agile, business-focused environment in which to customize data and create standardization processes for business needs, such as data enrichment or data cleansing.
Use the Data Quality Score to validate and iterate. It enables organizations to analyze and monitor data quality continuously to reduce the proliferation of incorrect or inconsistent data.
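The examination and scoring steps above could be sketched roughly as follows. This is a minimal illustration in plain Python, not any vendor's implementation: the sample records, the field names (`phone`, `amount`), the outlier threshold, and the scoring formula (share of records with no flagged issue) are all assumptions made for the example.

```python
from statistics import median

# Hypothetical customer records; field names are illustrative only.
RECORDS = [
    {"id": 1, "phone": "9876543210", "amount": 120.0},
    {"id": 2, "phone": "", "amount": 95.0},
    {"id": 3, "phone": "9123456780", "amount": 110.0},
    {"id": 3, "phone": "9123456780", "amount": 110.0},   # exact duplicate
    {"id": 4, "phone": None, "amount": 105.0},
    {"id": 5, "phone": "9988776655", "amount": 9000.0},  # suspicious amount
]


def find_missing(records, field):
    """Return indices of records where `field` is absent, None, or blank."""
    return [
        i for i, r in enumerate(records)
        if r.get(field) is None or str(r.get(field)).strip() == ""
    ]


def find_duplicates(records):
    """Return indices of records that exactly repeat an earlier record."""
    seen, dupes = set(), []
    for i, r in enumerate(records):
        key = tuple(sorted(r.items()))  # order-independent record fingerprint
        if key in seen:
            dupes.append(i)
        seen.add(key)
    return dupes


def find_outliers(records, field, threshold=5.0):
    """Flag numeric values far from the median, measured in units of the
    median absolute deviation (MAD). The threshold is an assumed default."""
    values = [
        (i, r[field]) for i, r in enumerate(records)
        if isinstance(r.get(field), (int, float))
    ]
    if not values:
        return []
    med = median(v for _, v in values)
    mad = median(abs(v - med) for _, v in values)
    if mad == 0:
        return []  # no spread to measure against
    return [i for i, v in values if abs(v - med) / mad > threshold]


def quality_score(records, required_field, numeric_field):
    """Assumed score: fraction of records with no flagged issue."""
    flagged = (
        set(find_missing(records, required_field))
        | set(find_duplicates(records))
        | set(find_outliers(records, numeric_field))
    )
    return 1 - len(flagged) / len(records)


if __name__ == "__main__":
    print("missing phone:", find_missing(RECORDS, "phone"))
    print("duplicates:", find_duplicates(RECORDS))
    print("outliers:", find_outliers(RECORDS, "amount"))
    print("score:", round(quality_score(RECORDS, "phone", "amount"), 2))
```

Re-running such a check after each cleansing pass gives a number to iterate against. A production score would typically weight dimensions differently and apply rules per business term.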
“The actual cost of data quality can vary depending on the source of information,” Subram Natarajan, CTO and Director, Technical Sales, IBM Technology Sales, India/South Asia, said. “It suffices to say that the cost of erroneous decisions resulting from bad quality data can lead to critical impacts and can also be extremely costly, running into several millions of dollars. Furthermore, as per our latest Global AI Adoption Index 2021 report, Indian IT professionals are most likely to see limited expertise or knowledge (52%) and increasing data complexity and data silos (50%) as barriers to AI adoption. Hence, cleansing data is also important for organizations to leverage AI and have the freedom and choice to apply AI to their data, wherever it is stored.”
According to Rajesh Ramachandran, Group Chief Digital Officer, ABB, the majority of data being churned out in the manufacturing industry is unused and untouched. While there could be many reasons for this, poor data quality and clarity are among them.
“The reason is that a lot of data sits in silos and is not integrated. Manufacturing companies get their data from multiple sources which may or may not be direct. We can look at the data quality and integrity in two ways. Either we see what’s the gap and how do we actually fill it. So we go back to the OEMs in case of manufacturing, and ask them for the data that is missing. Or we look for the data which is in silos and integrate it with the systems,” Ramachandran said.
Industry experts further suggest that, as part of the data platform strategy, organizations should leverage technology to establish a programmatic way to enhance data quality. Unambiguous ownership of the data quality mandate is essential, for example through a Data Quality Office responsible for measuring, monitoring, and addressing data quality issues and for determining data’s suitability for downstream applications, including ML and AI.