“Organizations have access to robust structured data in terms of transactional and operational systems, which they have historically been able to leverage for analytics and data-driven decision-making. However, opportunity lies in bringing in customer data from their digital footprints across multiple channels; rich albeit unstructured data that offers insights into interactions. While there are mechanisms to extract value from each form, there is a significant effort in doing so from unstructured data. In its raw form, it offers no value,” said Sachin Arora, Partner & Head, Digital Lighthouse (Analytics, AI and Data), KPMG in India.
About 90% of data is unstructured, and it comes from a plethora of sources such as emails, text files, pictures, videos and news. At the moment, it is challenging to devise a model that would continually keep it structured. Even the most disciplined companies capture a lot of information as free text or rough notes. So it is imperative for an organisation to identify all its sources of data and build a holistic strategy to harness this data. It would not be surprising if some unstructured data (e.g. the text of emails, meeting minutes and notes, customer calls and client calls) were not even on the list of data that could be leveraged in the future.
So how do you manage unstructured data and build a strategy around it?
“The first step is to ensure a strong pipeline to capture data from origination systems like social, mobile applications, website logs and to store it in a location which is scalable and from which value can be extracted later. The best home for unstructured data is a cloud data lake or a NoSQL datastore depending on the nature of the use cases we ultimately want to solve. NoSQL datastores are good mechanisms to capture data with changing schema,” Arora added.
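As a rough illustration of capturing data with a changing schema, the sketch below wraps events from different origination systems in a small common envelope and stores them as JSON lines. The in-memory list stands in for a data lake file or NoSQL collection, and the function names, field names and sample events are all hypothetical, not taken from any system described in this article.

```python
import json
from datetime import datetime, timezone


def make_record(source, payload):
    """Wrap a raw event in a common envelope; the payload keeps its original shape."""
    return {
        "source": source,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }


def append_jsonl(lines, record):
    """Serialise one record as a JSON line (one document per line)."""
    lines.append(json.dumps(record, ensure_ascii=False))


# Stand-in for a data lake file or a NoSQL collection (assumption for this sketch).
store = []

# Each source sends differently shaped payloads; no shared schema is enforced.
append_jsonl(store, make_record("web_log", {"path": "/cart", "status": 200}))
append_jsonl(store, make_record("mobile_app", {"event": "search", "query": "running shoes", "results": 14}))
append_jsonl(store, make_record("social", {"text": "Love the new store!", "likes": 3}))

sources = [json.loads(line)["source"] for line in store]
print(sources)
```

Only the envelope is fixed here, so a new source or a new payload field can be captured without a migration; interpreting the payload is deferred until value is extracted later.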
A three-point strategy to manage it:
1. Identification of the data sources:
The first step in structuring data is to identify the data sources and build a holistic view of all the data available in the organization.
“Identify the data sources that are most relevant to the kind of insights that you want to derive. Typically, this is a combination of datasets and the metadata of those datasets. Correlating the two is equally critical. Making sure that you are gauging and measuring the quality of the selected datasets is critical,” said Sameer Dixit, Senior Vice President, Engineering, Data and Integration, Persistent Systems.
2. Setting a goal:
The next step is to set a goal or objective, i.e. to identify in detail what one is trying to find out from the data pool. This means deciding on the analysis needed to make sense of the dataset. That analysis could use techniques like entity extraction, OCR, voice to text, entity identification, sentiment analysis, intent identification and so on.
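To make the idea of turning free text into structured fields concrete, here is a deliberately simple sketch of entity extraction and sentiment scoring. It uses regular expressions and a tiny hand-written word list rather than a trained NLP model; the patterns, the word lists, the `structure_note` function and the sample note are all assumptions made for illustration.

```python
import re

# Toy sentiment lexicons (assumptions; a real system would use a trained model).
POSITIVE = {"great", "love", "fast", "helpful"}
NEGATIVE = {"late", "broken", "slow", "rude"}

# Simplified patterns for two entity types.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
ORDER_RE = re.compile(r"\border\s*#?(\d{4,})\b", re.IGNORECASE)


def structure_note(text):
    """Extract a few structured fields from a free-text note."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        sentiment = "positive"
    elif score < 0:
        sentiment = "negative"
    else:
        sentiment = "neutral"
    return {
        "emails": EMAIL_RE.findall(text),
        "order_ids": ORDER_RE.findall(text),
        "sentiment": sentiment,
    }


note = "Order #48213 arrived late and the box was broken. Reply to asha@example.com."
row = structure_note(note)
print(row)
```

The output is a flat record that could sit in an ordinary table next to transactional data, which is the point of setting the goal first: the goal determines which fields are worth extracting.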
3. Data Aggregation, Transformation and Integration:
This journey of maturation can be compressed by using Natural Language Processing (NLP) and Computer Vision (CV) techniques to derive the most meaning and value from the data.
“And then finally, after cleansing, transformation and tagging can be applied to process the data using Artificial Intelligence and Machine Learning (AI/ML) and NLP techniques and tools,” said Uday Chaudhari, Sr. Director – Technology Innovations at Synechron Technologies.
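The cleanse, transform and tag sequence can be sketched in a few lines. This is a minimal, rule-based stand-in for the AI/ML and NLP tooling mentioned above: the topic keywords, function names and sample input are hypothetical, and a production pipeline would likely replace the keyword lookup with a trained classifier.

```python
import html
import re

# Hypothetical topic keywords used for tagging in this sketch.
TOPIC_KEYWORDS = {
    "billing": {"invoice", "refund", "charge"},
    "delivery": {"shipping", "courier", "delayed"},
}


def cleanse(raw):
    """Remove HTML tags, decode entities and collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", raw)
    text = html.unescape(text)
    return re.sub(r"\s+", " ", text).strip()


def transform(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tag(tokens):
    """Return every topic whose keyword set overlaps the tokens."""
    token_set = set(tokens)
    return sorted(topic for topic, kws in TOPIC_KEYWORDS.items() if token_set & kws)


raw = "<p>Refund&nbsp;requested:   courier delayed the parcel</p>"
clean = cleanse(raw)
tags = tag(transform(clean))
print(clean)
print(tags)
```

Keeping the three stages as separate functions means each one can be swapped out independently, for example replacing `tag` with a model-based classifier while leaving cleansing untouched.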
Talent required for this
Sameer Dixit, Senior Vice President, Engineering, Data and Integration, Persistent Systems, believes that processing unstructured data requires a different set of skills from those required for processing structured datasets.
“These are typically niche skills and could range across a variety of technologies. Unlike SQL, which is the most common technology used to process structured data, unstructured data requires deep programming skills. Some of the latest machine learning techniques have made it possible to draw a variety of insights from unstructured datasets of all formats. A lot also depends on the kind of unstructured data that you are looking to process,” he said.
How will this help the business?
The benefits of unstructured data processing can span multiple business functions across the enterprise. It could be used to avoid compliance issues, understand customer sentiment better, improve team efficiency and productivity by automating repetitive tasks, and so on.
“Structured data’s benefits are in plenty, starting from aiding businesses to make important decisions in competitive environments, to providing instant query results. They will also prove beneficial in implementing machine learning, predictive analytics, data discovery and profiling. The noblest of all is that it serves as the central version of the truth, which enables businesses in their processes as well as ESG initiatives,” said Uday Chaudhari, Sr. Director – Technology Innovations at Synechron Technologies.
One interesting example of how Zivame leverages structured data is in deciding the initial merchandise inventory for its new stores. The inventory spread is determined from the historical data of products purchased by online shoppers near the new store's location.
“Zivame leverages AI/ML algorithms to personalise many aspects of the consumer’s shopping experience on Zivame. Using structured data, collaborative filtering approaches showcase relevant offers to customers, and AI-based recommendation algorithms help with finding relevant related products for consumers,” said Yash Dayal, CTO, Zivame.
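For readers unfamiliar with collaborative filtering, the sketch below shows one simple item-based variant: products that were bought together often are recommended to shoppers who own one of them. It is not Zivame's implementation; the baskets, product names and function names are invented for illustration, and real systems typically work from far larger purchase histories with weighted similarity scores.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(baskets):
    """Count how often each pair of items appears in the same basket."""
    co = {}
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            co.setdefault(a, Counter())[b] += 1
            co.setdefault(b, Counter())[a] += 1
    return co


def recommend(co, owned, k=2):
    """Rank items not yet owned by how often they co-occur with owned items."""
    scores = Counter()
    for item in owned:
        for other, n in co.get(item, {}).items():
            if other not in owned:
                scores[other] += n
    # Sort by descending count, then by name so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


# Hypothetical purchase history (structured data).
baskets = [
    ["tee", "shorts", "socks"],
    ["tee", "shorts"],
    ["tee", "cap"],
    ["shorts", "socks"],
]
co = build_cooccurrence(baskets)
picks = recommend(co, {"tee"})
print(picks)
```

The approach depends entirely on clean, structured purchase records, which illustrates why structured data is a prerequisite for this kind of personalisation.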
“As a general rule of thumb, it’s much easier to work with and derive insights from structured data when compared to unstructured data. The more structured data available, the better the possibilities of competitive advantage for the organisation. In addition, there are great strides in NLP and speech-to-text capabilities which can help to quickly convert unstructured data into a structured format,” Dayal added.