By Raj Srinivas

Problems and solutions for businesses executing a big data predictive analytics project

Predictive Analytics is fast becoming a major part of the Big Data Analytics market. Small, medium and large enterprises alike are using Predictive Analytics as a tool to forecast a wide range of business outcomes well in advance. In a 2020 report, Gartner said that by 2023, more than 33% of large organizations will have analysts practicing decision intelligence, including decision modeling.

So, what is Predictive Analytics? It is a Big Data process by which the outcome of a business problem can be predicted in advance, provided past and present data are available for the set of attributes that normally affect that problem. A combination of statistical hypotheses and methods, AI/ML algorithms, Data Mining and business processes is used to arrive at the prediction. It offers businesses rare insight into the trends and outcomes they are likely to face in the foreseeable future. The problems can be as varied as sales forecasts, inventory requirements, materials management, staffing requirements, project execution analysis, project win analysis, budget requirements and salary structures.

What factors must a business be aware of before embarking on a predictive analysis of a problem? What could go wrong with these processes? And what are the solutions to some of these missteps?

1. Goal of Prediction Analysis: First, there should be a clear prediction definition statement. It should state exactly which business trend, pattern or outcome is to be predicted, over what period, and for how long the prediction should remain valid. This helps to form the statistical hypothesis, which is essentially a translation of the plain-English problem statement into a statistical problem. An unclear problem statement often leads to predictions for the wrong problem.

Solution: Brainstorm within the management team to arrive at a statement that is clear about the prerequisites, the assumptions, and the deviations the business is willing to accept as part of the prediction analysis. Communicate this statement precisely to the data analytics team, which can then dissect it and convert it into a statistical and data analytics problem.
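One way to make the prediction definition statement precise enough to hand over is to capture it as a structured record that the analytics team can validate. The sketch below assumes a hypothetical `PredictionSpec` type; its field names (target, horizon, validity period, acceptable error, assumptions) are illustrative choices mirroring the elements listed above, not a standard format.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class PredictionSpec:
    """Hypothetical record of a prediction definition statement."""
    target: str              # business outcome to predict, e.g. "monthly_sales"
    problem_type: str        # "regression" (a quantity) or "classification" (a category)
    horizon_days: int        # how far ahead the prediction looks
    valid_for_days: int      # how long a prediction stays usable once made
    max_error_pct: float     # deviation the business is willing to accept
    assumptions: tuple = ()  # agreed prerequisites, stated in plain English

    def __post_init__(self):
        # Reject statements too vague to turn into a statistical problem.
        if self.problem_type not in ("regression", "classification"):
            raise ValueError(f"unknown problem_type: {self.problem_type!r}")
        if self.horizon_days <= 0 or self.valid_for_days <= 0:
            raise ValueError("horizon and validity period must be positive")
        if not 0 < self.max_error_pct < 100:
            raise ValueError("max_error_pct must be between 0 and 100")

    def target_date(self, as_of: date) -> date:
        """Date that a prediction made on `as_of` refers to."""
        return as_of + timedelta(days=self.horizon_days)

    def expires_on(self, as_of: date) -> date:
        """Date after which a prediction made on `as_of` should be refreshed."""
        return as_of + timedelta(days=self.valid_for_days)


spec = PredictionSpec(
    target="monthly_sales",
    problem_type="regression",
    horizon_days=90,
    valid_for_days=30,
    max_error_pct=10.0,
    assumptions=("no new product launches in the period",),
)
print(spec.target_date(date(2024, 1, 1)))  # 2024-03-31
```

Validation at construction time means a statement missing a horizon or an acceptable error is rejected before any modeling work starts.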

2. Quality of Data: The data must be accurate and reliable. The outcome depends heavily on the data fed into the prediction: if the data is skewed, the prediction that depends on it will be skewed as well.

Solution: The first duty of a data analytics engineer is to confirm that the given data is a random, representative sample. If it is not, the correlations and relationships between the data attributes will be skewed, leading to wrong outcomes. The next step is to examine the spread (standard deviation) of each attribute and normalize the data so that attributes on different scales can be compared. A decision should be taken about outliers (whether to keep or discard them) so that the data fed into the model is consistent. In the case of unstructured data, proper cleansing needs to be done, and appropriate Data Mining and Data Lake processes should be followed.
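The normalization and outlier steps can be sketched with Python's standard library. This is a minimal illustration under simple assumptions: a single numeric attribute, z-score normalization, and Tukey's fences at 1.5 times the interquartile range (IQR) as the outlier rule. Other normalization schemes and outlier rules are equally valid, and whether a flagged point is kept or discarded remains a business decision.

```python
import statistics


def zscore(values):
    """Rescale an attribute to mean 0 and sample standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("constant attribute: standard deviation is zero")
    return [(v - mean) / sd for v in values]


def iqr_outliers(values, k=1.5):
    """Return indices of values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Hypothetical weekly sales with one suspicious spike.
sales = [100, 102, 98, 101, 99, 103, 97, 250]
print(iqr_outliers(sales))      # [7]
print(zscore([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]
```

The spike also inflates the standard deviation that `zscore` divides by, which is one reason to settle the outlier question before normalizing.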

3. Selection of data attributes/identification of patterns: Once the data quality problem is solved, the next activity is identifying patterns and correlations between the data attributes. These relationships should be examined critically in the context of the problem statement at hand. Uncorrelated and insignificant attributes add noise rather than predictive power and are unlikely to lead to correct predictions.

Solution: Data analysts should zero in on the attributes that are positively, negatively or cyclically correlated with each other and, in some cases, with time. Insignificant data should also be identified and discarded. This ensures that the right data attributes form the backbone of the model and of its subsequent training and testing using AI.
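A simple way to screen attributes is to measure each one's correlation with the target, including at a few time lags so that leading indicators are not missed. The sketch below uses Pearson correlation and a hypothetical cut-off of 0.3; both the threshold and the lag range are assumptions to be tuned per problem. Pearson correlation only captures linear relationships, so non-linear or cyclical patterns need separate analysis.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant attribute carries no linear signal
    return cov / (sx * sy)


def lagged_corr(x, y, lag):
    """Correlate x at time t with y at time t + lag (lag >= 0)."""
    return pearson(x, y) if lag == 0 else pearson(x[:-lag], y[lag:])


def select_attributes(features, target, threshold=0.3, max_lag=0):
    """Keep attributes whose strongest |correlation| with the target over
    lags 0..max_lag meets the threshold; return {name: correlation}."""
    selected = {}
    for name, values in features.items():
        best = max(
            (lagged_corr(values, target, lag) for lag in range(max_lag + 1)),
            key=abs,
        )
        if abs(best) >= threshold:
            selected[name] = best
    return selected


# Hypothetical monthly data.
target = [10, 12, 14, 16, 18, 20]
features = {
    "ad_spend": [1, 2, 3, 4, 5, 6],  # rises with the target
    "discount": [6, 5, 4, 3, 2, 1],  # falls as the target rises
    "noise": [2, 1, 1, 2, 2, 1],     # no clear relationship
}
print(sorted(select_attributes(features, target)))  # ['ad_spend', 'discount']
```

Both positively and negatively correlated attributes are kept, since a strong negative relationship is just as informative as a positive one.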

4. Arriving at the right AI/ML model and AI/ML training methods: The model should be chosen carefully, based on the nature of the problem statement and the data attributes identified. If adequate thought is not given to the nature of the problem (for example, whether it calls for a classification or a regression prediction), the model selected and its subsequent AI training will be in vain. The constraints, limitations and assumptions of the expected prediction also play a role in model selection.

Solution: Data scientists and data analysts should brainstorm every aspect of the problem, critically questioning the inclusion or exclusion of attributes and considering the right model types (Bayesian models, decision trees, graph models, support vector machines, artificial neural networks, etc.) before settling on a model. Model training is an important process: the training data must be carefully chosen from the available datasets to train the selected model.
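Whichever model families are shortlisted, a common discipline is to hold back part of the data, train each candidate on the rest, and compare them on the held-out part. The sketch below is a deliberately tiny stand-in: it compares a naive mean-only baseline against a one-attribute least-squares line on a chronological split. The two candidates, the 25% test fraction and the helper names are assumptions for illustration; in practice the candidates would be the Bayesian, tree-based, SVM or neural models mentioned above.

```python
def chronological_split(rows, test_fraction=0.25):
    """Earlier rows train, later rows test; no shuffling, so a time-ordered
    model is never trained on the period it is tested on."""
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("attribute is constant; cannot fit a slope")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def compare_candidates(rows, test_fraction=0.25):
    """Return held-out MSE for each candidate model on (x, y) rows."""
    train, test = chronological_split(rows, test_fraction)
    train_x, train_y = zip(*train)
    test_x, test_y = zip(*test)

    mean_y = sum(train_y) / len(train_y)           # candidate 1: always predict the mean
    slope, intercept = fit_line(train_x, train_y)  # candidate 2: straight line

    return {
        "mean_baseline": mse(test_y, [mean_y] * len(test_y)),
        "linear": mse(test_y, [slope * x + intercept for x in test_x]),
    }


# Hypothetical data with a clean linear trend.
rows = [(x, 3 * x + 2) for x in range(8)]
print(compare_candidates(rows))  # {'mean_baseline': 146.25, 'linear': 0.0}
```

Any candidate that cannot beat the trivial baseline on held-out data is a sign that the model type, or the attributes feeding it, should be reconsidered.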

5. Communicate the prediction to the business team and get actual results recorded by the business team: Once the model produces predictions, the data analytics team should communicate all the nuances of the prediction to the business teams. The business teams and stakeholders can then start watching for the actual data in the field. This is effectively the testing phase for the model: the actual data collected over a period of time can be used to test the prediction itself. If this process is skipped, there is no way to know whether the prediction was correct.

Solution: With these inputs from the business and sales teams, the choice of model, the data attributes selected and their correlations can all be reassessed.
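Reassessment needs a measurable comparison between what was predicted and what the business actually recorded. A minimal sketch, assuming predictions and actuals are keyed by period (hypothetical `YYYY-MM` labels here) and that actuals arrive gradually, so only periods with a recorded actual are scored. Mean absolute error (MAE) and mean absolute percentage error (MAPE) are example metrics, and the 10% tolerance is an assumed business threshold.

```python
def score_against_actuals(predicted, actual, max_error_pct=10.0):
    """Compare predictions with the actuals recorded so far.

    predicted, actual: dicts mapping a period label to a value.
    Only periods present in both dicts are scored.
    """
    periods = sorted(set(predicted) & set(actual))
    if not periods:
        raise ValueError("no recorded actuals yet")
    if any(actual[p] == 0 for p in periods):
        raise ValueError("MAPE is undefined when an actual value is zero")
    abs_errors = [abs(actual[p] - predicted[p]) for p in periods]
    mae = sum(abs_errors) / len(periods)
    mape = 100 * sum(e / abs(actual[p]) for e, p in zip(abs_errors, periods)) / len(periods)
    return {
        "periods_scored": periods,
        "mae": mae,
        "mape": mape,
        "within_tolerance": mape <= max_error_pct,
    }


predicted = {"2024-01": 100, "2024-02": 200, "2024-03": 300, "2024-04": 320}
actual = {"2024-01": 110, "2024-02": 190, "2024-03": 300}  # April not yet recorded
report = score_against_actuals(predicted, actual)
print(report["periods_scored"], report["within_tolerance"])
```

Scoring only the overlapping periods lets the comparison start as soon as the first actuals come in from the field, rather than waiting for the full prediction horizon.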

6. Minor adjustments to the prediction algorithm and continued training and testing through AI: The input from the testing process should be used to fine-tune the model and algorithm. It should strengthen the prediction process by weeding out wrong parameters and including new parameters and data points that might have been missed. If this is not done, the prediction can easily go wrong, especially in complex, time-dependent or time-sensitive models.

Solution: Data correlations can change over even small increments of time (say, within 3 to 6 months). It is therefore important to adjust the prediction pattern with current data inputs as they become available. This helps reduce the mean squared error of the selected model and keeps predictions up to date with the latest data over time.
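The effect of re-fitting on recent data can be illustrated with a toy example in which the relationship between an attribute and the outcome shifts partway through the history. This is a minimal sketch under stated assumptions: a one-attribute straight-line model, a hypothetical level shift, and a fixed-size trailing window chosen by hand. Real systems would choose the window (or a weighting scheme) by validation rather than assume it.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("attribute is constant; cannot fit a slope")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def refit_and_score(history, future, window=None):
    """Fit on all history (window=None) or only the last `window` rows,
    then return the MSE of the fitted line on the `future` rows."""
    recent = history if window is None else history[-window:]
    xs, ys = zip(*recent)
    slope, intercept = fit_line(xs, ys)
    return mse([y for _, y in future], [slope * x + intercept for x, _ in future])


# Hypothetical history: the outcome jumps by 20 units from period 10 onward.
history = [(x, 2 * x) for x in range(10)] + [(x, 2 * x + 20) for x in range(10, 16)]
future = [(x, 2 * x + 20) for x in range(16, 20)]

full = refit_and_score(history, future)
recent = refit_and_score(history, future, window=6)
print(recent < full)  # True
```

The trade-off is variance: a shorter window adapts faster to change but has fewer points to learn from, so it can be noisier when nothing has actually shifted.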

7. Long-term analysis of predictions: The teams should keep a constant watch on the data, patterns, model and training methods, track how the data changes through current and future time periods, and be willing to report any prediction changes to the management team as they occur. In certain scenarios, predictions can be turned completely upside down within six months.

Solution: This process makes it possible to identify immediately anything going wrong or any shift in predictions, so that the necessary input changes can be made to the model and, if needed, a new set of predictions produced over time.
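Keeping a constant watch can be partly automated by tracking prediction error over a rolling window and alerting when it drifts well above the error measured when the model was accepted. The sketch below assumes a hypothetical `PredictionMonitor` class, an absolute-error metric, a default six-period window (echoing the six-month horizon mentioned above) and an alert ratio of 1.5; all of these are illustrative settings, not recommendations.

```python
from collections import deque


class PredictionMonitor:
    """Hypothetical rolling-error monitor for a deployed prediction model."""

    def __init__(self, baseline_mae, window=6, alert_ratio=1.5):
        if baseline_mae <= 0:
            raise ValueError("baseline_mae must be positive")
        self.baseline_mae = baseline_mae     # error measured when the model was accepted
        self.alert_ratio = alert_ratio       # how much worse counts as a shift
        self.errors = deque(maxlen=window)   # keeps only the most recent `window` errors

    def record(self, predicted, actual):
        """Add one period's result; return True if the rolling error signals a shift."""
        self.errors.append(abs(actual - predicted))
        if len(self.errors) < self.errors.maxlen:
            return False  # not enough periods observed yet
        rolling_mae = sum(self.errors) / len(self.errors)
        return rolling_mae > self.alert_ratio * self.baseline_mae


monitor = PredictionMonitor(baseline_mae=5.0, window=3)
alerts = [monitor.record(100, a) for a in [104, 105, 106, 110, 112, 114]]
print(alerts)  # [False, False, False, False, True, True]
```

An alert is a prompt to re-examine the data, attributes and model with the management team, not an instruction to retrain automatically.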

Gartner predicts that by 2023, 30% of organizations worldwide will be using technologies such as graph to facilitate rapid conceptualization for decision making. This suggests that businesses will adopt predictive analytics extensively in the near future to achieve a better ROI.

The author is CTO, SecureKloud Technologies Limited.
