Three ways to generate business value from any analytics project
How to get meaningful results even when a problem seems intractable
Data scientists regularly come up against problems that appear unsolvable, even with access to high-performance computing power. Intel’s chief data scientist Bob Rogers is more than familiar with the challenge of generating business value from large datasets, having spent a decade forecasting futures as a hedge fund manager in his previous role.
“We had tick-by-tick data going back decades, but there was a huge random component to this data that made automated prediction beyond a certain accuracy impossible. All the motives people have for buying and selling at a particular moment, combined with the sheer number of people trading, meant that no matter what we did, we’d never perfectly pluck signals from the noise,” he explained.
In the world of data science this type of scenario is known as an intractable problem. Once you reach a certain point, no amount of analytics will yield any progress. According to Rogers, many problems may seem impossible to solve at first but in reality they can be addressed by tweaking your approach to them:
1) Ask a more focused question
A broad question such as “What is the likelihood social media users will be interested in the car I’m designing?” is going to generate too many variables and won’t give you any meaningful insight.
The best way to start is to split the question into sub-parts and extrapolate lessons from each. So, you may be able to predict an increase or decrease in sales to a specific demographic, for example. From there, you could determine whether a change such as a boxier design would boost sales to stay-at-home mums more than it would hurt sales to single millennials. Not only does this make the problem more manageable, it has the scope to deliver real business value.
Remember, the same approach can help you isolate variables that are throwing off your algorithm. Instead of trying to predict hospital readmission rates for all patients, for example, you might divide a patient set into two groups—perhaps one of patients with multiple significant conditions, and the other of patients with only a single condition, such as heart failure. If the quality of prediction in one group shows a meaningful improvement or decline, that would indicate that your algorithm works for a data set that is not just smaller, but specifically clear of a particular confounding variable present in the larger pool.
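As a rough sketch of that split-and-compare idea, the toy example below contrasts prediction quality across two synthetic patient groups. Everything here is invented for illustration (the threshold model, the noise levels, the group sizes); it is not a real readmission model, just a way to see how a confounded subgroup can drag down an otherwise workable predictor.

```python
import numpy as np

rng = np.random.default_rng(0)

def accuracy(features, labels, threshold=5.0):
    """Naive model: predict readmission whenever the feature exceeds a threshold."""
    predictions = features > threshold
    return float(np.mean(predictions == labels))

# Group A: single-condition patients -- the outcome tracks the feature cleanly.
single_x = rng.normal(5.0, 2.0, 500)
single_y = single_x + rng.normal(0.0, 0.5, 500) > 5.0

# Group B: multi-condition patients -- confounding noise swamps the feature.
multi_x = rng.normal(5.0, 2.0, 500)
multi_y = multi_x + rng.normal(0.0, 5.0, 500) > 5.0

acc_single = accuracy(single_x, single_y)
acc_multi = accuracy(multi_x, multi_y)
print(f"single-condition accuracy: {acc_single:.2f}")
print(f"multi-condition accuracy:  {acc_multi:.2f}")
```

The gap between the two accuracies is the signal the article describes: the same algorithm performs well on the subgroup that is clear of the confounding variable.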
2) Improve your algorithm
In data science, algorithms not only define the sequence of operations that your analytics system will perform against the data set, they also reflect how you think about, or “model”, potential relationships within the data. Sometimes creating the right algorithm, or modifying an available algorithm for your specific new purpose, requires many iterations.
One key indicator that your algorithm isn’t working is if you’ve scaled your compute power up, say by a factor of five, but are seeing a much smaller improvement in processing time. Another test is to slightly tweak your algorithm’s parameters: minor modifications should produce only slightly different answers. If they produce drastically different results, there is a high probability that you’re going to need another algorithm.
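A toy way to see this failure mode is below. Nothing here comes from the article; the logistic map simply stands in for any model whose answer we probe under a small parameter tweak, because it is well behaved for some parameter values and wildly sensitive for others.

```python
def iterate(r, x0=0.2, steps=50):
    """Iterate the logistic map x -> r * x * (1 - x) and return the final value.
    Here it stands in for any model whose output depends on a parameter r."""
    x = x0
    for _ in range(steps):
        x = r * x * (1 - x)
    return x

# Well-behaved regime: a ~0.1% parameter tweak barely moves the answer.
stable_shift = abs(iterate(2.5) - iterate(2.5025))

# Sensitive regime: the same ~0.1% tweak can give a drastically different
# answer -- the kind of instability that suggests rethinking the algorithm.
fragile_shift = abs(iterate(3.9) - iterate(3.9039))

print(f"stable regime shift:  {stable_shift:.4f}")
print(f"sensitive regime shift: {fragile_shift:.4f}")
```

In practice you would rerun your own model with perturbed parameters and compare the outputs in the same way; a large shift from a tiny tweak is the warning sign.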
Sometimes the results may suggest you’ve chosen the wrong type of algorithm altogether. Model selection is often based on assumptions about the data, such as expecting a linear progression between two elements when their relationship could be more accurately represented by a decision tree.
Even if you have to make changes to your algorithm, the good news is that there are many open-source algorithms available in public libraries, so you’ll rarely have to start from scratch.
3) Clean your data or use a different data set
Problems with data sets often aren’t clear until you begin your analysis, so it’s important to make sure there’s no ‘garbage in’ or you’ll simply generate ‘garbage out’. If you’re positive the data set you’re working with is clean but are still experiencing issues, you may need to obtain more data or update your metadata. It may even be necessary to change some processes to capture the data you need.
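A minimal sketch of a ‘garbage in’ check is shown below. The column name, values, and plausibility thresholds are all hypothetical; the point is simply that missing and impossible values are worth flagging before analysis begins.

```python
import numpy as np

# Hypothetical column of patient ages; values and limits are illustrative.
ages = np.array([34.0, 51.0, np.nan, 29.0, 240.0, 47.0, -5.0, 63.0])

missing = np.isnan(ages)
# Guard with ~missing so NaNs are counted once, as missing rather than implausible.
implausible = (~missing) & ((ages < 0) | (ages > 120))
clean = ages[~missing & ~implausible]

print(f"{missing.sum()} missing, {implausible.sum()} implausible, "
      f"{clean.size} usable rows")
```

Checks like this make the ‘garbage’ visible early, before it can quietly distort the results downstream.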
Generally, more data helps produce greater insight and better answers. Most businesses have already squeezed as much value as possible out of the data stored in traditional data warehouses. When they add a new set of data, especially unstructured data such as text progress notes written by doctors or documented interactions between call centre employees and customers, they often find new predictive power.
As you test an analytics project, add data in sequence to see how it changes the answers. So long as your answers keep getting better, you most likely haven’t hit the point of intractability. When your progress slows, weigh the cost of possible approaches against the potential payoff.
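The ‘add data in sequence’ check can be sketched as a learning curve. The task below is purely synthetic: a linear signal plus noise, where the noise level of 2.0 plays the role of the problem’s irreducible randomness.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic task: a linear signal plus irreducible noise (sd = 2.0).
n_total = 4000
x = rng.uniform(0.0, 10.0, n_total)
y = 3.0 * x + rng.normal(0.0, 2.0, n_total)
x_test = rng.uniform(0.0, 10.0, 1000)
y_test = 3.0 * x_test + rng.normal(0.0, 2.0, 1000)

errors = []
for n in (50, 200, 1000, 4000):
    # Fit a straight line to the first n points only.
    slope, intercept = np.polyfit(x[:n], y[:n], 1)
    rmse = float(np.sqrt(np.mean((slope * x_test + intercept - y_test) ** 2)))
    errors.append(rmse)
    print(f"n={n:5d}  test RMSE={rmse:.3f}")

# The RMSE falls towards the noise floor (2.0) and then flattens out --
# once it plateaus, more data alone will not help.
```

The plateau is the practical signal of intractability the article describes: once the error stops improving, the remaining gap is noise, and the sensible move is to weigh further effort against the payoff.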
Most importantly, always be wary that trying to predict human behaviour too accurately often leads you down the path of intractability.
For more information on analytics and how it can improve your business stay tuned to the Intel IT Center.