I spent more than a decade forecasting futures as the manager of a hedge fund. We had tick-by-tick data going back decades, but there was a huge random component to this data that made automated prediction beyond a certain accuracy impossible. All the motives people have for buying and selling at a particular moment, combined with the sheer number of people trading, meant that no matter what we did, we’d never perfectly pluck signals from the noise.
In data science, we call these intractable problems: past a certain point, analytics and big data may simply never make further progress on them.
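To make the idea of a noise ceiling concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a model of any real market: the data is synthetic, and the names `simulate` and `r_squared` are hypothetical helpers invented for this example. It scores a "perfect" model, one that knows the true signal exactly, against observations that mix that signal with random noise.

```python
import random

def r_squared(y_true, y_pred):
    """Share of the variance in y_true that y_pred explains."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def simulate(noise_sd, n=20000, seed=0):
    """Score a model that knows the true signal against noisy observations.

    Assumption: observed = signal + Gaussian noise, both synthetic.
    """
    rng = random.Random(seed)
    signal = [rng.gauss(0, 1) for _ in range(n)]
    observed = [s + rng.gauss(0, noise_sd) for s in signal]
    # The "model" here is the true signal itself, the best case possible.
    return r_squared(observed, signal)

low_noise_score = simulate(noise_sd=0.5)   # noise is a small part of the data
high_noise_score = simulate(noise_sd=3.0)  # noise dominates the data

print(f"low noise:  {low_noise_score:.2f}")
print(f"high noise: {high_noise_score:.2f}")
```

In this setup the best achievable score is roughly 1 / (1 + noise_sd²), so even the ideal model explains only a small fraction of the variation once noise dominates. More data narrows the estimate of that ceiling, but it does not raise it.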
The good news is that many problems that at first seem intractable can be addressed by tweaking your approach or your inputs.
Knowing when problems that seem intractable can be solved with some affordable changes will position a business—and a project sponsor—for ongoing success. Conversely, being able to recognize problems that are defined at an unrealistic scale will prevent squandering time and money that you could profitably apply to a more focused question.
Here are four troubleshooting methods that could improve your results. By applying one or more of them iteratively, you can stop banging your head against a wall and improve your chances of finding value in your analytics work.
1. Ask a More Focused Question
Often, the best way forward is to try to solve for a subpart of your original question and extrapolate lessons. Trying to determine the likelihood that any given social-media user will be interested in a car model you’re designing is likely intractable. Even with lots of good data, you might have too many variables to arrive at a model with real predictive value.
Sometimes, when you add a new set of data, the skies open up and you find new predictive power.