Using machine learning to develop your hypothesis

There are many applications for machine learning, but one of the most exciting is using it to generate hypotheses to test. An article in The Economist, “Computer says ‘try this’,” discusses many of the ways computers are now generating hypotheses to move medicine, farming and even cooking forward.

Many projects already use machine learning to generate valuable and novel hypotheses. One project, BrainSCANr, suggests research topics for neuroscientists by looking at millions of peer-reviewed papers. Researchers at Baylor College of Medicine used machine-learning software to review over 150,000 papers on one approach to curbing the growth of cancer, which involves proteins called kinases. The algorithm pointed to seven kinases that researchers had missed. Machine learning is even being used to analyze search terms in Bing and Internet Explorer to identify potentially dangerous drug pairings.

What it means for you

Machine learning already benefits many tech and game companies, and using it to help develop hypotheses for your own business can be just as valuable. If you are a mobile game company, think of the value of having a machine-learning system suggest that, rather than focusing on your premium exchange rate, you should test changing the frequency of your free-coin bonus.

Changing the numbers does not change the reality

I am a huge proponent of using analytics and other metrics to drive business decisions, but I repeatedly see people making a huge and avoidable mistake. Instead of using the data to determine the best strategy, they use the data to justify their intuition. A good analyst can use data to draw virtually any conclusion. If a business leader pushes the analyst in a certain direction, all the data does is provide cover for the decision rather than lead the company in the optimal direction.

The same situation applies to financial analysis. I have frequently seen people manipulate numbers, often with the approval or even encouragement of the target audience, to tell the story people want to hear. I have seen this manipulation in sales, in corporate development and in internal forecasting. In every case, the numbers are just a rationale for a decision the person already wants to make.


Data manipulation

The first part of the problem is manipulating the data. I am not talking about Enron here, but about something more subtle and maybe not even intentional. People will often select the data that supports their position while discounting the rest. If you want to greenlight a certain feature, you may look at its impact on retention while neglecting its impact on monetization, and rationalize that by calling it a retention feature. Regardless of whether it is a retention feature or a monetization feature, your goal is to optimize lifetime value (LTV), so you need to look at the data holistically.
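
To make the point about looking at the data holistically concrete, here is a rough sketch of how a "retention feature" can raise retention and still lower LTV. Everything in it is an assumption for illustration: the numbers are hypothetical, and the LTV model (average revenue per daily active user multiplied by expected days played, with a constant daily retention rate) is a deliberate simplification rather than how any particular company calculates LTV.

```python
# Illustrative only: hypothetical numbers and a deliberately simple LTV model.
# Assumption: a player returns each day with a fixed probability
# (daily_retention), so expected days played is 1 / (1 - daily_retention).


def expected_lifetime_days(daily_retention: float) -> float:
    """Expected number of days played under a constant daily retention rate."""
    if not 0.0 <= daily_retention < 1.0:
        raise ValueError("daily_retention must be in [0, 1)")
    return 1.0 / (1.0 - daily_retention)


def lifetime_value(arpdau: float, daily_retention: float) -> float:
    """LTV = average revenue per daily active user x expected days played."""
    return arpdau * expected_lifetime_days(daily_retention)


# Hypothetical baseline: $0.10 per active day, 90% daily retention.
baseline = lifetime_value(arpdau=0.10, daily_retention=0.90)

# Hypothetical feature: retention rises to 92%, but revenue per active day
# falls to $0.07 because the feature gives away what players used to buy.
with_feature = lifetime_value(arpdau=0.07, daily_retention=0.92)

print(f"Baseline LTV:     ${baseline:.3f}")
print(f"With feature LTV: ${with_feature:.3f}")
print("Feature improves LTV:", with_feature > baseline)
```

Judged on retention alone, the hypothetical feature looks like a win. Judged on LTV, it is a loss, which is why both metrics need to be on the table before the feature gets a green light.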