The key to using customer lifetime value (LTV) effectively is the understanding that it is a prediction, not a value. In my previous eight posts on LTV, I stressed the importance of LTV to the success of your game and company and the key components in determining LTV. After reading Nate Silver’s The Signal and the Noise, I realized that it is crucial to understand that LTV is a prediction and suffers the same risk as other predictions (e.g., elections, weather, sports scores).
The Uncertainty Principle
Many people mistakenly believe (and I may have inadvertently implied this in a previous post) that LTV is an exact function of virality, monetization and retention: that is, that you put those variables into a formula and get out a number showing precisely how much a player is worth. That would be true if, after five years, you looked back at historical data and calculated how much that player had been worth to you. However, you are calculating how much the player will be worth, which is inherently different because you are predicting their future value.
The uncertainty principle, a key tenet of quantum mechanics formulated by Werner Heisenberg (and popularized for general readers by Stephen Hawking), implies that perfect predictions are impossible because the universe itself contains randomness. Since you cannot have a perfect prediction, your LTV cannot be a distinctly quantified value. You are predicting future events (how much the player will monetize, how viral they will be and how long they will stay in your game) based on the available data. Your LTV model is a simplification of the world the player is in; you are looking at several variables but you cannot look at everything (e.g., chance of war, plague, everyone switching to Blackberry devices). In effect, your LTV calculation is very similar to a sportscaster’s estimate of how many home runs Albert Pujols will hit or a weatherman’s prediction of the likelihood of a hurricane hitting Cape Hatteras.
Once you realize LTV is not a distinct, absolute value, you can set LTV within a range of likely outcomes. A sportscaster is more accurate saying Pujols has a 90 percent chance of hitting 30–40 home runs than declaring, based on past performance and the performance of other players his age, that he is likely to hit 36 home runs. The “36 home runs” might be the number his model kicks out, but any number of things can create a deviation from it. Similarly, saying you have an LTV of $4.50 is almost certain to end up being incorrect, but if your team calculates with 90 percent confidence that LTV will be between $3.50 and $5.00, you have a much more useful estimate for your decisions than the misperception that you will absolutely generate $4.50 from the acquired player.
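A minimal sketch of turning a point LTV into a range: bootstrap resampling of a cohort's per-player revenue produces a confidence interval rather than a single number. All of the figures below (cohort size, spend amounts, the 90 percent level) are invented for illustration.

```python
import random
import statistics

def ltv_confidence_interval(revenues, confidence=0.90, n_boot=2000, seed=42):
    """Bootstrap a confidence interval for mean LTV from per-player revenue.

    Resamples the cohort with replacement many times and reads the interval
    off the distribution of resampled means.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        sample = [rng.choice(revenues) for _ in revenues]
        means.append(statistics.mean(sample))
    means.sort()
    lo_idx = int((1 - confidence) / 2 * n_boot)
    hi_idx = int((1 + confidence) / 2 * n_boot) - 1
    return means[lo_idx], means[hi_idx]

# Hypothetical cohort: most players spend nothing, a few spend a lot.
cohort = [0.0] * 60 + [1.99] * 25 + [9.99] * 10 + [49.99] * 5
low, high = ltv_confidence_interval(cohort)
print(f"90% CI for mean LTV: ${low:.2f} - ${high:.2f}")
```

The width of the interval is itself useful information: a heavily skewed spend distribution (a few whales) produces a wide range, which is exactly the uncertainty a single $4.50 figure hides.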
Risk vs Uncertainty
Underlying the principle that LTV is a prediction is the difference between risk and uncertainty. Risk is something you can put a price on. Nate Silver offers a great example of risk. Say that you’ll win a poker hand unless your opponent draws to an inside straight; the chances of that happening are exactly 1 in 11.46. This is risk. It is not pleasant when you take a “bad beat” in poker, but at least you know the odds of it and can account for it ahead of time. In the long run, you’ll make a profit from your opponents making desperate draws with insufficient odds.
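Because the odds in Silver's poker example are known exactly, the risk can be priced with simple arithmetic. A tiny sketch (the pot size is an invented number; the 1-in-11.46 odds come from the example above):

```python
# Risk can be priced: with known odds, expected value is a weighted sum.
p_bad_beat = 1 / 11.46   # odds of the inside-straight draw hitting
pot = 100.0              # hypothetical pot size

# You win the pot unless the draw hits; the bad-beat branch pays nothing.
ev = (1 - p_bad_beat) * pot + p_bad_beat * 0.0
print(f"P(bad beat) = {p_bad_beat:.3f}, EV = ${ev:.2f} of a ${pot:.0f} pot")
```

This is what "accounting for it ahead of time" means: the occasional loss is already baked into the expected value, so the long-run profit is no surprise. Uncertainty, by contrast, gives you no trustworthy `p_bad_beat` to plug in.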
Uncertainty is risk that is difficult to measure. You might have an idea of the potential for something bad to happen, but you do not know the real numbers; you could be off by a factor of 100 or 1,000. An example of uncertainty would be predicting the likelihood that Jeb Bush will be the Republican candidate for President in 2016.
The distinction between risk and uncertainty is important because confusing uncertainty for risk can destroy your company. Correlation in past data does not create certainty. Thus, even if your LTV calculation incorporates hundreds of metrics, that does not mean the value it kicks out will definitely be accurate. If you use that LTV to spend all your funds on advertising and it turns out to be wrong, then guess what? You are out of cash. This may sound like an over-simplification, but it is what happened to Lehman Brothers (and many others) in 2008. Their models calculated an expected “risk” from mortgage-backed securities and they built positions based on that risk model. The problem was that the historical data did not incorporate all the factors in play in 2008, and the securities had much more downside potential than the “risk” model showed.
Chaos Theory
Chaos Theory shows the profound effects incorrect assumptions can have on a prediction such as LTV. Although many readers are going to think that Chaos Theory is another phrase for “game development,” it actually shows that a small change in initial conditions can produce a large and unexpected divergence in outcomes. One such assumption is treating variables as independent when they are actually tied together; a change in these variables will then have a much bigger effect than predicted. This is what happened with the mortgage industry, which saw historically that nominal real estate values never dropped significantly. What it did not account for was that a drop in prices would reduce employment and spending, which in turn would drive home prices down further. Chaos Theory can throw your LTV into similar chaos: a change that impacts one variable may have a tsunami effect on all variables.
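The textbook illustration of sensitivity to initial conditions is the logistic map, a one-line recurrence that behaves chaotically. Two projections that start almost identically end up nowhere near each other, which is the same trap an LTV model falls into when a tiny error in an input assumption compounds. (The logistic map is a stand-in here, not an LTV model.)

```python
# Sensitivity to initial conditions: the logistic map x -> r*x*(1-x)
# is chaotic at r = 4. Two starting points that differ by one part in
# 200,000 produce completely different trajectories.

def iterate(x, r=4.0, steps=30):
    for _ in range(steps):
        x = r * x * (1 - x)
    return x

a = iterate(0.200000)
b = iterate(0.200001)   # tiny perturbation of the initial condition
print(f"after 30 steps: {a:.4f} vs {b:.4f}")
```

If a 0.0005 percent error in an input can decouple two trajectories this badly, a model whose variables secretly feed back on one another (prices, employment, spending) deserves wide error bars.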
Avoid overfitting data
In statistics, the name given to the act of mistaking noise for signal is “overfitting”: statistical models are “fit” so closely to past observations that they capture their random quirks. With overfitting, your analysts give you a very specific solution to a general problem. For example, you might look at all the data from your hidden object game and determine that LTV was significantly positively impacted by changing all limited-edition prices by $0.25. However, it could have been coincidence that the change happened at the same time the game was being improved. To avoid this problem, you (or your analytics team) need to test how much of the variability in the data is actually accounted for by your model.
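A sketch of how holdout testing exposes overfitting. Here a feature with no real effect on LTV (think of the $0.25 price tweak) is fed to two models: a 1-nearest-neighbour model that memorizes the training data, and a trivial model that just predicts the mean. All data is synthetic.

```python
import random
import statistics

rng = random.Random(7)

# Spurious feature x (e.g., a price-tweak flag) with NO real effect:
# LTV is $4.00 plus noise regardless of x.
def sample():
    x = rng.random()
    y = 4.00 + rng.gauss(0, 1.0)
    return x, y

train = [sample() for _ in range(30)]
test = [sample() for _ in range(500)]

def nn_predict(x):
    """Overfit model: 1-nearest-neighbour, memorizes training noise."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

mean_y = statistics.mean(y for _, y in train)   # simple model: the mean

def mse(predict, data):
    return statistics.mean((y - predict(x)) ** 2 for x, y in data)

print("overfit model, train MSE:", mse(nn_predict, train))   # 0: memorized
print("overfit model, test  MSE:", mse(nn_predict, test))
print("simple  model, test  MSE:", mse(lambda x: mean_y, test))
```

The memorizing model is perfect on the data it has seen and markedly worse than the trivial model on fresh data; that gap, not the training fit, is the measure of how much variability the model really explains.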
The value of qualitative information
As LTV is a prediction, it is crucial to improve the accuracy of that prediction, and qualitative information can help. Unlike quantitative data, which consists of metrics that can be measured, qualitative data is observed but not measured. Most LTV formulas use only quantitative data to predict the performance of players. This can lead to worse predictions than if qualitative analysis (for example, how much fun the game is) were also included. Think of it this way: when is less information better than more information? Qualitative information, while not specifically measured, is real information; you must figure out how to incorporate it into your LTV predictions.
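One hypothetical way to fold qualitative signal into a quantitative prediction: treat playtester “fun” ratings (observed, not measured) as an adjustment on the metrics-driven LTV. The 0.85 and 1.10 multipliers and the rating thresholds below are illustrative assumptions, not established values.

```python
def adjusted_ltv(metric_ltv, fun_scores):
    """Adjust a metrics-driven LTV estimate using qualitative fun
    ratings on a 1-5 scale. Multipliers are illustrative assumptions."""
    avg_fun = sum(fun_scores) / len(fun_scores)
    if avg_fun < 3.0:        # playtesters found the game weak
        return metric_ltv * 0.85
    if avg_fun > 4.0:        # playtesters loved it
        return metric_ltv * 1.10
    return metric_ltv

print(adjusted_ltv(4.50, [2, 3, 2, 3]))  # weak reviews pull the estimate down
```

The exact mechanism matters less than the principle: a model that can never be moved by “the game is not fun” is discarding real information.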
The use of qualitative data goes against the instincts of many data analysts and some of the trends of big data, but many of the best users of analytics understand the value of qualitative information. Billy Beane, the hero of Moneyball (whom I have written about multiple times), has used metrics to evaluate baseball players and gain a competitive edge. Yet Beane has also significantly expanded his scouting department (the people who go out to watch and evaluate prospective talent), and his continued success is largely due to his ability to integrate the quantitative data (stats) with the qualitative data (scouting reports).
Solutions
Given the predictive nature of determining LTV (and the fact that you can no more violate the laws of quantum physics than you can fly from the 70th floor of the Empire State Building), there are several steps you can take to use LTV effectively:
- Create an LTV range. Rather than looking at LTV as a single number, create a range of most likely outcomes. A range will not only be more accurate but will also help you anticipate and plan for LTVs that come in at the top and bottom of it. In terms of company culture, it is important not to treat your analysts as “stupid” for declining to give you a firm number, but as “smart” for understanding that a range is more accurate.
- A/B Test. A/B testing (randomized experiments comparing two variants) can change a prediction into a known value. By moving from prediction to value, even if only on certain elements, you reduce the variance of potential outcomes.
- Surveillance. Surveillance is the practice of regularly comparing data with predictions and then adjusting your predictions to fit reality. Just as it is important to encourage your analysts to create a range rather than provide a firm number, you also need to be supportive of their revising their LTV calculations as more information comes in. To quote John Maynard Keynes (and probably the only time I ever will), “When the facts change, I change my mind.”
- Avoid overfitting. This is pretty straightforward but also very important. With current tools, you can look at reams of historical data and fit a model around it. As mentioned earlier, you need to test how much of the variability in the data is actually accounted for by your model.
- Include qualitative data. With social and mobile games, there is a lot of qualitative data available. Is the game fun? Is the art as good as competitive games? Does it get boring? It is folly to believe these attributes do not count; you must integrate them in your modeling. If your LTV is very positive but you know the game is bad, your LTV will turn out to be incorrect.
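The surveillance step above can be sketched as a simple running revision of the estimate as real cohort data arrives. The smoothing constant and the weekly figures are invented for illustration; this is one of many possible updating schemes, not a standard formula.

```python
# "Surveillance": revise the LTV prediction as observed revenue comes in,
# rather than defending the launch-day number. A simple exponentially
# weighted update; alpha is an assumed smoothing constant.
def update_prediction(current, observed, alpha=0.2):
    return (1 - alpha) * current + alpha * observed

prediction = 4.50                            # launch-day LTV estimate
weekly_observed = [3.80, 3.60, 3.90, 3.70]   # hypothetical cohort readings
for obs in weekly_observed:
    prediction = update_prediction(prediction, obs)
    print(f"revised LTV estimate: ${prediction:.2f}")
```

Each week the estimate drifts toward what the cohort is actually doing; the point is the habit of revision, in Keynes's spirit, not the particular smoothing rule.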
Hi Lloyd, good post to introduce the uncertainty concepts behind LTV. Using confidence intervals around predictions is a good way to ensure better understanding by the end users.
What are your thoughts on whether confidence intervals can be generated for user-level LTV? The reason I ask is that in Sonamine’s analysis of many games, LTV tends to follow a power law distribution, instead of the “normal” bell-shaped distribution.
I agree completely. I did not mean to imply it would be a normal distribution, which might allow you to narrow the range but also shows the impact a little chaos can cause.