Last week I started my series of posts about Bayes’ Rule and why it is the foundation of good business decision making. In the next few weeks I will hit on some fun applications, but today I want to build further on the foundation for using Bayes’ Rule effectively by discussing continuous parameter values (or multiple existing data points). Again, I borrow heavily from easily the best work on Bayes’ Rule, James Stone’s book Bayes’ Rule: A Tutorial Introduction to Bayesian Analysis. This application will be particularly relevant when using Bayes’ Theorem to make the best decisions in your green light process, corporate development, investing or anywhere there are multiple historical results to examine.
Quantities that can take any value within a range, rather than one of a handful of discrete outcomes, are referred to as “continuous variables.” The values of a continuous variable are like densely packed points on a line, where each point corresponds to the value of a real number. The main advantage of working with continuous values is that we can usually describe a probability distribution with an equation, and because an equation is defined in terms of a few key parameters, it is said to provide a parametric description of a probability distribution.
To make the above relevant to you, think of yourself as a VC. You are looking at a potential investment. You start by looking at how similar investments over the last two years performed; this return on investment represents the points on the line. The parameters could be the space the business occupied, management team and level of investment. Rather than potential investments, to keep the analysis simple I will use a coin flip as an example.
Suppose we have a vast wealth of experience in estimating coin biases (the probability that a coin lands heads, for coins that are not necessarily fairly balanced), and that we have kept a record of the estimated bias of every coin we have ever encountered (again and for the rest of the post, borrowing heavily from Stone’s work). We can plot the frequency with which each bias occurs as a histogram. The histogram represents all the knowledge implicit in our previous experience of coin biases. It effectively provides a hint about the probable value of a coin’s bias (or a VC investment) before we have flipped that coin once. Bias values lie between 0 and 1, and here the histogram is centered on 0.5 (representing a fair coin, a 50/50 chance). The width of the histogram is an indicator of how certain our prior knowledge of a coin’s bias is. If the histogram were very narrow, then we would be reasonably certain that any new coin we choose will have a bias close to 0.5. Conversely, if the histogram were quite broad (but still centered on 0.5), then our best guess at a new coin’s bias would still be 0.5, but we would be much less certain that its value will be close to 0.5. An equation that represents this graph, up to a normalizing constant, is $p(\theta) \propto \theta^2 (1-\theta)^2$, with $\theta$ as the continuous parameter; this is the prior probability density function.
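To make the link between histogram width and certainty concrete, here is a minimal sketch that compares two priors centered on 0.5. The exponent `a` and the values 2 and 20 are my own illustrative choices, not figures from Stone’s book; a larger `a` simply gives a narrower curve.

```python
# Two hypothetical priors over coin bias, both centered on 0.5.
# The exponent `a` is an illustrative knob (an assumption, not a value
# from the post): larger `a` gives a narrower prior.

def prior_shape(theta, a):
    """Unnormalized prior theta^a * (1 - theta)^a."""
    return theta**a * (1 - theta)**a

def mean_and_sd(a, n=1000):
    """Approximate the mean and standard deviation on an evenly spaced grid."""
    grid = [i / n for i in range(n + 1)]
    weights = [prior_shape(t, a) for t in grid]
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, var**0.5

broad_mean, broad_sd = mean_and_sd(a=2)
narrow_mean, narrow_sd = mean_and_sd(a=20)
print("broad prior: ", broad_mean, broad_sd)
print("narrow prior:", narrow_mean, narrow_sd)
```

Both priors have the same best guess (0.5), but the spread of the narrow one is less than half that of the broad one, which is the sense in which a narrow histogram expresses more certainty.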
The posterior and a rational basis for bias
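Using Bayes’ Rule, if you combine this function with the likelihood of the observed data (for the math person, $p(x \mid \theta) = \theta^7 (1-\theta)^3$, which corresponds to observing 7 heads and 3 tails), you can choose the best value for the continuous parameter, $\theta$. Using Bayes, you would derive, up to a normalizing constant, the posterior $p(\theta \mid x) \propto \theta^9 (1-\theta)^5$.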
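For readers who prefer code to algebra, the combination above can be sketched on a grid of candidate bias values. This assumes the prior is proportional to θ²(1−θ)², which is what the likelihood and posterior exponents imply; the function names and the grid size are my own choices.

```python
# Grid approximation of the posterior over coin bias theta.
# Assumptions: prior proportional to theta^2 (1 - theta)^2,
# data of 7 heads and 3 tails (matching the likelihood exponents).

def prior(theta):
    return theta**2 * (1 - theta) ** 2

def likelihood(theta, heads=7, tails=3):
    return theta**heads * (1 - theta) ** tails

N = 1000
step = 1 / N
grid = [i / N for i in range(N + 1)]

# Posterior is proportional to prior times likelihood.
unnormalized = [prior(t) * likelihood(t) for t in grid]

# Normalize so the density integrates (approximately) to 1.
area = sum(unnormalized) * step
posterior = [u / area for u in unnormalized]

# The most probable bias is the grid point where the posterior peaks.
best_index = max(range(len(grid)), key=lambda i: posterior[i])
map_estimate = grid[best_index]
print(round(map_estimate, 3))
```

The peak lands near 9/14 (about 0.643), which sits between the prior’s best guess of 0.5 and the raw observed proportion of 0.7.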
For a coin that lands heads with probability $\theta_{true}$, we use the observed proportion $x$ of heads to estimate $\theta_{true}$. But all measurement devices are imperfect, so the measured estimate $x$ of the true proportion of heads $x_{true}$ is noisy. There are at least two sources of uncertainty: uncertainty in the measured value $x$, and uncertainty in the relation between $x$ and the parameter $\theta$.
These sources of uncertainty translate to a corresponding uncertainty in the value of $\theta$, which is captured by the likelihood function $p(x \mid \theta)$. However, if we know the underlying (or prior) distribution $p(\theta)$ of values of $\theta$ in the world, then we can use it as a guide to reduce uncertainty in $\theta$. In essence this is what Bayes’ Theorem does: it tells us how to adjust an estimated value in light of what we already know. Bayes’ Rule provides a rational basis for imposing a particular choice of value (the prior, which may appear biased) on estimated parameter values, to arrive at a value that represents our estimate of the most probable state of the real world.
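Written out, the adjustment described above is the standard form of Bayes’ Rule for a continuous parameter. The term $p(x)$, often called the evidence, is not named in this post; it is just the constant that makes the posterior integrate to 1.

```latex
p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{p(x)},
\qquad
p(x) = \int_0^1 p(x \mid \theta)\, p(\theta)\, d\theta

% With the coin example, assuming the prior \theta^2 (1-\theta)^2:
p(\theta \mid x) \propto \theta^7 (1-\theta)^3 \cdot \theta^2 (1-\theta)^2
                 = \theta^9 (1-\theta)^5
```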
The uniform prior
If there is no previous experience of an unknown parameter like coin bias, and therefore no reason to prefer any particular coin bias value over another, there are two options:
- First, you could simply rely on the observed data (coin flip outcomes) in the form of a likelihood function, which implicitly acknowledges the lack of knowledge or previous experience of the value of a parameter.
- Second, you could make your lack of knowledge explicit in the form of a particular prior probability density function. Since you do not know how much weight to assign to each likelihood value, it is appropriate to weight them all equally, which is what a uniform prior does.
Either option has the same outcome, because multiplying the likelihood by a constant prior does not change where it peaks.
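A quick way to convince yourself that the two options agree is to compare the peak of the likelihood alone with the peak of the likelihood multiplied by a flat prior. The sketch below assumes the same 7 heads and 3 tails as the likelihood used earlier; the helper names are my own.

```python
# Compare "likelihood only" with "likelihood times uniform prior".
# Assumption: data of 7 heads and 3 tails.

def likelihood(theta, heads=7, tails=3):
    return theta**heads * (1 - theta) ** tails

def uniform_prior(theta):
    # Every bias value between 0 and 1 gets the same weight.
    return 1.0

def argmax_on_grid(f, n=1000):
    """Return the grid point in [0, 1] where f is largest."""
    grid = [i / n for i in range(n + 1)]
    return max(grid, key=f)

likelihood_peak = argmax_on_grid(likelihood)
posterior_peak = argmax_on_grid(lambda t: likelihood(t) * uniform_prior(t))
print(likelihood_peak, posterior_peak)
```

Both peaks fall on the observed proportion of heads, 0.7, so with a uniform prior the data alone drive the estimate.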
I have discussed how the combined set of outcomes can be used to infer coin bias. Just as new evidence may arrive in a court case, or a new game may hit the market, each coin flip provides additional evidence regarding the coin bias. Rather than waiting until all outcomes have been observed, you can use the posterior distribution from all outcomes up until the last coin flip as the prior distribution for interpreting the next flip outcome.
If successive outcomes are mutually independent, so that the outcome of the second flip does not depend on the outcome of the first flip, then this mutual independence guarantees that the posterior distribution is identical whether it is computed using all of the data simultaneously or sequentially.
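Here is a minimal sketch of that equivalence, updating one flip at a time and then comparing with a single update on all the data. The flip sequence is hypothetical (7 heads and 3 tails in an arbitrary order), and I start from a uniform prior to keep the example short.

```python
# Sequential versus all-at-once Bayesian updating on a grid.
# Assumptions: independent flips, a uniform starting prior, and a
# hypothetical sequence of outcomes (1 = heads, 0 = tails).

def update(prior, outcome, grid):
    """Multiply the prior by the likelihood of one flip and renormalize."""
    unnormalized = [
        p * (t if outcome == 1 else 1 - t) for p, t in zip(prior, grid)
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

n = 200
grid = [i / n for i in range(n + 1)]
flips = [1, 1, 0, 1, 1, 1, 0, 1, 0, 1]  # hypothetical: 7 heads, 3 tails
uniform = [1 / len(grid)] * len(grid)

# Sequential: yesterday's posterior becomes today's prior.
sequential = uniform
for flip in flips:
    sequential = update(sequential, flip, grid)

# Batch: apply the likelihood of all flips in one step.
heads = sum(flips)
tails = len(flips) - heads
batch_unnormalized = [
    p * t**heads * (1 - t) ** tails for p, t in zip(uniform, grid)
]
batch_total = sum(batch_unnormalized)
batch = [b / batch_total for b in batch_unnormalized]

max_diff = max(abs(s - b) for s, b in zip(sequential, batch))
print(max_diff)
```

The largest difference between the two posteriors should be nothing more than floating-point rounding, which is why you can update your estimate as each result arrives instead of waiting for the full data set.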
In the coming weeks, I will be giving everyone a quiz; just kidding. Now that we have gone over the underlying math, I will explore different applications of Bayes’ Rule to key strategic and tactical business decisions.