Making the right decision, in business and in life, is the most important thing you can do. Wrong decisions can haunt you your entire life, while the right decision can mean making your company worth billions or bringing years of happiness. Imagine if Travis Kalanick, CEO of Uber, had decided to focus on connecting buses with passengers and not taxis, or if Trip Hawkins had focused 3DO on creating software and not a hardware platform. Understanding Bayes’ Theorem (also known as Bayes’ Rule, two terms I will use interchangeably) increases the chance you use data the right way to make your decisions.
This post is the first in a series I will be writing on Bayes’ Rule. This post and most of the background I discuss is based on the best book I have found about Bayes’ Rule, Bayes’ Rule: A Tutorial Introduction to Bayesian Analysis by James V. Stone. Last year, I wrote several posts on Lifetime Value (LTV), given how crucial it is to the success of any business, from the newest technology to the oldest brick and mortar enterprise. This year, we will be tackling Bayes’ Theorem. As you will see in the next few posts, by understanding Bayes’ Theorem you can make optimal decisions about what games or projects to green light, how to staff your company, what to invest in, which technology to use, who to sell your company to, what areas of your company need to be fixed or improved, etc. Bayes’ Theorem is the single most important rule for good decision-making, in both your professional and personal life.
What is Bayes’ Theorem?
Bayes’ Theorem is a rigorous method for interpreting evidence in the context of previous experience or knowledge. Bayes’ Theorem transforms probabilities that look useful (but often are not) into probabilities that are useful. It is important to note that it is not a matter of conjecture; by definition a theorem is a mathematical statement that has been proven true. Denying Bayes’ Theorem is like denying the theory of relativity.
Some examples of Bayes’ Rule
The best way to understand Bayes’ Rule is by example (I will touch on the math later). Again, much of this is based on Stone’s tutorial on Bayesian analysis. First, look at probability as the informal notion based on the frequency with which particular events occur. If a bag has 100 M&Ms, and 60 are green and 40 are red, the probability of reaching into the bag and grabbing a green M&M is the same as the proportion of green M&Ms in the bag (i.e., 60/100=0.6). From this, it follows that the probability of any event takes a value between zero and one, with zero meaning it definitely will not occur and one meaning it definitely will occur. Thus, given a set of mutually exclusive events that covers every possible outcome, such as the colors you might get when choosing an M&M, the probabilities of those events must add up to one.
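To make the M&M arithmetic concrete, here is a minimal Python sketch. The variable names are my own; the counts are the ones from the bag described above.

```python
from fractions import Fraction

# Contents of the bag described above: 60 green and 40 red M&Ms.
bag = {"green": 60, "red": 40}
total = sum(bag.values())

# The probability of drawing each color is its share of the bag.
probabilities = {color: Fraction(count, total) for color, count in bag.items()}

for color, p in probabilities.items():
    print(f"P({color}) = {p} = {float(p):.1f}")

# The colors are mutually exclusive and cover every M&M in the bag,
# so their probabilities add up to one.
print("Sum of probabilities:", sum(probabilities.values()))
```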
Although getting the right color M&M is important to many, Bayes’ Rule can also apply to a life-or-death situation. Say you wake up with spots all over your face. You rush to the doctor and he says that 90 percent of the people who have smallpox have the symptoms you have. Since smallpox is often fatal, your first inclination may be to panic. Rather than freak out, you ask your doctor what the probability is that you actually have smallpox, given your symptoms. He responds 1.1 percent (or 0.011). Although still not great news, it is much better than 90 percent, and more importantly it is useful information.
It is actually even more useful for the doctor, who must decide on a treatment regimen. The doctor knows that 90 percent of people with smallpox have these spots and 80 percent of people with chickenpox have these spots. If he does not understand probability or know Bayes’ Theorem, he might think you are slightly more likely to have smallpox and begin treatment based on that. If he is wise and understands Bayes’ Theorem (which again is a proven mathematical certainty), he would know that since chickenpox is common and smallpox is rare, the probability of smallpox is only 1.1 percent and thus you should be treated for chickenpox. This knowledge about how common each disease is, or prior information, should be used to decide which disease the patient should be treated for.
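Here is a rough sketch of the doctor's calculation in Python. The 90 percent and 80 percent figures come from the example above. The prevalence of each disease and the overall share of people with spots are not given in this post, so the values below are assumptions, chosen so that the result matches the 1.1 percent quoted above.

```python
# Figures stated in the example above.
p_spots_given_smallpox = 0.90
p_spots_given_chickenpox = 0.80

# ASSUMED figures (not given in the post), chosen for illustration only.
p_smallpox = 0.001    # assumed prior: 1 person in 1,000 has smallpox
p_chickenpox = 0.10   # assumed prior: 1 person in 10 has chickenpox
p_spots = 0.081       # assumed share of all people who have these spots


def posterior(likelihood, prior, evidence):
    """Bayes' Rule: P(disease | spots) = P(spots | disease) * P(disease) / P(spots)."""
    return likelihood * prior / evidence


p_smallpox_given_spots = posterior(p_spots_given_smallpox, p_smallpox, p_spots)
p_chickenpox_given_spots = posterior(p_spots_given_chickenpox, p_chickenpox, p_spots)

print(f"P(smallpox | spots)   = {p_smallpox_given_spots:.3f}")    # 0.011
print(f"P(chickenpox | spots) = {p_chickenpox_given_spots:.3f}")  # 0.988
```

Even though spots are slightly more typical of smallpox (90 percent versus 80 percent), the assumed rarity of smallpox pulls its probability down to about 1.1 percent.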
Understanding and using Bayes’ Theorem
What Bayes’ Theorem gives you is a likelihood weighted by prior knowledge (formally called a “posterior probability”). By making use of prior experience (in this case knowledge about the percentage of the population that actually has smallpox), you transform the conditional probability of the observed symptoms given a disease (the likelihood, based only on the available evidence) into a more useful conditional probability, i.e., the probability that the patient has a particular disease (smallpox) given that he has particular symptoms (spots).
The conditional probability of spots given smallpox is based only on the observed data, and is therefore easier to obtain than the conditional probability of smallpox given spots, which is based on both observed data and prior knowledge. For historical reasons, the first of these is referred to as the likelihood, while the complementary conditional probability, that the patient has smallpox given that he has the spots, is the posterior probability. In essence, Bayes’ Theorem is used to combine prior experience (in the form of a prior probability) with observed data (spots) to interpret these data. This process is called Bayesian inference.
Bayesian inference is not guaranteed to provide the correct answer. It provides the probability that each of a number of alternative answers is true, and these can then be used to find the answer that is most probably true. Although this may not sound ideal for making informed decisions where you cannot afford a mistake, it can be shown mathematically that no other process can provide a better guess, so Bayesian inference can justifiably be interpreted as a perfect guessing machine.
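As a small illustration of picking the answer that is most probably true, here is a Python sketch. The three alternatives and their posterior probabilities are hypothetical values I made up for the example; only the 1.1 percent for smallpox echoes the figure used earlier.

```python
# Hypothetical posterior probabilities for three candidate explanations of
# the spots (illustrative values only, not taken from the post).
posteriors = {"chickenpox": 0.988, "smallpox": 0.011, "something else": 0.001}

# Posteriors over a complete set of alternatives should sum to one.
assert abs(sum(posteriors.values()) - 1.0) < 1e-9

# The best guess is the alternative with the highest posterior probability.
best_guess = max(posteriors, key=posteriors.get)

# Even the best guess can be wrong; this is the probability that it is.
chance_wrong = 1.0 - posteriors[best_guess]

print(f"Best guess: {best_guess} (probability {posteriors[best_guess]:.3f})")
print(f"Chance the best guess is wrong: {chance_wrong:.3f}")
```

The guess is not certain, but with these numbers any other choice would be wrong far more often.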
The math behind Bayes’ Theorem
Anyone interested in the mathematical foundation of Bayes’ Rule is probably not using my blog to find that information (and should check out Stone’s book), but for those who want to see how it is derived, it is straightforward. The formula for Bayes’ Theorem is
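$$P_E(H) = \frac{P(H)}{P(E)}\, P_H(E)$$
Here H is the hypothesis and E is the evidence. $P_H(E)$ is the “likelihood” of H on E: the probability of the evidence given that the hypothesis is true. It expresses the degree to which the hypothesis predicts the data given the background information codified in the probability P. $P_E(H)$ is the posterior probability: how likely the hypothesis is once the evidence has been taken into account (e.g., in a greenlight process, how likely the game is to be successful given the data you have). $P(H)$ is the unconditional, or prior, probability of the hypothesis, that is, how likely it is before the new evidence is seen; this is where previous experience enters. $P(E)$ is the unconditional probability of the evidence, that is, how likely the data are whether or not the hypothesis is true.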
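For readers who do want to see where the formula comes from, here is a short sketch of the derivation. It assumes only the standard definition of conditional probability, and it writes $P_E(H)$ for the probability of H given E, $P_H(E)$ for the probability of E given H, and $P(H \cap E)$ for the probability that H and E are both true (that last symbol is my own addition).

```latex
\begin{align*}
P_E(H) &= \frac{P(H \cap E)}{P(E)}
  && \text{definition of conditional probability, with } P(E) > 0 \\
P_H(E) &= \frac{P(H \cap E)}{P(H)}
  \;\Longrightarrow\; P(H \cap E) = P(H)\, P_H(E)
  && \text{same definition, with } P(H) > 0 \\
P_E(H) &= \frac{P(H)\, P_H(E)}{P(E)} = \frac{P(H)}{P(E)}\, P_H(E)
  && \text{substitute the second line into the first}
\end{align*}
```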
As the Stanford Encyclopedia of Philosophy points out, Bayes’ theorem is of great value in calculating conditional probabilities because inverse probabilities are typically both easier to ascertain and less subjective than direct probabilities. People with different views about the unconditional probabilities of E and H often disagree about E’s value as an indicator of H. Even so, they can agree about the degree to which the hypothesis predicts the data if they know any of the following intersubjectively available facts:
(a) E’s objective probability given H
(b) the frequency with which events like E will occur if H is true, or
(c) the fact that H logically entails E.
Scientists often design experiments so that likelihoods can be known in one of these “objective” ways. Bayes’ Theorem then ensures that any dispute about the significance of the experimental results can be traced to “subjective” disagreements about the unconditional probabilities of H and E.
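A quick Python sketch of this point: two people who accept the same likelihoods but start from different priors end up with different posteriors. All the numbers and the names "optimist" and "skeptic" are hypothetical, chosen only for illustration.

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Return P(H | E) via Bayes' Theorem for a hypothesis H and its negation."""
    # Unconditional probability of the evidence (law of total probability).
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1.0 - prior_h)
    return p_e_given_h * prior_h / p_e


# Likelihoods both analysts accept (hypothetical values).
P_E_GIVEN_H = 0.8      # chance of seeing the result if H is true
P_E_GIVEN_NOT_H = 0.2  # chance of seeing the result if H is false

# Each analyst brings a different prior belief in H (also hypothetical).
priors = {"optimist": 0.5, "skeptic": 0.1}

posteriors = {
    name: posterior(prior, P_E_GIVEN_H, P_E_GIVEN_NOT_H)
    for name, prior in priors.items()
}

for name, post in posteriors.items():
    print(f"{name}: prior {priors[name]:.2f} -> posterior {post:.2f}")
```

With these numbers the optimist ends at 0.80 and the skeptic at about 0.31, and the whole gap comes from their different priors, since the likelihoods are shared.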
Another way of looking at the formula is in the image below:
More to come
All decisions should be based on evidence, but the best decisions should also be based on previous experience. Over the next weeks and months, I will be going into more detail on Bayes’ Theorem and also explore how it applies to launching, leading and building a successful business.