How to recognize collider bias

Marilyn vos Savant holds the highest recorded intelligence quotient (IQ) in the Guinness Book of Records. In 1990 she answered an intriguing question in her regular column in Parade magazine:


You are on a game show and you are given the choice of three doors. Behind one door is a car. Behind the other doors are goats. You pick a door and the host (Monty Hall) always opens a different door that reveals a goat, never the car. Monty then asks you if you would like to stay with your original choice or choose a different door.

Is it to your advantage to switch your choice of doors?

This dilemma was named the Monty Hall problem after the host of the game show "Let's Make a Deal", who had a reputation for playing mind games with contestants.


 



Marilyn correctly answered yes, you are twice as likely to win if you switch rather than stay. Switching will win 2 in 3 times, whereas staying will only win 1 in 3 times. She explained why, using the following table of outcomes when you choose door #1.


Table 1: Outcomes when you choose door #1


Door 1    Door 2    Door 3    Switch    Stay
Car       Goat      Goat      Lose      Win
Goat      Car       Goat      Win       Lose
Goat      Goat      Car       Win       Lose


Many people seeing the puzzle and solution for the first time find the answer hard to accept.


Common reactions include:

  • How could it make any difference if I stay or if I swap?
  • How can the probability of a door winning change?
  • With 2 doors remaining, aren't both 50/50?
  • There's something weird about this!

The solution is counterintuitive, and has become a popular paradox to illustrate how event dependencies bias probabilities. The game show scenario is entirely contrived, but the abstract situation of understanding the biases introduced by event dependence is necessary to correctly assign causation. For that reason, it's worth delving a little deeper into why the scenario seems so strange, and how we can identify situations where such biases arise.


The table is a convincing argument insofar as we can clearly see 3 possible outcomes. But it may remain mysterious why each outcome is completely deterministic. Let's trace through the events in chronological order to see what is going on:


Figure 1: Chronological events of the game


First, the game show producer sets up the goats and the car. We'll focus on the scenario of the car being placed behind door #1, and goats behind doors #2 and #3.


Second, we make a choice:

  • If we choose door #1, then Monty will reveal either door #2 or door #3, because both contain goats. It doesn't matter which door he reveals. If we swap to the remaining door it will contain a goat, and we will lose. The only way we win is if we stay.
  • If we choose door #2, then Monty can only reveal door #3, because the car is behind door #1. Once he reveals door #3 we can only swap to door #1. So in this case we will win if we swap, and lose if we stay.
  • If we choose door #3, then Monty can only reveal door #2, because the car is behind door #1. Once he reveals door #2 we can only swap to door #1. Again we will win if we swap and lose if we stay.

From Figure 1 we can see that 2 out of 3 times Monty has no choice at all. There is only one door that he can open, and the other door contains the car. The most likely scenario is that our first guess was wrong, so Monty could open only one door, which tells us that the car is behind the remaining door. The wrong guess forces him to reveal the right answer.
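The walkthrough above can be checked with a short simulation. This is a minimal sketch, assuming the standard rules (Monty always opens a goat door that is not our pick); the function names `play` and `win_rate`, the number of games, and the seed are illustrative choices, not part of the original puzzle.

```python
import random

def play(switch, rng):
    """Play one game under the standard rules; return True if we win the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)    # the producer places the car
    pick = rng.choice(doors)   # we make our first guess
    # Monty must open a door that is neither our pick nor the car.
    # If our guess was wrong, this list has exactly one entry.
    monty = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one door that is neither our pick nor the opened door.
        pick = next(d for d in doors if d != pick and d != monty)
    return pick == car

def win_rate(switch, games=100_000, seed=0):
    """Estimate the fraction of games won with a fixed strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(games)) / games

print("switch:", win_rate(True))
print("stay:  ", win_rate(False))
```

With enough games the two estimates should settle near 2 in 3 for switching and 1 in 3 for staying, matching Table 1.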


Under the rules of the game, Monty's action depends on two prior events: the placement of the car, and your guess (which was random). This configuration of events is called a "collider". Colliders occur whenever a variable has 2 causes.


Figure 2: A collider causal relationship of variables




Collider bias (also known as Berkson's bias) shows up in many situations. The precondition is that the action we are considering depends on 2 other events.

Notable examples of colliders include:

  • Disease 1 -> Hospitalization <- Disease 2: If you only look at cases where hospitalization occurred, you are removing cases that affect how often the 2 diseases appear together.
  • Drug -> Blood Pressure -> Heart Attack <- Blood Pressure: If you only look at cases where a heart attack occurred, you are removing cases where the drug prevented the heart attack by lowering blood pressure.
  • Attractive -> Date <- Nice: If you only judge the people you date, you are removing the people who are neither attractive nor nice, and might mistakenly believe that attractive people are mean.

It's as though we flipped 2 coins independently many times, counting the results but ignoring every time double tails occurred. In the remaining data, whenever one coin shows tails the other always shows heads, so the coins appear to influence each other even though they are independent. We don't set out to collect biased data, but it is easier to gather data on diseases from hospitalized individuals.
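The coin-flip thought experiment is easy to reproduce. The sketch below is illustrative only: the helper names `flip_pairs` and `p_second_heads_given_first`, the sample size, and the seed are assumptions made for the example.

```python
import random

def flip_pairs(n=100_000, seed=0):
    """Flip two independent fair coins n times."""
    rng = random.Random(seed)
    return [(rng.choice("HT"), rng.choice("HT")) for _ in range(n)]

def p_second_heads_given_first(pairs, first):
    """Share of flips where coin 2 is heads, among flips where coin 1 == first."""
    matching = [b for a, b in pairs if a == first]
    return sum(b == "H" for b in matching) / len(matching)

pairs = flip_pairs()
# Condition on the collider: keep only flips with at least one head.
kept = [p for p in pairs if p != ("T", "T")]

for label, data in (("all flips", pairs), ("double tails removed", kept)):
    print(label)
    print("  P(coin 2 heads | coin 1 heads):", p_second_heads_given_first(data, "H"))
    print("  P(coin 2 heads | coin 1 tails):", p_second_heads_given_first(data, "T"))
```

In the full data, coin 1 tells us nothing about coin 2. Once double tails are dropped, a tail on coin 1 guarantees a head on coin 2, an association created purely by the filtering.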

Bias occurs when conditioning on the collider. Coming back to our original paradox, if we only look at cases where Monty reveals a goat, we are removing cases that affect the probability of winning a car. Because Monty always reveals a goat, and our choice of door dictates which door he opens ⅔ of the time, the frequency with which switching wins increases.

Conditioning on a collider creates a spurious association between the contributing variables. If 2 events contribute to an event that is always present in the cases we examine, there is collider bias. To find out whether 2 events cause such an event, one can draw a causal diagram of all the variables and which causes which.

Why the Monty Hall problem is counter-intuitive and entertaining


Reframing the problem to focus on what causes Monty's action helps explain what is going on, but doesn't make us completely comfortable with the solution either. In the context of a game our expectation is that the other player will be trying to beat us.

Consider what would happen if Monty could choose whether to reveal a door, or just open your chosen door. Now Monty can play the game adversarially. If you choose the wrong door, he opens it and you lose. If you choose the correct door, he shows a different door in an attempt to get you to switch your choice. Under such circumstances if he does open a door, it means you should stay with your current choice. Switching would lose every time.

I think that when presented as a game, our intuition is correctly on alert that Monty is trying to trick us, that he is playing the game to win. Most games are fair or stacked against us, so it would be very strange for someone to propose a game biased in our favor. Furthermore Monty might not have explicitly stated the rules of the game. If we don't know if he had to open a door, or if he just had the opportunity to attempt to trick us, then the safe choice is to stay with our current guess.
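The adversarial variant can be sketched the same way. This is a hypothetical model of the Monty described above, one who ends the game immediately when our first guess is wrong and only offers a switch when it is right; the names `adversarial_game` and `win_rate_when_offered` are made up for the illustration.

```python
import random

def adversarial_game(switch, rng):
    """One game against an adversarial Monty. Returns (offered, won)."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    if pick != car:
        # Wrong guess: Monty simply opens our door and we lose.
        return False, False
    # Right guess: Monty opens one of the two goat doors to tempt us.
    monty = rng.choice([d for d in doors if d != pick])
    if switch:
        pick = next(d for d in doors if d != pick and d != monty)
    return True, pick == car

def win_rate_when_offered(switch, games=30_000, seed=0):
    """Win rate among only those games where Monty offers a switch."""
    rng = random.Random(seed)
    results = [adversarial_game(switch, rng) for _ in range(games)]
    offered = [won for was_offered, won in results if was_offered]
    return sum(offered) / len(offered)

print("switch when offered:", win_rate_when_offered(True))
print("stay when offered:  ", win_rate_when_offered(False))
```

Under these assumed rules an offer to switch only ever arrives when the first guess was right, so switching never wins and staying always does. The same visible action from Monty carries the opposite meaning once the rule behind it changes.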

The paradox makes for interesting game show scenarios, because it gives contestants a chance to second-guess themselves. Most people want to stick to their original choice, perhaps reaffirming that they made the correct initial guess through some mystic connection. People who are in on the joke can laugh about it and decry the participants for passing up double the chance of winning.

Seasons 41 and 42 of Survivor featured the Monty Hall problem under the title of "Do or Die". In both cases the contestants chose to stay with their original choice instead of switching (which would have given them a higher chance of winning), and in both cases they actually won (against the odds). It really captured the counter-intuitive nature of the paradox, and the stark fact that we can easily assume we made a good decision because we saw a good outcome, when really we just got lucky.

Figure 3: Survivor featured the opportunity to guess again for immunity

Conclusion


The takeaway is that recognizing colliders can help us make better decisions. Recognizing a collider means noticing that we are only considering cases where a particular event happens, and that the event has 2 causes. Drawing a causal diagram helps to find them.