What is causality?

As I re-watched fragments of Benjamin Button recently (a movie that I’m not particularly fond of, I’m afraid), I recalled a scene in which a particular incident was shown to be preceded by a series of events that ultimately led to it. If at least one of these events had not happened, the narrator argued, the incident could have been avoided. This got me thinking about causes and what it really means to call something a “cause” of some event (and, more narrowly, how one assigns “blame” to things, or gives credit to them).

As we can easily predict, there is often no single cause of an event. WARNING: spoilers abound! Even though the driver who struck Daisy seems like the cause, Daisy would have been fine if many other things hadn’t happened; shouldn’t those other things also be considered causes? How far back do we go? And how do we attribute causality? It seems to us, viscerally, that some factors were more instrumental to the incident than others, but how do we quantify that? (For example, the invention of the automobile was among the causes, but was it as significant as the fact that the driver didn’t pay attention to what was in front of him?)

This is a difficult question and so let’s start small. The universe is massive, with so much happening in such short periods of time that it’s easy to get confused. Let’s simplify the question as much as we can (it’s still going to be pretty complicated so bear with me).

I’m going to assume that the universe consists of a series of discrete decisions that lead to some event. The decisions are binary, with outcomes 0 and 1 (whatever these numbers may stand for; for example, 0 may mean “forget the jacket” and 1 may mean “take the jacket”), and they could stand for anything that has two distinct outcomes, so I’m using a broad definition of the word “decision” here. We will try to come up with a framework that allows us to talk about these decisions, for example, to determine which decisions contributed to the event more than others. First, some caveats:

  • We will not deal with intent, that is, we will not be interested in what a particular decision intended to achieve and instead look at what it ended up achieving. This means we’ll be looking at cause in a way which may be different from the way the law looks at cause.
  • We will deal with decisions which are not biased in any way, i.e. behind each decision there is an agent who picks either 0 or 1 and is not subject to chance. Hence, a “decision” for it to rain in the Sahara is not a valid one, since it’s not particularly likely to rain in the Sahara.
  • We can model decisions which are biased, or decisions which are not binary, as a sequence of binary decisions, so we can proceed without loss of generality (I never thought I’d use this phrase again…)
  • We’ll assume that all decisions are discrete, that we can determine the consequence of every decision, and that we have captured them all.

Moreover, in order to talk about causality at all, we have to define equivalence of decisions and events properly. We’ll be dealing with parallel universes (i.e. universes in which a particular decision was made differently) and, strictly speaking, events across universes are never the same (because the universe is slightly different if a decision is made differently, even if it seems that a particular event was not affected at all). So we will identify equivalence classes of events (and equivalence classes of decisions): buckets into which we put many similar events and treat them all as “the event”.

For example, consider the event “Daisy breaks her leg as a result of a car accident”. If, in a parallel universe, some decisions are made differently, and Daisy breaks her other leg, the event is now different, but we still care about it (for all intents and purposes, she’s broken a leg). So we’ll put the two events in the same equivalence class. Similarly, the driver’s decision to not pay attention should be treated as the same decision regardless of the decision to forget the jacket, so we need equivalence classes of decisions.

Let’s work with an example. Consider three decisions \(d_1\), \(d_2\), \(d_3\), that led to some event \(E\). We can represent this as a tree of decisions:

[Figure: A Partial Decision Tree]

The graph represents the current universe: the three decisions were made, and the event \(E\) happened. Now let’s introduce parallel universes, i.e. determine what would have happened if any of the decisions had been made differently. Suppose that if \(d_3\) had been 0, \(E\) could still have happened if two other decisions, \(d_4\) and \(d_5\), were 1. If \(d_2\) had been 0, there would have been no way for \(E\) to happen. If \(d_1\) had been 0, \(E\) could have happened if \(d_3\) was 1 and a new decision \(d_6\) was 1.

This is, of course, arbitrary; in the real world it may be difficult, if not impossible, to reason about what would have happened. But then again, I never made any claims about the practicality of this framework…

Note that, in our example above, decision \(d_3\) is reused: it appears in the parallel universe as well as in the actual one. This is fine; a decision may appear in many universes (especially if decisions are unlikely to influence one another, for example because they happen far away from each other). This is also why we need equivalence classes of decisions: so that we can talk about similar decisions rather than treating every decision as a unique one. Those “reused” decisions will be tricky to analyze: on one hand, each occurrence is the same decision, so we can talk about it in the abstract, regardless of what our universe looks like (i.e. regardless of which decisions have actually been made); on the other hand, by the time a decision needs to be made, other branches of the tree (which may include the same decision!) may already have been pruned. We’ll solve this problem shortly.

We can complete the decision tree:

[Figure: A Full Decision Tree]

Now we can look at all combinations of different decisions and see if \(E\) occurred or not (\(x\) means “any value”):

d1  d2  d3  d4  d5  d6    E?    # distinct combinations
0   x   0   x   x   x     0     16
0   x   1   x   x   0     0     8
0   x   1   x   x   1     1     8
1   0   x   x   x   x     0     16
1   1   0   0   x   x     0     4
1   1   0   1   0   x     0     2
1   1   0   1   1   x     1     2
1   1   1   x   x   x     1     8

For example, if \(d_1\) was 0, \(d_3\) was 1 and \(d_6\) was 1, the event happened regardless of the values of the other three decisions (and there are 8 such combinations).
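As a sanity check, the outcome rule of the full tree can be written down and enumerated mechanically. Here is a minimal Python sketch (the name `event_happens` and the dictionary encoding are mine, not part of the framework) that encodes the rule and counts, over all 64 combinations, how often \(E\) occurs:

```python
from itertools import product

def event_happens(d):
    """Outcome rule read off the full decision tree; d maps 1..6 to the value of d_1..d_6."""
    if d[1] == 1:
        if d[2] == 0:
            return False                    # d_1 = 1, d_2 = 0: E cannot happen
        if d[3] == 1:
            return True                     # d_1 = d_2 = d_3 = 1: E happens
        return d[4] == 1 and d[5] == 1      # d_3 = 0: E needs d_4 = d_5 = 1
    return d[3] == 1 and d[6] == 1          # d_1 = 0: E needs d_3 = d_6 = 1

# Enumerate all 2^6 = 64 combinations and count the ones in which E occurs.
combos = [dict(zip(range(1, 7), bits)) for bits in product((0, 1), repeat=6)]
print(sum(event_happens(d) for d in combos), "of", len(combos))  # 18 of 64 (= 8 + 2 + 8 from the table)
```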

Now we can compare the decisions and see which one caused \(E\) the most. For each decision, determine in how many combinations \(E\) happened when the decision was 1, and in how many when it was 0 (if a decision never even gets made in a scenario, i.e. it is not on the path through the tree, as with \(d_2\) when \(d_1\) is 0, we exclude that scenario from the calculation). The difference between these two counts is the extent to which that decision caused \(E\). For example, if \(d_1=0\), \(E\) is caused in 8 out of 32 combinations; if \(d_1=1\), \(E\) is caused in 10 out of 32 combinations. So regardless of \(d_1\), \(E\) is caused in at least 8 out of 32 combinations (that much wasn’t caused by \(d_1\)). The remaining 2 out of 32 combinations are attributable directly to \(d_1\), so the causality of \(d_1\) is 2/32 = 1/16.

Similarly, we can compute the causality of the other decisions (the sketch after this list reproduces all of these numbers):

  • \(d_2\): causes \(E\) in 0 combinations if equal to 0, and in 10 out of 16 combinations if equal to 1. So its causality is 5/8.
  • \(d_4\): causes \(E\) in 0 combinations if equal to 0, and in 2 out of 4 if equal to 1. Its causality is 1/2.
  • \(d_5\): causes \(E\) in 0 combinations if equal to 0, and in all (2 out of 2) if equal to 1. Its causality is 1.
  • \(d_6\): causes \(E\) in 0 combinations if equal to 0, and in all (8 out of 8) if equal to 1. Its causality is 1.
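These numbers can be reproduced mechanically under the assumptions above. The sketch below (all helper names are mine: `event_happens` encodes the tree’s outcome rule, `reached` records which decisions are actually made in a given scenario, and `causality` applies the counting procedure described earlier) prints the causality of each decision:

```python
from fractions import Fraction
from itertools import product

def event_happens(d):
    """Outcome rule read off the full decision tree; d maps 1..6 to the value of d_1..d_6."""
    if d[1] == 1:
        if d[2] == 0:
            return False
        return d[3] == 1 or (d[4] == 1 and d[5] == 1)
    return d[3] == 1 and d[6] == 1

def reached(i, d):
    """Is decision d_i actually made on the path through the tree?
    (Depends only on the decisions above d_i, never on d_i itself.)"""
    return {1: True,
            2: d.get(1) == 1,
            3: d.get(1) == 0 or d.get(2) == 1,
            4: (d.get(1), d.get(2), d.get(3)) == (1, 1, 0),
            5: (d.get(1), d.get(2), d.get(3), d.get(4)) == (1, 1, 0, 1),
            6: (d.get(1), d.get(3)) == (0, 1)}[i]

def causality(i):
    others = [j for j in range(1, 7) if j != i]
    counts, total = {0: 0, 1: 0}, 0
    for bits in product((0, 1), repeat=5):
        rest = dict(zip(others, bits))
        if not reached(i, rest):      # d_i never gets made in this scenario: exclude it
            continue
        total += 1
        for v in (0, 1):
            counts[v] += event_happens({**rest, i: v})
    return Fraction(counts[1] - counts[0], total)

for i in range(1, 7):
    print(f"d{i}: {causality(i)}")
# d1: 1/16, d2: 5/8, d3: 7/12, d4: 1/2, d5: 1, d6: 1
```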

Let’s now look at the tricky case: \(d_3\). It appears in two branches of the tree. How much it caused \(E\) depends on how much the agent making the decision knows about the decisions made up until now (specifically, \(d_1\) and possibly \(d_2\)). In other words, if the agent knows which branch of the tree he is in, he is causing \(E\) to a different degree than if he had no such information (the sketch after the list below checks each case).

  • If the agent has no information, we need to look at all combinations. If \(d_3\) is equal to 0, \(E\) is caused in 2 combinations out of 24; if it’s equal to 1, in 16 out of 24. Its causality is 7/12.
  • If the agent has perfect information, we need to consider which branch of the tree he’s in.
  • If he’s in the right branch of the tree: if the decision is equal to 1, \(E\) is always caused; otherwise, \(E\) is caused in 2 combinations out of 8 (one in four). The causality is 3/4.
  • If he’s in the left branch of the tree: if the decision is equal to 1, \(E\) is caused in 8 combinations out of 16 (one in two); otherwise, it’s never caused. The causality is 1/2.
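The same enumeration can be conditioned on what the agent knows. Here is a small self-contained sketch, again with invented helper names, where “right” stands for the \(d_1 = d_2 = 1\) branch and “left” for the \(d_1 = 0\) branch:

```python
from fractions import Fraction
from itertools import product

def event_happens(d1, d2, d3, d4, d5, d6):
    """Outcome rule of the full decision tree."""
    if d1 == 1:
        return d2 == 1 and (d3 == 1 or (d4 == 1 and d5 == 1))
    return d3 == 1 and d6 == 1

def causality_of_d3(branch=None):
    """Causality of d_3: branch is None (no information),
    'right' (agent knows d_1 = d_2 = 1) or 'left' (agent knows d_1 = 0)."""
    counts, total = {0: 0, 1: 0}, 0
    for d1, d2, d4, d5, d6 in product((0, 1), repeat=5):
        if d1 == 1 and d2 == 0:
            continue                                   # d_3 is never reached here
        if branch == "right" and (d1, d2) != (1, 1):
            continue
        if branch == "left" and d1 != 0:
            continue
        total += 1
        for d3 in (0, 1):
            counts[d3] += event_happens(d1, d2, d3, d4, d5, d6)
    return Fraction(counts[1] - counts[0], total)

print(causality_of_d3())         # 7/12 -- no information
print(causality_of_d3("right"))  # 3/4  -- agent knows d_1 = d_2 = 1
print(causality_of_d3("left"))   # 1/2  -- agent knows d_1 = 0
```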
Hence \(d_5\) and \(d_6\) cause \(E\) the most (which makes sense: since they are at the bottom of the graph, they have full control over whether \(E\) happens or not), and \(d_1\) causes \(E\) the least. However, it’s not always true that the higher up the tree a decision is, the less it contributes to the event: \(d_2\) contributes more than \(d_3\) (with no information).