Statistics in court: Incorrect probabilities

Although both the law and statistical theory have foundations that rest on formal rules and principles, courts can badly misapply statistical evidence and arguments. In some cases, even when arriving at a correct decision, the courts can accept or give an explanation that is inaccurate and unsound. In other cases, the misuse of statistics has led to false convictions and years of jail time for crimes not committed by the accused.

Probability poses a particular challenge for courts. In some cases, incorrect probabilities have been presented to juries because the simple multiplication rule has been used, despite the fact that events were most likely not statistically independent.

The multiplication rule for independent events states that if two events, A and B, are independent, then the probability of both A and B occurring is the probability of A multiplied by the probability of B. In mathematical notation: P(A and B) = P(A) × P(B). But if events are not independent and the multiplication rule is applied, the results can be dangerously misleading, as this article will show.
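
As a minimal illustration, the Python sketch below enumerates two small sample spaces using the standard-library `fractions` module for exact arithmetic. The dice and cards are our own hypothetical examples and are not drawn from any of the cases. For two rolls of a fair die the product rule holds exactly. For two cards drawn without replacement it does not, because the first draw changes the odds for the second.

```python
from fractions import Fraction
from itertools import permutations, product


def prob(event, space):
    """Exact probability of `event` (a predicate) over equally likely outcomes."""
    outcomes = list(space)
    hits = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(hits, len(outcomes))


# Independent events: two rolls of a fair die, each showing a six.
dice = list(product(range(1, 7), repeat=2))
p_a = prob(lambda o: o[0] == 6, dice)
p_b = prob(lambda o: o[1] == 6, dice)
p_ab = prob(lambda o: o[0] == 6 and o[1] == 6, dice)
print(p_ab, p_a * p_b)  # 1/36 1/36: the product rule holds

# Dependent events: two cards drawn without replacement, both hearts.
deck = [(rank, suit) for rank in range(13) for suit in "SHDC"]
draws = list(permutations(deck, 2))
p_c = prob(lambda o: o[0][1] == "H", draws)
p_d = prob(lambda o: o[1][1] == "H", draws)
p_cd = prob(lambda o: o[0][1] == "H" and o[1][1] == "H", draws)
print(p_cd, p_c * p_d)  # 1/17 1/16: the product rule fails
```

The true probability of drawing two hearts is 1/17, not the 1/16 that blindly multiplying would give.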

Our first example is from a civil case that is said to be the first in which statistical evidence was presented in a US court. Next, we discuss four criminal cases in which expert witnesses for the prosecution misused the product rule and presented incorrect probabilities to the jury. In two of these cases, an appeals court later reversed the conviction. In the other two, the defendants collectively spent 24 years in jail until they were exonerated by DNA evidence.

The Howland will forgery trial

In 1868, the will of Sylvia Ann Howland was the subject of a legal battle. She had left roughly half her fortune to her niece, Henrietta Howland Robinson. However, Robinson claimed a different will left her the whole estate. The attorney for the estate argued that the signature on the will Robinson preferred was a forgery, and so Robinson’s attorney asked Oliver Wendell Holmes, Sr., professor of anatomy and physiology at Harvard, to testify about the authenticity of the signature. He examined the signature under a microscope and found no evidence of a forgery.

The attorney for the estate then called Benjamin Peirce, professor of mathematics at Harvard, to testify. Peirce used statistical techniques to decide if the signature was “too similar” to another known signature of Sylvia Howland. He concluded that the “probability of finding 30 downstroke matches in a pair of signatures was once in 2,666 millions of millions of millions”.

Peirce assigned probabilities to each of the 30 similarities and then multiplied them to arrive at the final, extremely small, probability. However, in a 1980 paper, Paul Meier and Sandy Zabell took exception to this calculation.¹ They discuss the “use and abuse of the product rule for multiplying probabilities of independent events”, and argue that the 30 similarities are most likely not independent events.
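
Meier and Zabell's objection can be illustrated with a toy model. Every number in it is hypothetical (a per-stroke match probability of 0.2 and a shared-cause probability `rho`), and it is not Peirce's data or method. Suppose that, with small probability `rho`, something common to the whole signature makes every downstroke match at once; otherwise each stroke matches independently with probability 0.2.

```python
def prob_all_match(p, n, rho):
    """P(all n strokes match) in a two-part (mixture) model.

    With probability rho a shared cause makes every stroke match;
    otherwise each stroke matches independently with probability p.
    """
    return rho + (1 - rho) * p ** n


P_STROKE = 0.2   # hypothetical per-stroke match probability
N_STROKES = 30   # number of downstrokes compared

for rho in (0.0, 1e-6, 1e-3):
    p_all = prob_all_match(P_STROKE, N_STROKES, rho)
    print(f"rho = {rho:g}: P(all match) = {p_all:.3g}")
```

With `rho` set to zero the model reduces to the product rule, roughly 1 in 10^21. Even a one-in-a-million chance of a shared cause swamps that figure, so the answer is driven almost entirely by the dependence that the product rule ignores.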

The court ultimately decided this case against Robinson on a separate legal point, not on whether the will was a forgery.

The trial of Malcolm Collins

It is said that there are two kinds of statistics: those that you look up, and those that you make up. The case of People v. Collins contains examples of the latter.

The defendant, Malcolm Ricardo Collins, was accused of robbery and was convicted. The conviction was appealed to the Supreme Court of California. The following evidence was presented at the original trial.

A woman had her purse stolen. The witnesses did not get a good look at the robber’s face; however, they were able to describe some characteristics of the robber (a white woman with a blonde ponytail), the get-away car (a yellow vehicle), and the driver (a black man with a beard and moustache). At trial, the prosecution called an instructor of mathematics to testify. The instructor explained the product rule for multiplying probabilities of independent events. The prosecutor then suggested the following probabilities to the instructor: black man with a beard, 1 in 10; man with a moustache, 1 in 4; white woman with ponytail, 1 in 10; white woman with blonde hair, 1 in 3; yellow automobile, 1 in 10; and interracial couple in car, 1 in 1,000. The prosecutor asked the instructor what the probability of all these events occurring together would be, using these estimates, and the instructor replied: 1 in 12,000,000. The prosecutor claimed these estimates were conservative and that the real probability was closer to 1 in a billion.
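
The instructor's arithmetic is easy to reproduce. This Python sketch applies exact fractions and `math.prod` (Python 3.8 or later) to the prosecutor's six figures. The result is only as good as those unsupported inputs and the assumption that all six characteristics are independent.

```python
from fractions import Fraction
from math import prod

# Probabilities suggested by the prosecutor in People v. Collins.
factors = {
    "black man with a beard": Fraction(1, 10),
    "man with a moustache": Fraction(1, 4),
    "white woman with ponytail": Fraction(1, 10),
    "white woman with blonde hair": Fraction(1, 3),
    "yellow automobile": Fraction(1, 10),
    "interracial couple in car": Fraction(1, 1000),
}

# Multiplying is valid only if all six characteristics are independent.
joint = prod(factors.values())
print(joint)  # 1/12000000
```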

The jury found Collins guilty, but the ruling of the appeals court noted: “It is a curious circumstance of this adventure in proof that the prosecutor not only made his own assertions of these factors in the hope that they were conservative… but invited the jury to substitute their estimates.” The court continued: “There was another glaring defect in the prosecution’s technique, namely an inadequate proof of the statistical independence of the six factors.”

For instance, whether a man has a beard may not be independent of whether he has a moustache. Also, wouldn’t the fact that an interracial couple was in the car already be taken into account, given that the alleged robber was a white woman and the man driving the get-away car was said to be black? The final ruling of the appeals court was: “Mathematics, a veritable sorcerer in our computerized world, while assisting the trier of fact in the search for truth, must not cast a spell over him. We reverse the judgment.”

Unfortunately, the appeals court’s explanation of the lack of independence in the Collins case is incorrect. The court states that the events “black men with beards and men with moustaches represent overlapping categories”. While this statement is true, it is not the reason the events are not independent. Colin Aitken, in his book Statistics and the Evaluation of Evidence for Forensic Scientists, gives the correct explanation: “The statistical testimony lacked an adequate foundation both in evidence and statistical theory. …The first reason refers to the lack of justification offered for the choice of probability values and the assumption that the various characteristics were independent. As an example of this latter point, an assumption of independence assumes that the propensity of a man to have a moustache does not affect his propensity to have a beard.”²

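Overlap and dependence are different properties. Two overlapping events are independent exactly when P(A and B) = P(A) × P(B); the relevant question is whether knowing that a man has a beard changes the probability that he also has a moustache, not whether some men have both.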
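
To see that overlap and dependence are different things, this Python sketch compares two pairs of overlapping events. Hearts and kings in a standard deck overlap (the king of hearts) yet are independent. The beard and moustache counts are hypothetical, invented for illustration and not real survey data; they overlap and are strongly dependent.

```python
from fractions import Fraction

# Overlapping yet independent: one card drawn from a standard 52-card deck.
deck = [(rank, suit) for rank in "A23456789TJQK" for suit in "SHDC"]
n_cards = len(deck)
p_heart = Fraction(sum(suit == "H" for _, suit in deck), n_cards)
p_king = Fraction(sum(rank == "K" for rank, _ in deck), n_cards)
p_king_of_hearts = Fraction(
    sum(rank == "K" and suit == "H" for rank, suit in deck), n_cards)
print(p_king_of_hearts == p_heart * p_king)  # True, despite the overlap

# Overlapping and dependent: HYPOTHETICAL counts for 1,000 men (not real data).
men, beard, moustache, both = 1000, 100, 250, 90
p_beard = Fraction(beard, men)
p_moustache = Fraction(moustache, men)
p_beard_and_moustache = Fraction(both, men)
print(p_beard_and_moustache, p_beard * p_moustache)  # 9/100 1/40
print(p_beard_and_moustache / p_beard)  # P(moustache | beard) = 9/10
```

In the hypothetical population, knowing that a man has a beard raises the chance that he has a moustache from 1 in 4 to 9 in 10. That change, not the overlap itself, is what makes the events dependent.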

The trial of Sally Clark

Sally Clark was a solicitor in Cheshire, England.³ Her son, Harry, born three weeks premature, died eight weeks after birth, and she was accused of his murder. Her first child had also died, less than three weeks after birth, and while his autopsy concluded he had died of natural causes, Clark was now accused of murdering him too. She was arrested and charged, despite the fact that there was little evidence against her.

At Clark’s trial, Sir Roy Meadow, a pediatrician, gave statistical testimony. He asserted that the probability of a baby dying a cot death (sudden infant death syndrome, SIDS) when the mother is older than 26, affluent, and a nonsmoker is 1 in 8,543, and that the probability of two children from such a family both dying a cot death is therefore (1 in 8,543) × (1 in 8,543) = 1 in 73 million. The judge’s summary to the jury included the statement: “Although we do not convict people in these courts on statistics … the statistics in this case are compelling.” After her conviction one juror said: “Whatever you say about Sally Clark, you can’t get around the 1 in 73 million figure.” Clark’s conviction was upheld on appeal.

In 2001, the Royal Statistical Society issued a news brief condemning the use of the multiplication rule for independent events in this case, stating that: “This approach is statistically invalid. … The well-publicized figure of 1 in 73 million has no statistical basis.” In 2002, Ray Hill, professor of mathematics at the University of Salford, analysed other published data. He concluded that the probability of a second child dying a cot death, given that a first child had died a cot death, may be as high as 1 in 60. In 2003, after Clark had spent three years in jail, her second appeal was upheld and she was released. This came only after a new pro bono lawyer, reviewing the evidence, discovered a pathology report showing that Harry had been infected with Staphylococcus aureus, a fact that had been withheld from her defense team.
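
The gap between Meadow’s figure and Hill’s estimate comes down to the chain rule, P(A and B) = P(A) × P(B | A), which reduces to the simple product only when B is independent of A. The Python sketch below combines Meadow’s single-death figure with Hill’s upper conditional estimate purely for illustration; the two figures came from different analyses.

```python
from fractions import Fraction

p_first = Fraction(1, 8543)  # Meadow's figure for a single cot death

# Meadow: P(both) = P(first) x P(second), i.e. independence assumed.
p_both_independent = p_first * p_first

# Chain rule: P(both) = P(first) x P(second | first), using Hill's
# upper estimate of 1 in 60 for the conditional probability.
p_second_given_first = Fraction(1, 60)
p_both_dependent = p_first * p_second_given_first

for label, p in [("independent", p_both_independent),
                 ("dependent", p_both_dependent)]:
    print(f"{label}: 1 in {p.denominator:,}")
# independent: 1 in 72,982,849
# dependent: 1 in 512,580
```

Allowing for dependence in this way makes a second cot death in such a family more than a hundred times more likely than Meadow’s figure implies.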

Sally Clark died in 2007.

The trial of Jimmy Ray Bromgard

In 2015 the Federal Bureau of Investigation admitted that for over 20 years its forensic laboratory personnel had given flawed testimony in criminal trials involving microscopic hair analysis.⁴ In many cases they would claim that the probability that a scalp hair left at the crime scene would randomly match the defendant’s hair was 1 in 4,500. For a pubic hair the probability was 1 in 800. These probabilities were derived from a flawed study by Gaudette and Keeping.⁵

In the 1987 trial of Jimmy Ray Bromgard, concerning the rape of an eight-year-old girl, the prosecution’s forensic expert testified that the defendant’s scalp and pubic hairs matched hairs found at the crime scene. Without offering a basis, he stated that the probability of a match was 1 in 100 for scalp hair and also 1 in 100 for pubic hair. He opined that, because the hairs came from different parts of the body, these matches were independent. Then, using the multiplication rule for independent events, he concluded that the chance that the hair came from someone other than the defendant was 1 in 10,000.

As in other cases where the multiplication rule was misused, the expert witness multiplied probabilities (producing a much smaller probability) without presenting any evidence that the events were independent. Indeed, a person’s scalp hair and pubic hair share characteristics (e.g., color, cortical texture), so a match on scalp hair would not be independent of a match on pubic hair.
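
A hypothetical model (ours, not the expert’s or Gaudette and Keeping’s) shows how a shared trait breaks the product rule while leaving each individual figure unchanged. Assume that a match on either hair type requires both the same color class, which a person’s scalp and pubic hair share, and an unrelated texture feature that agrees independently for each hair type. The probabilities chosen are illustrative only.

```python
from fractions import Fraction

# HYPOTHETICAL model: a hair matches only if it has the same color class
# as the crime-scene hair AND agrees on an unrelated texture feature.
# Color class is shared by a person's scalp and pubic hair.
p_color = Fraction(1, 10)    # chance a random person shares the color class
p_texture = Fraction(1, 10)  # chance of texture agreement, per hair type

p_scalp = p_color * p_texture  # 1/100, the expert's scalp figure
p_pubic = p_color * p_texture  # 1/100, the expert's pubic figure

# A double match needs the shared color class once, plus two textures.
p_both = p_color * p_texture * p_texture

print(p_scalp * p_pubic)  # 1/10000: the product rule
print(p_both)             # 1/1000: allowing for the shared trait
```

Each hair type still matches 1 time in 100, yet in this model a double match is ten times more likely than the 1 in 10,000 obtained by multiplying.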

Bromgard was convicted and spent 14 years in jail until DNA evidence proved he did not commit the assault. Seventy-three other individuals, convicted primarily on the basis of microscopic hair analysis, were exonerated by DNA evidence, but only after spending a collective 1,056 years in jail.

The trial of Ray Krone

Ray Krone was convicted of murdering a bartender in 1992. The principal forensic evidence presented at the trial was a bite mark on the body of the bartender that was presumed to have been left by the murderer. Raymond Rawson, a forensic dentist, testified that the bite mark on the body not only matched the dentition of Krone but could only have come from Krone.

Rawson based his conclusion on a study he published in 1984, “Statistical evidence for the individuality of the human dentition.”⁶ The study claimed that human dentition is unique; however, it had a number of flaws that made this claim untenable. Our earlier article has more details, but the basic issue is this: Rawson and colleagues used wax dental impressions from a convenience sample of 1,200 subjects to count the number of distinct positions for each of 12 teeth (six from the upper jaw and six from the lower jaw). They then multiplied these counts, for the upper teeth, for the lower teeth, and for both sets together, arriving at more than four billion unique arrangements of the human dentition, which exceeded the number of people on Earth at that time.

As we wrote in 2016: “[T]his calculation is only valid if the data are independent. Yet it would seem obvious that the position of a tooth would be influenced by the positions of the teeth surrounding it. Therefore, the product rule does not apply in determining the total number of unique positions for human dentitions.”
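
The effect of neighboring teeth can be sketched with a small counting model. All of its parameters (six teeth in one arch, five positions per tooth, and a rule that adjacent teeth differ by at most one position) are hypothetical and are not Rawson’s data or method. The product rule counts every combination; a dynamic-programming count, cross-checked by brute force, keeps only the combinations that respect the neighbor constraint.

```python
from itertools import product

N_TEETH = 6    # hypothetical: six teeth in one arch
POSITIONS = 5  # hypothetical number of positions per tooth
MAX_STEP = 1   # hypothetical rule: neighbors differ by at most one position


def count_constrained(n_teeth, positions, max_step):
    """Count arrangements in which adjacent teeth differ by <= max_step."""
    # ways[p]: valid arrangements of the teeth placed so far, last tooth at p
    ways = [1] * positions
    for _ in range(n_teeth - 1):
        ways = [
            sum(ways[q] for q in range(positions) if abs(p - q) <= max_step)
            for p in range(positions)
        ]
    return sum(ways)


def count_brute_force(n_teeth, positions, max_step):
    """Same count by checking every combination (feasible only when small)."""
    return sum(
        all(abs(a - b) <= max_step for a, b in zip(combo, combo[1:]))
        for combo in product(range(positions), repeat=n_teeth)
    )


naive = POSITIONS ** N_TEETH  # product rule: every combination counted
constrained = count_constrained(N_TEETH, POSITIONS, MAX_STEP)
assert constrained == count_brute_force(N_TEETH, POSITIONS, MAX_STEP)
print(naive, "vs", constrained)  # 15625 vs 707
```

Under this toy constraint only 707 of the 15,625 combinations are possible, so the naive product overstates the number of distinct arrangements more than twentyfold.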

Krone spent 10 years in jail before being released, after DNA evidence proved he was not the murderer. At least twenty-three other individuals, convicted primarily on the basis of bite-mark analysis, have been similarly exonerated.

Summing up

In criminal trials, especially those involving microscopic hair analysis and human bite-mark comparisons, incorrect probabilities have been presented to the jury. In many cases, probabilities were multiplied together without regard to whether the events were independent.

Regarding this problem of lack of independence, the appeals court in the Malcolm Collins case said: “Few defense attorneys, and certainly few jurors, could be expected to comprehend this basic flaw in the prosecution’s analysis.” Yet statisticians understand it, and justice might be better served if they were asked to review the calculations before trial.

The April 2019 issue of Significance, out now, has more on the topic of forensic science and statistics. You can read the articles for free online, or subscribe to receive the print version.

About the authors

H. James (Jim) Norton is professor emeritus of biostatistics, Carolinas HealthCare System, Charlotte, N.C. He has been an expert witness or consultant in eight legal cases, including six that involved forensic evidence. His website is jimnortonphd.com. George Divine is a senior research biostatistician at Henry Ford Hospital, Detroit, Michigan.

References

1. Meier, P. and Zabell, S. (1980) Benjamin Peirce and the Howland Will. Journal of the American Statistical Association, 75(371), 497-506.
2. Aitken, C.G.G. (1995) Statistics and the Evaluation of Evidence for Forensic Scientists. Chichester: John Wiley & Sons.
3. Schneps, L. and Colmez, C. (2013) Math on Trial: How Numbers Get Used and Abused in the Courtroom. New York: Basic Books.
4. Norton, J., Anderson, W. and Divine, G. (2016) Flawed forensics: Statistical failing of microscopic hair analysis. Significance (April), 26-29.
5. Gaudette, B.D. and Keeping, E.S. (1974) Attempt at determining probabilities in human scalp hair comparison. Journal of Forensic Sciences, 19(3), 599-606.
6. Rawson, R.D., Ommen, R.K., Kinard, G., Johnson, J. and Yfantis, A. (1984) Statistical evidence for the individuality of the human dentition. Journal of Forensic Sciences, 29(1), 245-253.
