Pooled testing: a teaching case study to use in a course about probability models

This summer, I created a few examples about COVID-19 to use in my course on probability models. I’ll post those materials here as I teach with them. Here is the first example.

Pooled testing to expand testing capacity

In July 2020, many states struggled to process COVID-19 tests quickly, with some taking more than a week. Many statisticians have proposed pooled testing to process tests more quickly and effectively expand testing capacity to as much as four times the regular capacity. Pooled testing works well when few tests come back positive.

Pooled testing came about in the 1940s, when government statisticians needed a more efficient way to screen World War II draftees for syphilis. R. Dorfman’s 1943 paper, “The Detection of Defective Members of Large Populations,” describes a methodology for pooled testing.

Pooled testing works as follows:

  • Samples are combined into pools of n, where each sample comes from one individual, and each pool is run as a single test.
  • Each pooled test result is either positive or negative. A pool comes back positive if at least one of the n individual samples is positive.
  • When a pool comes back positive, the unused portions of the original samples are retested individually to see which individuals test positive, achieving the same results as testing everyone individually. A total of n+1 tests are performed for that pool.
  • When a pool comes back negative, no further testing is needed. We conclude that all n individuals are negative. Only one test is performed for that pool, which reduces the overall number of tests.
  • When pooling is not used, one test per individual yields n tests for the group.

Consider a group of 40 asymptomatic individuals that are tested for COVID-19 in pooled groups of size n. Let g denote the number of groups tested, and let X capture the number of groups that test positive (a random variable). We assume that an individual tests positive for COVID-19 with probability q (New York data from July 2020).

  • Express g as a function of n.
  • Express X and its distribution based on g, n, and q.
  • Let the random variable T denote the total number of tests run. Derive an expression for T as a function of X as well as fixed parameters n and g.
  • Consider test groups of size n = 4, 5, 8, 10, 20. Which group size yields the fewest tests performed, on average? (Hint: Find E[T]).
  • How does your answer to the last question change if q = 0.02 or q = 0.075? (Note: Dane County had q = 0.02 and Wisconsin had q = 0.075 at the end of July 2020. As I write this in early October 2020, more than 20% of COVID tests are coming back positive in Wisconsin).
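For readers who want to check their work numerically, here is a minimal Python sketch of the expected number of tests. It is not the posted solution, and it rests on assumptions that the questions above do not state outright: individuals test positive independently of one another, and n divides the group size evenly. Under those assumptions a pool is positive with probability 1 - (1 - q)^n, so X is binomial with g trials, T = g + nX, and E[T] = g + n g (1 - (1 - q)^n). The function names are hypothetical.

```python
def expected_tests(population, n, q):
    """E[T] for pools of size n when each individual is positive w.p. q.

    Assumptions (not from the original post): individuals are
    independent, and n divides population evenly.
    """
    g = population // n                  # number of pools
    p_pool_positive = 1 - (1 - q) ** n   # at least one positive in a pool
    # g pooled tests, plus n retests for each positive pool on average.
    return g + n * g * p_pool_positive


def best_pool_size(population, sizes, q):
    """Return the pool size in sizes with the smallest E[T]."""
    return min(sizes, key=lambda n: expected_tests(population, n, q))


sizes = [4, 5, 8, 10, 20]
for q in (0.02, 0.075):
    for n in sizes:
        print(f"q={q}, n={n}: E[T]={expected_tests(40, n, q):.2f}")
    print(f"q={q}: best pool size is {best_pool_size(40, sizes, q)}")
```

The values of q used here are the Dane County and Wisconsin figures quoted above. The output illustrates why pooled testing pays off mainly when few tests come back positive: as q grows, smaller pools become preferable and the savings shrink.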

You can read more in the New York Times article that inspired this case study.

Files:

  1. The assignment.
  2. The solution.
  3. A Google spreadsheet with the calculations (create a copy or download).
