Three Rs approaches to the production and quality control of avian vaccines.

The Report and Recommendations of ECVAM Workshop 41

Reprinted with minor amendments from ATLA 28, 241-258.


Appendix 1

Sample Size Determination with Dichotomous Responses

Manfred Wilhelm

Biostatistics Section, Paul Ehrlich Institute, 63207 Langen, Germany


Dichotomous Response

A variable of interest in an animal trial that is designed to result in two categories (for example, survival/death, presence/absence of disease, positive/negative outcome) is called a dichotomous response; it can be quantified by rates and proportions. Thresholds of test procedures, as specified in monographs on avian live virus vaccines, are usually given as rates that allow the decision to be made: is the test valid and/or does the vaccine virus comply with the test?

Statistical Inference

In general, statistical inference is the procedure by which we reach a conclusion about a population on the basis of the information contained in a sample that has been randomly drawn from that population. Inference from a sample to a population relies on the assumption that the animals in the random sample are representative of all animals in the population. The empirical rate p (0% ≤ p ≤ 100%) of positive (negative) test results in a sample of animals tested according to monograph procedures is an estimate of the true rate of positive (negative) test results in the population. Since p is only a point estimate, we have to take its precision into consideration by calculating its confidence limits, which depend heavily on sample size. With regard to these principles of statistics, it remains unclear whether the thresholds specified in the monographs under investigation are intended to be controlled in the sample or in the population.

Confidence limit and sample size

Assuming a binomial distribution, the exact two-sided lower 100(1-α)% confidence limit for an observed proportion p = x/n < 100% is given (1) by:

πlower, 1-α = x / (x + (n - x + 1) · Fdf1,df2,1-α/2) [1]

where x = number of positive (negative) events (x < n), n = sample size, and Fdf1,df2,1-α/2 = the 100(1-α/2)% quantile of the F distribution with df1 = 2(n - x + 1) and df2 = 2x degrees of freedom.

In the case of x = n, formula [1] reduces to:

πlower, 1-α = n / (n + F2,2n,1-α) [2]

which is now the exact one-sided lower 100(1-α)% confidence limit for an observed proportion of p = 100%.
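The limits in [1] and [2] can also be obtained without F tables. The sketch below is not the method used in the report: it relies on the standard equivalence between the F-quantile form and the binomial tail, and finds by bisection the proportion π at which the probability of observing x or more events in n trials equals α/2 (or α when x = n). The function names `upper_tail` and `lower_confidence_limit` are hypothetical and chosen only for illustration.

```python
from math import comb


def upper_tail(x, n, pi):
    """P(X >= x) for X ~ Binomial(n, pi)."""
    return sum(comb(n, k) * pi**k * (1 - pi) ** (n - k) for k in range(x, n + 1))


def lower_confidence_limit(x, n, alpha=0.05):
    """Exact lower confidence limit for an observed proportion x/n.

    Two-sided (tail level alpha/2) when x < n, as in formula [1];
    one-sided (tail level alpha) when x == n, as in formula [2].
    """
    if not 0 < x <= n:
        raise ValueError("need 0 < x <= n")
    level = alpha if x == n else alpha / 2
    lo, hi = 0.0, 1.0
    # The upper tail increases with pi, so bisection brackets the root.
    for _ in range(100):
        mid = (lo + hi) / 2
        if upper_tail(x, n, mid) < level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    print(round(lower_confidence_limit(20, 22), 3))  # 20 of 22 events
    print(round(lower_confidence_limit(10, 10), 3))  # 10 of 10 events
```

For 20 events out of 22 this gives a lower 95% limit of about 0.708, and for 10 out of 10 about 0.741, which can be compared with the table-based values obtained from [1] and [2].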

Based on formulae [1] and [2], Figures 1-3 illustrate the relation between the sample size and the exact two-sided and one-sided lower 95% confidence limits for p = 80%, p = 90% and p = 100% (2). In practice, determination of sample size has to be based on visual inspection of graphs such as those in Figures 1-3 or on statistical tables, as in (1). Nevertheless, after having chosen an adequate sample size n visually, we can check it by applying formulae [1] and [2].


Figure 1: Lower 95% confidence limit for observed proportion of 80%


Figure 2: Lower 95% confidence limit for observed proportion of 90%


Figure 3: Lower 95% confidence limit for observed proportion of 100%


Example of Newcastle disease B1 strain AHI (PEI project # 92.2063.3-01.100, p. 13-29)

The efficacy test revealed that all of n = 22 vaccinated and challenged chicks survived and did not show any signs of disease, i.e. p = 22/22 = 100%. The stated empirical test threshold appears to be p = 90%, i.e. at least x = 20 (> 0.9·22 = 19.8) animals in the sample have to survive without showing any signs of disease. For the least acceptable outcome in a sample of size n = 22, i.e. p = 20/22 > 90%, it follows from [1] with F6,40,0.975 = 2.74 that:

πlower,0.95 = 20/(20 + 3·2.74) ≈ 0.708, i.e. 70.8% [3]

Thus, if at least 20 out of 22 chicks (> 90%) are observed to be healthy, we can be 95% confident that the true proportion of surviving chicks that do not show any signs of disease in the population of all vaccinated chicks is at least 70.8%. By using only a sample of n = 22 chicks, we cannot be 95% confident that a true 90% threshold of surviving birds which show no signs of disease holds in the population of all vaccinated chicks! Even in the actual situation, in which all n = 22 chicks were observed to be healthy, a lower 95% confidence limit of 90% would require at least n = 29 animals in vaccine testing for efficacy, as can be seen from Figure 3.

The safety test revealed that all of n = 10 vaccinated and challenged chicks survived and did not show any serious clinical respiratory signs, i.e. p = 10/10 = 100%. From [2], and since F2,20,0.95 = 3.49, we get:

πlower,0.95 = 10/(10 + 3.49) ≈ 0.741, i.e. 74.1% [4]

i.e. we can be 95% confident that the true proportion of surviving chicks that do not show any serious clinical respiratory signs in the population of all chicks to be vaccinated is > 74.1% or, conversely, that < 25.9% of all chicks to be vaccinated are expected to show signs of disease and/or die. By using a sample of only n = 10 chicks, we cannot be 95% confident that a true threshold of, say, 80% of surviving chicks that show no serious clinical respiratory signs can be expected in the population of all vaccinated chicks! Again, as can be seen from Figure 3, a lower 95% confidence limit of 80% (i.e. the desired true threshold in the population) requires at least n = 14 animals in vaccine testing for safety.
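When every animal shows the required outcome (x = n), the sample sizes read from Figure 3 can be cross-checked numerically. The sketch below assumes the known closed form of the F quantile with 2 and 2n degrees of freedom, under which formula [2] simplifies to πlower = α^(1/n); the smallest n for a target limit then follows by solving for n. The function names are hypothetical and not taken from the report.

```python
from math import ceil, log


def lower_limit_all_positive(n, alpha=0.05):
    """One-sided lower confidence limit when all n animals are positive.

    Assumes the simplification of formula [2] to alpha ** (1 / n).
    """
    return alpha ** (1.0 / n)


def min_n_all_positive(target, alpha=0.05):
    """Smallest n with alpha ** (1 / n) >= target, for 0 < target < 1."""
    return ceil(log(alpha) / log(target))


if __name__ == "__main__":
    print(round(lower_limit_all_positive(10), 3))  # safety test, n = 10
    print(min_n_all_positive(0.90))                # target limit of 90%
    print(min_n_all_positive(0.80))                # target limit of 80%
```

With α = 0.05 this reproduces the limit of about 0.741 for n = 10 and the minimum sample sizes of 29 (target 90%) and 14 (target 80%) quoted in the text.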

Summary

The sample size n and the minimum required number x of positive (negative) events out of n, tabulated as the permitted shortfall n - x, that give an exact one-sided or two-sided lower 95% confidence limit of at least π% in the population are:

π = 80%
  n       14*-25   26-33   34-40   41-47   48-54   55-61
  n - x   0        1       2       3       4       5

π = 90%
  n       29*-53   54-69
  n - x   0        1

* smallest possible sample size n.
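The lower ends of the sample-size ranges in the summary can be recomputed with a short search. This is a sketch under two assumptions: the limit is one-sided at level α when n - x = 0 and two-sided (tail level α/2) otherwise, as in formulae [2] and [1], and the lower limit reaches the target π exactly when the binomial probability of x or more events, evaluated at π, does not exceed that level. The names `upper_tail` and `min_sample_size` and the search cap `n_max` are hypothetical.

```python
from math import comb


def upper_tail(x, n, pi):
    """P(X >= x) for X ~ Binomial(n, pi)."""
    return sum(comb(n, k) * pi**k * (1 - pi) ** (n - k) for k in range(x, n + 1))


def min_sample_size(target, failures, alpha=0.05, n_max=500):
    """Smallest n for which x = n - failures events give a lower limit >= target.

    Returns None if no such n is found up to n_max (an arbitrary search cap).
    """
    level = alpha if failures == 0 else alpha / 2
    for n in range(failures + 1, n_max + 1):
        if upper_tail(n - failures, n, target) <= level:
            return n
    return None


if __name__ == "__main__":
    for target in (0.80, 0.90):
        starts = [min_sample_size(target, k) for k in range(2)]
        print(f"target {target:.0%}: smallest n for n - x = 0, 1 -> {starts}")
```

For a target of 80% the search returns 14 and 26 for n - x = 0 and 1, and for 90% it returns 29 and 54, in line with the first entries of each range in the summary.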


Discussion

The main objective in biostatistics is to make statistical inferences from a random sample (of animals) of a certain size to the underlying population (of all such animals); the statistical validity of such conclusions is guaranteed by choosing the right sample size.

Since significance testing is apparently not the primary objective of almost all monographs under investigation, it is not feasible to perform a sample size calculation, which is usually based on type I and II errors, the underlying variability, the relevant effect size, the kind of hypotheses tested, the test methods used, and so forth (3, 4). The method for sample-size determination in this case, where almost no information exists, is based on confidence limits for estimated rates of interest.

If it is accepted that the monograph is not intended to relate statistical conclusions from safety and efficacy testing under experimental conditions (for example, artificially high dose, kind of application) to the animal populations in the field, then the statistical sample size considerations presented above are not applicable in this context. A reasonable sample size based on general requirements such as relevance, cost and animal welfare should be used instead. Without any power calculations, sample sizes of n > 10 are recommended as a common guideline in such situations (4).

References:

  1. Sachs, L. (1984). Applied Statistics: A Handbook of Techniques, 2nd edn, 701pp. New York, USA: Springer.
  2. Anon (1997). S-PLUS 4 Guide to Statistics. Seattle, WA, USA: Data Analysis Products Division, MathSoft, 425pp.
  3. Fleiss, J. L. (1981). Statistical Methods for Rates and Proportions, 2nd edn, 321pp. New York, USA: Wiley.
  4. Hothorn, L. A., Lin, K.K., Hamada, C. & Rebel, W. (1997). Recommendations for biostatistics of repeated toxicity studies. Drug Information Journal 31, 327-334.
