Binomial test

The binomial test checks whether the share of some trait observed in a sample is consistent with the hypothesis that the true share of the trait in the population equals a given number \(p_0\).

Sources:

Theory

  • \(p_0\) - hypothesized share of the trait in the general population;

  • \(p\) - observed share of the trait in the sample;

  • \(N\) - sample size;

  • \(H_0\) - the true share equals \(p_0\): the observed value \(p\) is consistent with the hypothesis;

  • \(H_1\) - the true share differs from \(p_0\): the sample does not support the hypothesis that the real proportion of the trait is \(p_0\).

The test described here relies on the normal approximation to the binomial distribution. Its central statistic is the Z-statistic:

\[Z_n = \frac{p_0 - p}{\sqrt{\frac{p_0(1-p_0)}{N}}}.\]
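The form of \(Z_n\) follows from the mean and variance of the binomial distribution. Writing \(K = Np\) for the number of sample elements that have the trait (a symbol introduced here only for this derivation), under \(H_0\) the count is binomial, \(K \sim \mathrm{Bin}(N, p_0)\), so:

\[\mathbb{E}[K] = Np_0, \qquad \operatorname{Var}[K] = Np_0(1-p_0),\]

\[\mathbb{E}[p] = \mathbb{E}\left[\frac{K}{N}\right] = p_0, \qquad \operatorname{Var}[p] = \frac{\operatorname{Var}[K]}{N^2} = \frac{p_0(1-p_0)}{N}.\]

Standardizing \(p\) with this mean and standard deviation (using the sign convention \(p_0 - p\) adopted above) gives

\[Z_n = \frac{\mathbb{E}[p] - p}{\sqrt{\operatorname{Var}[p]}} = \frac{p_0 - p}{\sqrt{\frac{p_0(1-p_0)}{N}}}.\]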

Under \(H_0\) and for a sufficiently large \(N\), the Z-statistic approximately follows the standard normal distribution \(N(0, 1)\):

\[Z_n \overset{\text{approx.}}{\sim} N(0, 1).\]
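One way to see this approximation at work is to simulate many samples under \(H_0\) and look at the resulting Z values. The sketch below assumes the same N = 500 and p_0 = 0.3 as the example further down; it uses NumPy's Generator API (rng.binomial) instead of the legacy np.random.seed, and the seed and number of simulated experiments are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
N, p_0 = 500, 0.3
n_experiments = 100_000  # arbitrary number of simulated samples

# Simulate many samples under H0: each count of successes is Binomial(N, p_0)
counts = rng.binomial(N, p_0, size=n_experiments)
p_hat = counts / N

# Z-statistic for every simulated sample, same sign convention as the text
Z = (p_0 - p_hat) / np.sqrt(p_0 * (1 - p_0) / N)

# A standard normal has mean 0, std 1 and about 5% of its mass beyond +-1.96
share_outside = np.mean(np.abs(Z) > 1.96)
print("mean of Z:", Z.mean())
print("std of Z:", Z.std())
print("share of |Z| > 1.96:", share_outside)
```

Because the count of successes is discrete, the share beyond ±1.96 should come out close to, but not exactly, 5%.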

The two-sided p-value for this test is therefore:

\[p_{value} = 2F_N(-|Z_n|),\]

where \(F_N\) is the cumulative distribution function of the standard normal distribution. Using \(-|Z_n|\) keeps the p-value within \([0, 1]\) regardless of the sign of \(Z_n\).
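The computation can be wrapped in a small helper. The function name two_sided_z_test is hypothetical (it is not part of SciPy); the sketch uses scipy.stats.norm.cdf and evaluates the p-value as 2F_N(-|Z_n|), which stays in [0, 1] whatever the sign of Z_n. Two counts placed symmetrically around N·p_0 = 150 illustrate this.

```python
import math
from scipy.stats import norm

def two_sided_z_test(num_successes, n, p_0):
    """Hypothetical helper: normal-approximation test of H0: share == p_0.

    Returns (Z_n, p_value) with the sign convention Z_n = (p_0 - p) / se.
    """
    p = num_successes / n
    se = math.sqrt(p_0 * (1 - p_0) / n)  # standard error of p under H0
    z = (p_0 - p) / se
    # 2 * F_N(-|Z_n|): equal tail areas on both sides of zero
    p_value = 2 * norm.cdf(-abs(z))
    return z, p_value

# 154 and 146 successes out of 500 lie symmetrically around 150
z_hi, p_hi = two_sided_z_test(154, 500, 0.3)
z_lo, p_lo = two_sided_z_test(146, 500, 0.3)
print("154 successes:", z_hi, p_hi)
print("146 successes:", z_lo, p_lo)

# Without the absolute value, a positive Z_n would give a "p-value" above 1
print("naive 2 * F_N(Z_n) for positive Z_n:", 2 * norm.cdf(z_lo))
```

Both counts get the same p-value; only the sign of Z_n differs.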

Example

The cell below generates a sample of N = 500 elements, each of which has the trait with probability \(p_0 = 0.3\), and prints the theoretical and observed shares.

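import numpy as np
from scipy.stats import binomtest, norm

np.random.seed(10)  # fix the seed so the output below is reproducible

N = 500    # sample size
p_0 = 0.3  # hypothesized share of the trait

# Each element has the trait with probability p_0
sample = (np.random.rand(N) < p_0)

num_successes = sample.sum()  # number of elements with the trait
p = num_successes / N         # observed share
print("Theoretical share:", p_0, ", observed share", p)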
Theoretical share: 0.3 , observed share 0.308

The next cell computes the Z-statistic for this sample:

Z_n = (p_0 - p) / np.sqrt(p_0 * (1 - p_0) / N)
print("Z statistic -", Z_n)
Z statistic - -0.39036002917941365

Finally, the p-value: the probability, assuming \(H_0\) is true, of getting a difference between the observed and hypothesized shares at least as extreme as the one observed.

# Two-sided p-value; -abs(Z_n) keeps it in [0, 1] whatever the sign of Z_n
p_value = 2 * norm.cdf(-abs(Z_n))
print("p-value for the test -", p_value)
p-value for the test - 0.6962703401140226

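Compare this with SciPy's binomtest. It computes an exact p-value from the binomial distribution instead of the normal approximation, so the two results are not identical, but for a sample of this size they agree to about three decimal places (0.69627 vs 0.69641).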

binomtest(num_successes, N, p_0)
BinomTestResult(k=154, n=500, alternative='two-sided', statistic=0.308, pvalue=0.6964055360678295)
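The exact two-sided p-value can also be reproduced by hand, which shows where the small difference from the normal approximation comes from. The sketch below assumes the likelihood-based definition: sum the binomial probabilities of all counts that are no more likely under \(H_0\) than the observed count. As far as we know this matches what binomtest does by default for a two-sided test; the relative tolerance 1 + 1e-7 is an assumption modelled on SciPy's internal handling of near-ties, and the values of N, p_0 and k are copied from the example above.

```python
import numpy as np
from scipy.stats import binom, binomtest

N, p_0, k = 500, 0.3, 154  # values from the example above

# Probability of every possible count 0..N under H0
pmf = binom.pmf(np.arange(N + 1), N, p_0)

# Keep every count that is at most as likely as the observed one;
# the factor 1 + 1e-7 (an assumption) absorbs floating-point noise in ties
threshold = binom.pmf(k, N, p_0) * (1 + 1e-7)
p_exact = pmf[pmf <= threshold].sum()

print("manual exact p-value:", p_exact)
print("binomtest p-value:   ", binomtest(k, N, p_0).pvalue)
```

Because Bin(500, 0.3) is slightly asymmetric, the counts kept on the lower side are not a mirror image of those on the upper side, which is one reason the exact p-value differs slightly from 2F_N(-|Z_n|).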