Binomial test#
This test checks whether the share of some trait observed in a sample is consistent with the hypothesis that its true share in the population is a given number \(p_0\).
Sources:
A very detailed article on Habr (in Russian).
Theory#
\(p_0\) - hypothesised share of the trait in the general population;
\(p\) - observed share of the trait in the sample;
\(H_0\) - the true share equals \(p_0\), so the observed \(p\) differs from \(p_0\) only by chance;
\(H_1\) - the true share differs from \(p_0\), so the sample does not support the hypothesis that the real proportion of the trait is \(p_0\).
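The central statistic for this test is the Z-statistic:
\[Z = \frac{p_0 - p}{\sqrt{\frac{p_0 (1 - p_0)}{n}}},\]
where \(n\) is the sample size.
Under \(H_0\) and for a sufficiently large sample, the Z-statistic approximately follows the standard normal distribution \(N\):
\[Z \sim N(0, 1).\]
So the two-sided p-value for this test can be computed as:
\[p\text{-value} = 2 F_N(-|Z|),\]
where \(F_N\) is the cumulative distribution function of the standard normal distribution.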
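The Z-statistic and the two-sided p-value described above can be sketched as a small helper. This is a minimal illustration, not a library API: the function name `proportion_z_test` and its arguments are hypothetical. It expresses the normal CDF through `math.erfc` so that it needs only the standard library, and it uses the same sign convention for Z as the example in this section (\(p_0 - p\) in the numerator).

```python
import math

def proportion_z_test(num_successes, n, p_0):
    """Normal-approximation test for a share (hypothetical helper)."""
    p = num_successes / n                      # observed share
    std_error = math.sqrt(p_0 * (1 - p_0) / n) # standard error under H_0
    z = (p_0 - p) / std_error
    # F_N(x) = 0.5 * erfc(-x / sqrt(2)), hence 2 * F_N(-|z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# 154 successes out of 500 trials, hypothesised share 0.3
z, p_value = proportion_z_test(154, 500, 0.3)
print("Z statistic:", z)
print("p-value:", p_value)
```

Taking the absolute value of Z makes the p-value symmetric: it does not matter whether the observed share is above or below \(p_0\).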
Example#
The cell below generates a sample of occurrences of a particular trait and prints the hypothesised and observed shares.
import numpy as np
from scipy.stats import binomtest, norm
np.random.seed(10)
N = 500
p_0 = 0.3
sample = (np.random.rand(N) < p_0)
num_successes = sample.sum()
p = num_successes/N
print("Theoretical share:", p_0, ", observed share", p)
Theoretical share: 0.3 , observed share 0.308
The Z statistic for a particular example is calculated in the next cell:
Z_n = (p_0-p)/np.sqrt(p_0*(1-p_0)/N)
print("Z statistic -", Z_n)
Z statistic - -0.39036002917941365
Finally, the p-value for this case: the probability of observing a difference between the observed and the hypothesised share at least as large as this one, given that the hypothesis about \(p_0\) is true.
# two-sided p-value; -abs() keeps the formula valid when Z_n is positive
p_value = 2*norm.cdf(-abs(Z_n))
print("p-value for the test -", p_value)
p-value for the test - 0.6962703401140226
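Compare this with the result of the same test in scipy: the p-values are nearly the same. They are not identical because binomtest performs the exact binomial test, while the Z-statistic relies on the normal approximation.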
binomtest(num_successes, N, p_0)
BinomTestResult(k=154, n=500, alternative='two-sided', statistic=0.308, pvalue=0.6964055360678295)
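To see how the normal approximation compares with the exact test for different sample sizes, the two p-values can be computed side by side. The sketch below is illustrative only: the helper `normal_approx_p_value` is hypothetical, and the smaller samples (5 of 10 and 35 of 100) are made-up counts, not data from this example. Only the last pair, 154 of 500, repeats the sample used above.

```python
import numpy as np
from scipy.stats import binomtest, norm

def normal_approx_p_value(num_successes, n, p_0):
    """Two-sided p-value from the Z-statistic (hypothetical helper)."""
    p = num_successes / n
    z = (p_0 - p) / np.sqrt(p_0 * (1 - p_0) / n)
    return 2 * norm.cdf(-abs(z))

p_0 = 0.3
results = []
# (num_successes, n): the first two pairs are invented for illustration
for num_successes, n in [(5, 10), (35, 100), (154, 500)]:
    approx = normal_approx_p_value(num_successes, n, p_0)
    exact = binomtest(num_successes, n, p_0).pvalue
    results.append((n, approx, exact))
    print(f"n={n}: normal approximation {approx:.4f}, exact {exact:.4f}")
```

When the sample is small, the exact p-value from binomtest is the safer choice, since the normal approximation assumes a sufficiently large sample.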