T-test#

A t-test is a basic statistical test for comparing mean values. There are three types of t-tests:

  • One-sample t-test: determines whether the sample mean differs significantly from a known reference mean (expected value).

  • Two independent samples t-test: determines whether the mean values of two independent groups differ significantly.

  • Paired samples t-test: determines if the mean values of two paired (and consequently related) groups differ significantly.

import numpy as np
from scipy import stats

One sample#

  • \(H_0\): the sample mean is equal to the given reference mean.

  • \(H_1\): the sample mean is not equal to the given reference mean.

Suppose we have a sample \(X\) and want to test whether its mean \(\overline{X}\) equals the given reference mean \(\mu\).

Introduce the \(t\) statistic:

\[t=\frac{\overline{X} - \mu}{\frac{s}{\sqrt{n}}}\]

Here:

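  • \(s\): the sample standard deviation of \(X\) (computed with \(n - 1\) in the denominator).

  • \(n\): the sample size, \(n = |X|\).

Under \(H_0\), and assuming the observations are independent and approximately normally distributed, the variable \(t\) follows a Student’s \(t\) distribution with \(n-1\) degrees of freedom: \(t \sim T(n - 1)\).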

Take a concrete realization of the t-statistic, \(t'\). The p-value is the probability, under \(H_0\), that \(T(n-1)\) takes a value at least as extreme as \(t'\):

\[p = P\left(\left|T(n-1)\right| \geq |t'|\right) = 1 - P\left(T(n-1) \in \left(-|t'|, |t'|\right)\right)\]

Because the Student’s \(t\) distribution is symmetric about zero, the p-value can be computed as:

\[p = 2F_T(-|t'|)\]

Here \(F_T\) is the cumulative distribution function of the Student’s \(t\) distribution with \(n-1\) degrees of freedom.
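
The expression \(2F_T(-|t|)\) is the total probability mass in both tails of the distribution. The following minimal sketch checks this numerically; the values of `t_prime` and `df` are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy import stats

t_prime = -0.75  # hypothetical realized t-statistic
df = 499         # hypothetical degrees of freedom (n - 1 for n = 500)

# Probability mass inside the central interval (-|t'|, |t'|).
central = stats.t.cdf(abs(t_prime), df) - stats.t.cdf(-abs(t_prime), df)

# Two-sided p-value computed three equivalent ways.
p_tails = 1 - central                            # everything outside the interval
p_formula = 2 * stats.t.cdf(-abs(t_prime), df)   # twice the lower tail
p_sf = 2 * stats.t.sf(abs(t_prime), df)          # twice the upper tail

print(p_tails, p_formula, p_sf)
```

All three values should agree up to floating-point error, because the distribution is symmetric about zero.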


As an example, the test is computed below by hand and the result is compared with the output of the specialized t-test function, scipy.stats.ttest_1samp.

The following cell generates the sample used in the experiment.

np.random.seed(11)
n = 500
sample = np.random.normal(0, 1, n)

The following code computes the t-statistic using only numpy and the t-test p-value using only the cumulative distribution function for the Student’s distribution.

t_stat = (np.mean(sample) - 0) / (np.std(sample, ddof=1) / np.sqrt(n))
p_value = (stats.t.cdf(-np.abs(t_stat), n - 1)) * 2
float(t_stat), float(p_value)
(-0.7531988423186389, 0.4516856402737408)

The following cell performs the same computation with stats.ttest_1samp.

t_stat, p_value = stats.ttest_1samp(sample, popmean=0)
float(t_stat), float(p_value)
(-0.7531988423186389, 0.4516856402737408)

Two sample#

  • \(H_0\): the mean values in both groups are the same.

  • \(H_1\): the mean values in the groups differ.

If \(X_1\) and \(X_2\) are the samples whose mean values are compared, the \(t\) statistic takes the form:

\[t = \frac{\overline{X}_1 - \overline{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}\]

Here:

  • \(n_1\) and \(n_2\) are the sample sizes: \(n_1=|X_1|\), \(n_2=|X_2|\).

  • \(s_1\) and \(s_2\) are the standard deviations of the corresponding samples.

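In the case of a two-sample t-test with equal variances assumed in both groups, the statistic \(t\) is distributed under \(H_0\) according to the Student’s distribution with \(n_1 + n_2 - 2\) degrees of freedom: \(t \sim T(n_1 + n_2 - 2)\). The classical equal-variance test pools the two variance estimates; with equal sample sizes this gives the same denominator \(\sqrt{s_1^2/n_1 + s_2^2/n_2}\). When the variances are not assumed equal (Welch's t-test), the degrees of freedom are instead estimated with the Welch-Satterthwaite approximation, which never exceeds \(n_1 + n_2 - 2\).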


The following cells compare the t-test computed by hand with the output of the specialized function stats.ttest_ind.

np.random.seed(10)
n_1, n_2 = 200, 200
X_1, X_2 = np.random.normal(5, 2, n_1), np.random.normal(5, 2, n_2)

The following cell presents the procedure for computing the t-statistic with numpy and p-value using the cumulative distribution function.

s_1 = np.std(X_1, ddof=1)
s_2 = np.std(X_2, ddof=1)
diff = (np.mean(X_1) - np.mean(X_2))
std = (np.sqrt(((s_1 ** 2) / n_1) + ((s_2 ** 2) / n_2)))

t_stat = diff / std
p_value = 2*stats.t.cdf(-np.abs(t_stat), n_1 + n_2 - 2)
float(t_stat), float(p_value)
(0.48741835388636506, 0.6262302474629055)

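The stats.ttest_ind function returns the same statistic and a nearly identical p-value. The small difference in the p-value arises because equal_var=False makes SciPy use Welch's degrees of freedom rather than \(n_1 + n_2 - 2\).

statistic, p_value = stats.ttest_ind(X_1, X_2, equal_var=False)
float(statistic), float(p_value)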
(0.48741835388636506, 0.6262310310013004)
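
The two p-values above differ from about the sixth decimal place on. A plausible way to reproduce the SciPy value by hand is to replace \(n_1 + n_2 - 2\) with the Welch-Satterthwaite degrees of freedom. The sketch below regenerates the same samples; the names `v_1`, `v_2`, and `df_welch` are introduced here only for illustration.

```python
import numpy as np
from scipy import stats

# Same samples as in the experiment above.
np.random.seed(10)
n_1, n_2 = 200, 200
X_1, X_2 = np.random.normal(5, 2, n_1), np.random.normal(5, 2, n_2)

# Estimated variance of each sample mean: s_i^2 / n_i.
v_1 = np.var(X_1, ddof=1) / n_1
v_2 = np.var(X_2, ddof=1) / n_2

t_stat = (np.mean(X_1) - np.mean(X_2)) / np.sqrt(v_1 + v_2)

# Welch-Satterthwaite approximation of the degrees of freedom.
df_welch = (v_1 + v_2) ** 2 / (v_1 ** 2 / (n_1 - 1) + v_2 ** 2 / (n_2 - 1))

p_welch = 2 * stats.t.cdf(-np.abs(t_stat), df_welch)

# Reference result from SciPy's Welch test.
ref = stats.ttest_ind(X_1, X_2, equal_var=False)
print(float(df_welch), float(p_welch), float(ref.pvalue))
```

With these degrees of freedom, the hand-computed p-value should match the one returned by stats.ttest_ind with equal_var=False up to floating-point error.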

Paired samples#

Paired samples arise when a single group of objects is observed under two different conditions or at two points in time, and the question is whether the mean of their measurements changes.

  • \(X_1\), \(X_2\): sets of observations under two conditions.

  • \(x_{i1} \in X_1\): the \(i\)-th observation from the first condition.

  • \(x_{i2} \in X_2\): the \(i\)-th observation from the second condition.

Since \(X_1\) and \(X_2\) are related (i.e., paired), the standard two-sample t-test is not appropriate. Instead, the problem can be reduced to a one-sample t-test by computing the differences \(\delta_i = x_{i1} - x_{i2}\) and testing the null hypothesis \(H_0\): the mean of \(\delta_i\) is equal to zero.
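
The reduction to a one-sample test can be sketched as follows. The data are synthetic and hypothetical (the names `before` and `after` and the distribution parameters are assumptions made for illustration); the hand computation is compared with stats.ttest_rel, SciPy's paired-samples t-test.

```python
import numpy as np
from scipy import stats

np.random.seed(12)
n = 100
# Hypothetical paired measurements on the same n objects.
before = np.random.normal(10, 2, n)
after = before + np.random.normal(0.5, 1, n)

# Differences between the two conditions for each object.
delta = before - after

# One-sample t-test on the differences against a reference mean of zero.
t_stat = np.mean(delta) / (np.std(delta, ddof=1) / np.sqrt(n))
p_value = 2 * stats.t.cdf(-np.abs(t_stat), n - 1)

# Reference results from SciPy.
ref_rel = stats.ttest_rel(before, after)
ref_1samp = stats.ttest_1samp(delta, popmean=0)
print(float(t_stat), float(p_value))
```

The hand-computed values should agree with both stats.ttest_rel on the original pairs and stats.ttest_1samp on the differences.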