
Section 13 Kolmogorov-Smirnov test.
Suppose that we have an i.i.d. sample $X_1, \ldots, X_n$ with some unknown distribution $P$ and we would like to test the hypothesis that $P$ is equal to a particular distribution $P_0$, i.e. decide between the following hypotheses: $H_0 : P = P_0$, $H_1 : P \neq P_0$. We already know how to test this hypothesis using the chi-squared goodness-of-fit test. If the distribution $P_0$ is continuous, we had to group the data and consider a weaker discretized null hypothesis. We will now consider a different test for $H_0$ based on a very different idea that avoids this discretization.
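As a rough illustration of a comparison that needs no grouping, one can measure the largest gap between the empirical c.d.f. of the sample and the c.d.f. of $P_0$. The sketch below is only an illustration under stated assumptions: it takes $P_0 = N(0,1)$, uses a made-up five-point sample, and the variable names (x, xs, F0, Dplus, Dminus, D) are hypothetical. It uses only base Matlab functions.

```matlab
% Sketch: largest gap between the empirical c.d.f. of a sample and a
% fully specified continuous c.d.f. (assumed here to be N(0,1)).
x  = [-1.2; -0.4; 0.1; 0.7; 1.5];   % hypothetical sample, not real data
n  = numel(x);
xs = sort(x);                       % order statistics

% N(0,1) c.d.f. at the sorted points, written via erfc (base Matlab)
F0 = 0.5 * erfc(-xs / sqrt(2));

% The empirical c.d.f. jumps from (i-1)/n to i/n at the i-th sorted point,
% so the largest gap is attained on one side of some jump.
Dplus  = max((1:n)'/n - F0);        % empirical c.d.f. above F0
Dminus = max(F0 - (0:n-1)'/n);      % empirical c.d.f. below F0
D = max(Dplus, Dminus);

fprintf('largest c.d.f. gap D = %.4f\n', D);
```

Checking both sides of each jump matters because the empirical c.d.f. is a step function, so the supremum of the gap need not be attained at the value the step takes at a data point.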

[Plot panels omitted. Axes: Data (about 96 to 100.5) vs. cumulative probability. Legends: (a) all data, normal fit; (b) men data, 'men' fit, women data, 'women' fit.]

Figure 13.1: (a) Normal fit to the entire sample. (b) Normal fit to men and women separately.

Example. (KS test) Let us again look at the normal body temperature dataset. Let 'all' be a vector of all 130 observations, and let 'men' and 'women' be vectors of length 65 each corresponding to men and women. First, we fit a normal distribution to the entire set 'all'. The MLEs $\hat{\mu}$ and $\hat{\sigma}$ are

    mean(all) = 98.2492, std(all,1) = 0.7304.

We see in Figure 13.1 (a) that this distribution fits the data very well. Let us perform the KS test of the hypothesis that the data comes from this distribution $N(\hat{\mu}, \hat{\sigma}^2)$. To run the test, we first have to create a vector of $N(\hat{\mu}, \hat{\sigma}^2)$ c.d.f. values on the sample 'all' (it is a required input in the Matlab KS test function):

    CDFall = normcdf(all, mean(all), std(all,1));

Then we run the Matlab 'kstest' function

    [H,P,KSSTAT,CV] = kstest(all, [all,CDFall], 0.05)

which outputs

    H = 0, P = 0.6502, KSSTAT = 0.0639, CV = 0.1178.

We accept $H_0$ since the p-value 0.6502 is larger than the level 0.05. 'CV' is a critical value such that $H_0$ is rejected if the statistic 'KSSTAT' > 'CV'; here 0.0639 < 0.1178.

Remark. KS test is designed to test a simple hypothesis P =...