Suppose that we have an i.i.d. sample $X_1, \ldots, X_n$ with some unknown distribution $P$, and we would like to test the hypothesis that $P$ is equal to a particular distribution $P_0$, i.e. decide between the following hypotheses:

$$H_0 : P = P_0, \qquad H_1 : P \neq P_0.$$

We already know how to test this hypothesis using the chi-squared goodness-of-fit test. If the distribution $P_0$ is continuous, we had to group the data and consider a weaker, discretized null hypothesis. We will now consider a different test for $H_0$, based on a very different idea that avoids this discretization.
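To make the idea of a test without grouping concrete, here is a minimal Matlab sketch that is not part of the original notes. It computes the standard Kolmogorov-Smirnov distance $\sup_x |F_n(x) - F_0(x)|$ between the empirical c.d.f. $F_n$ of a sample and the c.d.f. $F_0$ of a fully specified continuous distribution. The sample values and the choice of $N(0,1)$ as the hypothesized distribution are made up for illustration, and the variable names are hypothetical.

```matlab
% Sketch: Kolmogorov-Smirnov distance between a sample and a fully
% specified continuous c.d.f. F0, with no grouping of the data.
% ASSUMPTION: the sample and the choice F0 = N(0,1) are illustrative only.
x  = [-1.2; -0.4; 0.1; 0.3; 0.9; 1.5];   % hypothetical sample (no ties)
F0 = @(t) 0.5 * erfc(-t / sqrt(2));       % standard normal c.d.f.

xs = sort(x);                             % order statistics
n  = numel(xs);
F  = F0(xs);                              % F0 evaluated at each data point

% The empirical c.d.f. jumps from (i-1)/n to i/n at the i-th order
% statistic, so the largest gap is attained at one of these jumps.
Dplus  = max((1:n)' / n - F);             % empirical c.d.f. above F0
Dminus = max(F - (0:n-1)' / n);           % empirical c.d.f. below F0
D      = max(Dplus, Dminus);              % sup_x |F_n(x) - F0(x)|

fprintf('KS statistic D = %.4f\n', D);
```

Because the distance is evaluated at every observation, no bin boundaries have to be chosen, which is the contrast with the chi-squared approach drawn above.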

Figure 13.1: (a) Normal fit to the entire sample. (b) Normal fit to men and women separately. In both panels the horizontal axis is the data value and the vertical axis is the cumulative probability; panel (a) shows 'all data' with its normal fit, and panel (b) shows 'men data' and 'women data' with their respective fits.
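A plot of the kind described in the caption can be sketched in Matlab as follows. This is only an illustration: the sample is simulated and is not the body temperature dataset, and the variable names are hypothetical. The normal c.d.f. is written with the base function `erfc` so that the sketch needs no toolbox.

```matlab
% Sketch: empirical c.d.f. of a sample against a fitted normal c.d.f.
% ASSUMPTION: the sample below is simulated; it is NOT the body
% temperature data used in the notes.
rng(1);                                   % fixed seed for repeatability
x = 98 + 0.7 * randn(130, 1);             % hypothetical stand-in sample

muHat    = mean(x);                       % MLE of the mean
sigmaHat = std(x, 1);                     % MLE of the std (normalizes by n)

xs   = sort(x);
n    = numel(xs);
Fhat = (1:n)' / n;                        % empirical c.d.f. at the data points

t    = linspace(min(xs), max(xs), 200);   % grid for the fitted curve
Ffit = 0.5 * erfc(-(t - muHat) / (sigmaHat * sqrt(2)));  % fitted normal c.d.f.

stairs(xs, Fhat); hold on;                % step plot of the data
plot(t, Ffit); hold off;                  % smooth fitted curve
xlabel('Data'); ylabel('Cumulative probability');
legend('data', 'normal fit', 'Location', 'northwest');
```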

Example (KS test). Let us again look at the normal body temperature dataset. Let 'all' be a vector of all 130 observations, and let 'men' and 'women' be vectors of length 65 each, corresponding to men and women. First, we fit a normal distribution to the entire set 'all'. The MLEs $\hat\mu$ and $\hat\sigma$ are

    mean(all) = 98.2492, std(all,1) = 0.7304.

We see in Figure 13.1 (a) that this distribution fits the data very well. Let us perform the KS test of the hypothesis that the data comes from this distribution $N(\hat\mu, \hat\sigma^2)$. To run the test, we first have to create a vector of $N(\hat\mu, \hat\sigma^2)$ c.d.f. values on the sample 'all' (it is a required input of the Matlab KS test function):

    CDFall = normcdf(all, mean(all), std(all,1));

Then we run the Matlab 'kstest' function

    [H,P,KSSTAT,CV] = kstest(all, [all, CDFall], 0.05)

which outputs

    H = 0, P = 0.6502, KSSTAT = 0.0639, CV = 0.1178.

We accept $H_0$ at level 0.05 since the p-value 0.6502 is larger than 0.05. 'CV' is a critical value such that $H_0$ is rejected if the statistic 'KSSTAT' > 'CV'; here KSSTAT = 0.0639 is below CV = 0.1178, in agreement with H = 0.

Remark. KS test is designed to test a simple hypothesis P =...