**Solved Statistics Exam Questions**

**Section A**

- Let θ be an unknown parameter and θ̂ an estimator of θ based on data with sample size n.

(a) Define the bias of the estimator θ̂.

(b) Define the mean squared error of the estimator θ̂.

(c) Define the standard error of the estimator θ̂.

(d) Define what it means for θ̂ to be a consistent estimator of θ. [8]
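The four definitions can be checked numerically. The sketch below is a hypothetical illustration, not part of the exam: it takes θ to be the variance σ² = 4 of N(0, 4) data and θ̂ to be the sample variance with divisor n (whose bias, −σ²/n, is known), then estimates bias, mean squared error and standard error by simulation. The helper names are illustrative only. The identity MSE = SE² + bias² holds exactly for the empirical quantities, and an MSE that shrinks to zero as n grows is a sufficient condition for consistency.

```python
import random
import statistics

def estimator_summary(estimator, sampler, theta, n, reps=4000, seed=1):
    """Monte Carlo estimates of the bias, MSE and standard error of an estimator."""
    rng = random.Random(seed)
    estimates = [estimator(sampler(rng, n)) for _ in range(reps)]
    bias = statistics.fmean(estimates) - theta                     # E(theta_hat) - theta
    mse = statistics.fmean([(e - theta) ** 2 for e in estimates])  # E[(theta_hat - theta)^2]
    se = statistics.pstdev(estimates)                             # sd of theta_hat
    return bias, mse, se

def normal_sampler(rng, n):
    """Hypothetical data model: n draws from N(0, 4), so theta = sigma^2 = 4."""
    return [rng.gauss(0.0, 2.0) for _ in range(n)]

def var_divisor_n(xs):
    """Hypothetical estimator: sample variance with divisor n (bias -sigma^2 / n)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

for n in (10, 100, 500):
    bias, mse, se = estimator_summary(var_divisor_n, normal_sampler, 4.0, n)
    # MSE = SE^2 + bias^2 holds exactly for these empirical quantities
    print(f"n={n:4d}  bias={bias:+.3f}  mse={mse:.4f}  se^2+bias^2={se**2 + bias**2:.4f}")
```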

- Based on a sample of data, a test is to be performed of the null hypothesis that a parameter θ equals θ₀ against the alternative hypothesis that θ > θ₀. Let T denote the test statistic, with larger values of T giving stronger evidence for the alternative hypothesis.

(a) Define mathematically the p-value corresponding to the test.

(b) Explain in words what the p-value measures.

(c) Show that, when the null hypothesis is true, the p-value, regarded as a random variable, has the cumulative distribution function of the continuous uniform distribution on the unit interval.

(d) Hence explain why a test which rejects the null hypothesis when the p-value is less than α controls the type 1 error rate at level α. [8]
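Parts (c) and (d) can be illustrated by simulation. The sketch assumes a setting the question leaves open: data N(θ, 1) with known variance, θ₀ = 0, sample size n = 20, and test statistic T = √n(X̄ − θ₀), which is N(0, 1) under the null hypothesis, so the p-value is P(T ≥ t_obs) = 1 − Φ(t_obs). Under the null, the empirical CDF of the simulated p-values should be close to u at each u in (0, 1), and so the rejection rate at α = 0.05 should be close to 0.05.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def p_value(t_obs):
    """One-sided p-value P(T >= t_obs) when T ~ N(0, 1) under H0 (assumed setting)."""
    return 1.0 - STD_NORMAL.cdf(t_obs)

def null_p_values(n=20, reps=20_000, theta0=0.0, seed=2):
    """Simulate p-values when H0 is true, with data N(theta0, 1)."""
    rng = random.Random(seed)
    ps = []
    for _ in range(reps):
        xs = [rng.gauss(theta0, 1.0) for _ in range(n)]
        t = math.sqrt(n) * (sum(xs) / n - theta0)
        ps.append(p_value(t))
    return ps

ps = null_p_values()
for u in (0.01, 0.05, 0.25, 0.5, 0.9):
    ecdf = sum(p <= u for p in ps) / len(ps)  # empirical P(p-value <= u), should be ~u
    print(f"P(p <= {u:.2f}) ~ {ecdf:.3f}")
```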

- Let X₁, …, Xₙ be independent and identically distributed N(µ, σ²), with µ and σ² unknown. A confidence interval (L, U) for µ is to be constructed based on the data X₁, …, Xₙ.

(a) Define what it means for (L, U) to be a 95% confidence interval for µ.

(b) State the definition of the t-distribution on n − 1 degrees of freedom.

(c) State the distribution of the sample mean X̄, and use this to show that $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$.

(d) State the function of the sample variance S² that has a chi-squared distribution on n − 1 degrees of freedom.

(e) Hence derive an expression for a 95% confidence interval for µ. [8]
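A sketch of the standard derivation behind parts (b) to (e), using the fact (standard for normal samples, though not stated in the question) that X̄ and S² are independent. Here t_{n−1,0.975} denotes the 0.975 quantile of the t-distribution on n − 1 degrees of freedom, and the symmetry of that distribution gives the lower quantile as −t_{n−1,0.975}.

```latex
\begin{aligned}
&Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1), \qquad
 V = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}, \qquad Z \text{ independent of } V, \\
&T = \frac{Z}{\sqrt{V/(n-1)}} = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}, \\
&0.95 = P\!\left(-t_{n-1,0.975} \le \frac{\bar{X} - \mu}{S/\sqrt{n}} \le t_{n-1,0.975}\right)
      = P\!\left(\bar{X} - t_{n-1,0.975}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{n-1,0.975}\frac{S}{\sqrt{n}}\right), \\
&(L, U) = \left(\bar{X} - t_{n-1,0.975}\frac{S}{\sqrt{n}},\ \bar{X} + t_{n-1,0.975}\frac{S}{\sqrt{n}}\right).
\end{aligned}
```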

**Section B**

- A student is performing a Monte Carlo simulation experiment using the software package R to investigate the coverage probability of a confidence interval for a parameter θ. Their program generates N independent datasets using a true value of θ of their choosing, and calculates the confidence interval on each. Let (Lᵢ, Uᵢ) denote the confidence interval from the i-th simulation, and let π denote the confidence interval’s true coverage level.

(a) What is the distribution of the number of simulations for which the confidence interval includes the true parameter value θ?

(b) Give an expression for an estimator π̂ of π based on the simulation experiment.

(c) Derive the approximate distribution of π̂, assuming that the number of simulations N is large.

(d) Assuming N is large, use your answer to part (c) to derive expressions for a symmetric 95% confidence interval for π.

(e) Assuming that π ≈ 0.95, derive how large N must be to ensure that the 95% confidence interval for π has width 0.05. [18]
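A sketch of parts (b), (d) and (e), written in Python rather than R. The estimator is π̂ = (1/N) Σ 1{Lᵢ ≤ θ ≤ Uᵢ}; the count of covering intervals is Binomial(N, π), so for large N, π̂ is approximately N(π, π(1 − π)/N), giving the interval π̂ ± 1.96√(π̂(1 − π̂)/N). Setting the width 2 × 1.96√(π(1 − π)/N) equal to 0.05 with π = 0.95 gives N = (2 × 1.96/0.05)² × 0.95 × 0.05 ≈ 291.96, so N = 292. The interval being studied is a hypothetical stand-in: the z-interval for a normal mean with σ known, whose true coverage is exactly 0.95.

```python
import math
import random
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)  # about 1.96

def coverage_count(n_sims, n=20, mu=0.0, sigma=1.0, seed=3):
    """Count simulations whose interval (L_i, U_i) contains the true mean mu.

    Hypothetical interval under study: xbar +/- 1.96 * sigma / sqrt(n), sigma known.
    """
    rng = random.Random(seed)
    covered = 0
    for _ in range(n_sims):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        half = Z975 * sigma / math.sqrt(n)
        if xbar - half <= mu <= xbar + half:
            covered += 1
    return covered

def coverage_ci(covered, n_sims, z=Z975):
    """pi_hat and the symmetric large-N interval pi_hat +/- z * sqrt(pi_hat (1 - pi_hat) / N)."""
    pi_hat = covered / n_sims
    half = z * math.sqrt(pi_hat * (1.0 - pi_hat) / n_sims)
    return pi_hat, (pi_hat - half, pi_hat + half)

def required_n(pi, width, z=Z975):
    """Smallest N with 2 * z * sqrt(pi (1 - pi) / N) <= width."""
    return math.ceil((2.0 * z / width) ** 2 * pi * (1.0 - pi))

N = required_n(0.95, 0.05)
pi_hat, (lo, hi) = coverage_ci(coverage_count(N), N)
print(N, round(pi_hat, 3), (round(lo, 3), round(hi, 3)))
```

The Wald form is used here because it is the one part (d) asks for; near π = 1 other binomial intervals behave better, but at π ≈ 0.95 with N in the hundreds the normal approximation is reasonable.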

- Let X₁, …, Xₙ be independent and identically distributed continuous random variables with common probability density function $f(x; \theta) = \theta x^{\theta - 1}$ for 0 < x < 1, where θ > 0 is an unknown parameter. It can be shown that

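$$E(X_1) = \frac{\theta}{1 + \theta}, \qquad \operatorname{Var}(X_1) = \frac{\theta}{(\theta + 1)^2(\theta + 2)},$$

$$E(\log(X_1)) = -\frac{1}{\theta}, \qquad \operatorname{Var}(\log(X_1)) = \frac{1}{\theta^2}.$$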

(a) Prove that f(x; θ) is indeed a valid probability density function.

(b) Derive the maximum likelihood estimator of θ given X1, . . . , Xn.

(c) Prove that the maximum likelihood estimator is consistent for θ.

(d) Derive the form of the critical region of the most powerful test of the null hypothesis that θ = θ₀ versus the alternative hypothesis that θ = θ₁, for θ₁ > θ₀. [18]
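A Python sketch for parts (b) to (d). Setting the derivative of ℓ(θ) = n log θ + (θ − 1) Σ log Xᵢ to zero gives θ̂ = −n / Σ log Xᵢ. The sample mean of log Xᵢ converges in probability to E(log(X₁)) = −1/θ by the weak law of large numbers, so θ̂ converges to θ by the continuous mapping theorem. The simulation settings (θ = 2, θ₀ = 1, θ₁ = 2, n = 10, α = 0.05) and helper names are hypothetical, and the critical value is found by Monte Carlo rather than from tables.

```python
import math
import random

def sample_power_family(rng, theta, n):
    """Draw n values from f(x; theta) = theta * x**(theta - 1) on (0, 1).

    Inverse-CDF method: F(x) = x**theta, so X = U**(1/theta). Using 1 - random()
    keeps U in (0, 1] so that log(X) is defined (for moderate theta).
    """
    return [(1.0 - rng.random()) ** (1.0 / theta) for _ in range(n)]

def mle_theta(xs):
    """Root of the score n/theta + sum(log x_i) = 0, i.e. theta_hat = -n / sum(log x_i)."""
    return -len(xs) / sum(math.log(x) for x in xs)

def np_statistic(xs):
    """T = -sum(log x_i). The likelihood ratio L(theta1)/L(theta0) equals
    (theta1/theta0)**n * exp(-(theta1 - theta0) * T), which decreases in T when
    theta1 > theta0, so the most powerful test rejects H0 for small T."""
    return -sum(math.log(x) for x in xs)

def np_critical_value(n, theta0, alpha=0.05, reps=20_000, seed=5):
    """Monte Carlo alpha-quantile k of T under H0: -log X_i ~ Exp(rate theta0),
    so T ~ Gamma(shape n, scale 1/theta0); reject H0 when T <= k.
    (Exactly, 2 * theta0 * T is chi-squared on 2n degrees of freedom.)"""
    rng = random.Random(seed)
    sims = sorted(rng.gammavariate(n, 1.0 / theta0) for _ in range(reps))
    return sims[int(alpha * reps)]

def rejection_rate(theta, n, k, reps=20_000, seed=6):
    """Estimated P(T <= k) when the data come from f(x; theta)."""
    rng = random.Random(seed)
    hits = sum(np_statistic(sample_power_family(rng, theta, n)) <= k for _ in range(reps))
    return hits / reps

# Consistency: theta_hat settles near the true theta = 2 as n grows.
rng = random.Random(4)
for n in (10, 1_000, 100_000):
    print(n, round(mle_theta(sample_power_family(rng, 2.0, n)), 4))

# Most powerful test of theta0 = 1 against theta1 = 2 with n = 10 (hypothetical values).
k = np_critical_value(n=10, theta0=1.0)
print("reject H0 when -sum(log x_i) <=", round(k, 3))
print("size ~", rejection_rate(1.0, 10, k), " power ~", rejection_rate(2.0, 10, k))
```

Because the critical region depends on θ₁ only through the sign of θ₁ − θ₀, the same region serves every θ₁ > θ₀.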

- Let X₁, …, Xₙ be independent and identically distributed random variables, each drawn from the binomial distribution with k > 1 trials and success probability 0 ≤ π ≤ 1. Thus the probability mass function of Xᵢ, i = 1, …, n, is

$$P(X_i = x) = \binom{k}{x} \pi^x (1 - \pi)^{k - x}, \qquad x \in \{0, 1, 2, \dots, k\}.$$

(a) Derive an expression for the maximum likelihood estimator of π given the data X₁, …, Xₙ.

In an 1889 study of the human sex ratio, based on hospital records in Germany, the number of boys was recorded for each of 6,115 families with 12 children. The following table shows the distribution of the number of boys from the study.

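| No. of boys | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No. of families | 3 | 24 | 104 | 286 | 670 | 1033 | 1343 | 1112 | 829 | 478 | 181 | 45 | 7 |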

(b) Assuming that the number of boys in each family is an independent and identically distributed draw from a binomial distribution, calculate the maximum likelihood estimate of the probability π that each birth is a boy.

(c) Estimate the standard error of your estimate of π.

(d) Calculate Pearson’s goodness of fit test statistic using the data.

(e) Use your answer to (d) to judge whether the binomial model fits the data well, and what implications your finding has for inference about π. The following quantiles of the chi-squared distributions on 10, 11 and 12 degrees of freedom may help.

$$\chi^2_{10,0.05} = 3.94, \qquad \chi^2_{10,0.95} = 18.31$$

$$\chi^2_{11,0.05} = 4.57, \qquad \chi^2_{11,0.95} = 19.68$$

$$\chi^2_{12,0.05} = 5.23, \qquad \chi^2_{12,0.95} = 21.03$$

(f) Suggest a reason why the binomial model does or does not fit well, according to what you found in part (e). [18]
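A Python sketch of parts (b) to (d), using the table above (the exam does not prescribe software, and the helper names are illustrative). Differentiating the log-likelihood Σ[Xᵢ log π + (k − Xᵢ) log(1 − π)] gives π̂ = ΣXᵢ/(nk); here ΣXᵢ = 38,100 boys out of 6,115 × 12 = 73,380 births. The standard error is estimated from the Fisher information nk/(π(1 − π)). The Pearson statistic is referred to the chi-squared distribution on 13 − 1 − 1 = 11 degrees of freedom, one degree being lost for the estimated π. The cells for 0 and 12 boys have expected counts below 5 under the fitted model, so one might choose to pool them.

```python
import math

# Number of families with x = 0, 1, ..., 12 boys, from the table above.
FAMILIES = [3, 24, 104, 286, 670, 1033, 1343, 1112, 829, 478, 181, 45, 7]
K = 12  # children per family

def binomial_mle(counts, k):
    """pi_hat = (total successes) / (n * k), where counts[x] families had x successes."""
    n_fam = sum(counts)
    total = sum(x * c for x, c in enumerate(counts))
    return total / (n_fam * k)

def binomial_se(pi_hat, n_fam, k):
    """sqrt(pi_hat (1 - pi_hat) / (n k)): inverse square root of the Fisher information."""
    return math.sqrt(pi_hat * (1.0 - pi_hat) / (n_fam * k))

def pearson_statistic(counts, k, pi):
    """Sum over cells of (observed - expected)^2 / expected under Binomial(k, pi)."""
    n_fam = sum(counts)
    stat = 0.0
    for x, observed in enumerate(counts):
        expected = n_fam * math.comb(k, x) * pi ** x * (1.0 - pi) ** (k - x)
        stat += (observed - expected) ** 2 / expected
    return stat

pi_hat = binomial_mle(FAMILIES, K)
se = binomial_se(pi_hat, sum(FAMILIES), K)
x2 = pearson_statistic(FAMILIES, K, pi_hat)
df = len(FAMILIES) - 1 - 1  # 13 cells, minus 1, minus 1 estimated parameter
print(f"pi_hat={pi_hat:.4f}  se={se:.5f}  X2={x2:.1f}  df={df}")
```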

**Section C**

- A clinical trial is to be conducted to compare a new treatment with an existing treatment for patients recently infected with human immunodeficiency virus (HIV). The outcome of interest is CD4 count, a measure of immune system activity. The aim of the new treatment is to increase CD4 count relative to the existing treatment.

In 500 words or less, describe how you would design the trial, what variables you would measure on the patients, and how you would perform the statistical analysis. You should give reasons for the choices of design and analysis that you make.