Solved Statistics Exam Questions

Section A

  1. Let θ be an unknown parameter and θ̂ an estimator of θ based on data with sample size n.

(a)    Define the bias of the estimator θ̂.

(b)    Define the mean squared error of the estimator θ̂.

(c)    Define the standard error of the estimator θ̂.

(d)    Define what it means for θ̂ to be a consistent estimator of θ. [8]

  2. Based on a sample of data, a hypothesis test is to be performed of the null hypothesis that a parameter θ = θ0 versus an alternative hypothesis that θ > θ0. Let T denote the test statistic for the test, with larger values of T providing stronger support for the alternative hypothesis.

(a)    Define mathematically the p-value corresponding to the test.

(b)    Explain in words what the p-value measures.

(c)    Show that when the null hypothesis is true, considering the p-value as a random variable, its cumulative distribution function is that of the continuous uniform distribution on the unit interval.

(d)    Hence explain why a test which rejects the null when the p-value is less than α controls the type 1 error rate at level α. [8]
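
As a numerical illustration of parts (c) and (d), the following Python sketch simulates p-values under the null hypothesis. It assumes a concrete setting that the question does not specify: a one-sided z-test on normal data with known unit variance, so that T = √n (X̄ − θ0) is standard normal under the null. The function names, sample size and seed are illustrative choices only.

```python
import random
from statistics import NormalDist, fmean


def simulate_p_values(n_sims=20000, n=25, theta0=0.0, seed=1):
    """Simulate one-sided p-values under H0 for a z-test with known unit variance."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    p_values = []
    for _ in range(n_sims):
        sample = [rng.gauss(theta0, 1.0) for _ in range(n)]
        # Assumed test statistic: T = sqrt(n) * (mean - theta0), N(0, 1) under H0.
        t_obs = (fmean(sample) - theta0) * n ** 0.5
        # p-value: probability under H0 of a statistic at least as large as t_obs.
        p_values.append(1.0 - std_normal.cdf(t_obs))
    return p_values


def empirical_cdf(values, u):
    """Proportion of values less than or equal to u."""
    return sum(v <= u for v in values) / len(values)


p_values = simulate_p_values()
for u in (0.01, 0.05, 0.25, 0.50, 0.90):
    print(f"P(p <= {u:.2f}) is approximately {empirical_cdf(p_values, u):.3f}")
```

If the p-value is uniform on the unit interval under the null, each printed proportion should be close to the corresponding u. In particular, the proportion of p-values below α = 0.05 approximates the type 1 error rate of the test that rejects when p < α.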

 

  3. Let X1, …, Xn be independent and identically distributed N(µ, σ²), with µ and σ² unknown. A confidence interval (L, U) for µ is to be constructed based on the data X1, …, Xn.

(a)    Define what it means for (L, U) to be a 95% confidence interval for µ.

(b)    State the definition for a t-distribution on n − 1 degrees of freedom.

(c)    State the distribution of the sample mean X̄, and use this to show that (X̄ − µ)/(σ/√n) ∼ N(0, 1).

(d)    State the function of the sample variance S² that is chi-squared distributed on n − 1 degrees of freedom.

(e)    Hence derive an expression for a 95% confidence interval for µ. [8]
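
One way the results of parts (b) to (d) might be assembled into the interval asked for in part (e) is sketched below. The symbols Z, V and T, and the notation t_{n−1,0.975} for the 97.5% quantile of the t-distribution on n − 1 degrees of freedom, are introduced here for illustration and do not appear in the question. The sketch also assumes that S² is defined with divisor n − 1 and uses the standard fact that X̄ and S² are independent for normal data.

```latex
\begin{align*}
Z &= \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0,1), \qquad
V = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}, \qquad
Z \text{ and } V \text{ independent}, \\
T &= \frac{Z}{\sqrt{V/(n-1)}} = \frac{\bar{X}-\mu}{S/\sqrt{n}} \sim t_{n-1}, \\
0.95 &= P\left(-t_{n-1,0.975} \le T \le t_{n-1,0.975}\right)
      = P\left(\bar{X} - t_{n-1,0.975}\,\frac{S}{\sqrt{n}} \le \mu \le
               \bar{X} + t_{n-1,0.975}\,\frac{S}{\sqrt{n}}\right).
\end{align*}
```

Under these assumptions the interval has endpoints L = X̄ − t_{n−1,0.975} S/√n and U = X̄ + t_{n−1,0.975} S/√n.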

Section B

  1. A student is performing a Monte-Carlo simulation experiment using the software package R to investigate the coverage probability of a confidence interval for a parameter θ. Their program generates N independent datasets using the true value of θ that they choose, and on each, the confidence interval is calculated. Let (Li, Ui) denote the confidence interval from the ith simulation. Let π denote the confidence interval’s true coverage level.

(a)    What is the distribution of the number of simulations for which the confidence interval includes the true parameter value θ?

(b)    Give an expression for an estimator π̂ of π based on the simulation experiment.

(c)    Derive the approximate distribution of π̂ assuming that the number of simulations N is large.

(d)    Assuming N is large, use your answer to part (c) to derive expressions for a symmetric 95% confidence interval for π.

(e)    Assuming that π ≈ 0.95, derive how large a value of N should be used to ensure that the 95% confidence interval for π has width 0.05. [18]
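
The experiment described in this question can be sketched in code. The example below is written in Python rather than R, and it fills in details the question leaves open: the data are assumed normal with known standard deviation, the interval under study is the usual known-variance z-interval for the mean, and "width" in part (e) is read as the full width U − L of a normal-approximation interval. The function names and default values are hypothetical.

```python
import math
import random
from statistics import fmean

Z_975 = 1.959964  # 97.5% quantile of the standard normal distribution


def estimate_coverage(n_sims, n=20, theta=3.0, sigma=1.0, seed=42):
    """Return (pi_hat, lower, upper): estimated coverage and its approximate 95% CI."""
    rng = random.Random(seed)
    half_width = Z_975 * sigma / math.sqrt(n)  # assumed known-variance z-interval
    hits = 0
    for _ in range(n_sims):
        x_bar = fmean([rng.gauss(theta, sigma) for _ in range(n)])
        if x_bar - half_width <= theta <= x_bar + half_width:
            hits += 1  # interval from this simulation covers the true theta
    pi_hat = hits / n_sims
    # Normal approximation to the binomial proportion, valid for large n_sims.
    se = math.sqrt(pi_hat * (1 - pi_hat) / n_sims)
    return pi_hat, pi_hat - Z_975 * se, pi_hat + Z_975 * se


def sims_needed(pi=0.95, width=0.05):
    """Smallest N with 2 * z * sqrt(pi * (1 - pi) / N) <= width."""
    return math.ceil((2 * Z_975 / width) ** 2 * pi * (1 - pi))


pi_hat, lower, upper = estimate_coverage(n_sims=5000)
print(f"estimated coverage {pi_hat:.3f}, 95% CI ({lower:.3f}, {upper:.3f})")
print("simulations needed for width 0.05:", sims_needed())
```

Under the full-width reading, `sims_needed()` evaluates (2 × 1.96 / 0.05)² × 0.95 × 0.05 ≈ 291.95 and rounds up to 292. If the question intends a half-width of 0.05 instead, calling `sims_needed(width=0.10)` gives the corresponding value.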

  2. Let X1, …, Xn be independent and identically distributed continuous random variables with common probability density function f(x; θ) = θx^(θ−1) for 0 < x < 1 and θ > 0 an unknown parameter. It can be shown that

$$E(X_1) = \frac{\theta}{1+\theta}, \qquad \operatorname{Var}(X_1) = \frac{\theta}{(\theta+1)^2(\theta+2)},$$

$$E(\log X_1) = -\frac{1}{\theta}, \qquad \operatorname{Var}(\log X_1) = \frac{1}{\theta^2}.$$

(a)    Prove that f(x; θ) is indeed a valid probability density function.

(b)    Derive the maximum likelihood estimator of θ given X1, …, Xn.

(c)    Prove that the maximum likelihood estimator is consistent for θ.

(d)    Derive the form of the critical region of the most powerful test of the null hypothesis that θ = θ0 versus the alternative hypothesis that θ = θ1, for θ1 > θ0. [18]
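
A simulation can make parts (b) and (c) concrete. The Python sketch below assumes the estimator obtained by setting the derivative of the log-likelihood, n/θ + Σ log Xi, to zero, which gives θ̂ = −n / Σ log Xi. Data are drawn by inverse-CDF sampling using F(x) = x^θ. The function names, the true value 2.5 and the seed are illustrative choices only.

```python
import math
import random


def sample_power_density(n, theta, rng):
    """Draw n values with density theta * x**(theta - 1) on (0, 1)."""
    # F(x) = x**theta, so X = U**(1 / theta) for U uniform.
    # 1 - rng.random() lies in (0, 1], which avoids log(0) later on.
    return [(1.0 - rng.random()) ** (1.0 / theta) for _ in range(n)]


def mle_theta(xs):
    """Assumed MLE: the root of n / theta + sum(log x_i) = 0."""
    return -len(xs) / sum(math.log(x) for x in xs)


rng = random.Random(0)
true_theta = 2.5
for n in (10, 100, 1000, 10000, 100000):
    estimate = mle_theta(sample_power_density(n, true_theta, rng))
    print(f"n = {n:6d}   theta_hat = {estimate:.3f}")
```

As n grows, the printed estimates should settle near the true value, which is the behaviour that consistency describes. The analytic argument in part (c) can use the stated mean and variance of log(X1) together with the law of large numbers.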

  3. Let X1, …, Xn be independent and identically distributed random variables each drawn from the binomial distribution with k > 1 trials and success probability 0 ≤ π ≤ 1. Thus the probability mass function of Xi, i = 1, …, n, is

$$P(X_i = x) = \binom{k}{x} \pi^x (1-\pi)^{k-x}$$

for x ∈ {0, 1, 2, …, k}.

(a)    Derive an expression for the maximum likelihood estimator of π given the data X1, …, Xn.

In an 1889 study of the human sex ratio, conducted using hospital records in Germany, the number of boys among 6,115 families each of which had 12 children was recorded. The following table shows the distribution of the number of boys from the study.

No. of boys          0     1     2     3     4     5     6     7     8     9    10    11    12
No. of families      3    24   104   286   670  1033  1343  1112   829   478   181    45     7

(b)    Assuming that the number of boys in each family is an independent and identically distributed draw from a binomial distribution, calculate the maximum likelihood estimate of the probability π that each birth is a boy.

(c)    Estimate the standard error of your estimate of π.

(d)    Calculate Pearson’s goodness of fit test statistic using the data.

(e)    Use your answer to (d) to judge whether the binomial model fits the data well and what implications your finding has for inference about π. To help answer the question it may be useful to know the following quantiles for the chi-squared distributions on 10, 11, and 12 degrees of freedom.

$$\chi^2_{10,0.05} = 3.94, \qquad \chi^2_{10,0.95} = 18.31$$

$$\chi^2_{11,0.05} = 4.57, \qquad \chi^2_{11,0.95} = 19.68$$

$$\chi^2_{12,0.05} = 5.23, \qquad \chi^2_{12,0.95} = 21.03$$

(f)    Suggest a reason why you think the binomial model either fits well or does not fit well, according to what you found in part (e). [18]
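
The arithmetic for parts (b) to (d) can be checked with a short Python sketch using the table above. It assumes the estimator π̂ = (total boys) / (n k) with k = 12, the plug-in standard error √(π̂(1 − π̂)/(n k)), and a Pearson statistic computed over all 13 cells without pooling cells that have small expected counts. These are choices made for this illustration; pooling the extreme cells is a reasonable alternative. Variable names are hypothetical.

```python
import math

K = 12  # children per family
# Number of families with 0, 1, ..., 12 boys (list index = number of boys).
families = [3, 24, 104, 286, 670, 1033, 1343, 1112, 829, 478, 181, 45, 7]

n = sum(families)                                        # number of families
total_boys = sum(x * f for x, f in enumerate(families))  # boys across all families

# Assumed MLE and plug-in standard error under the binomial model.
pi_hat = total_boys / (n * K)
se_hat = math.sqrt(pi_hat * (1 - pi_hat) / (n * K))

# Expected cell counts under Binomial(K, pi_hat), then Pearson's statistic.
expected = [
    n * math.comb(K, x) * pi_hat ** x * (1 - pi_hat) ** (K - x)
    for x in range(K + 1)
]
pearson = sum((o - e) ** 2 / e for o, e in zip(families, expected))

print(f"families = {n}, boys = {total_boys}")
print(f"pi_hat = {pi_hat:.4f}, standard error = {se_hat:.4f}")
print(f"Pearson statistic = {pearson:.1f}")
```

The totals are 6,115 families and 38,100 boys, so π̂ = 38100/73380 ≈ 0.519. With 13 cells and one estimated parameter, the usual convention compares the Pearson statistic with the chi-squared distribution on 13 − 1 − 1 = 11 degrees of freedom, whose 0.95 quantile is given in part (e) as 19.68.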

Section C

  1. A clinical trial is to be conducted to compare a new treatment with an existing treatment for patients recently infected with human immunodeficiency virus (HIV). The outcome of interest is CD4 count, which is a measure of activity of the immune system. The aim of the new treatment is to increase CD4 count compared to the existing treatment.

In 500 words or less, describe how you would design the trial, what variables you would measure on the patients, and how you would perform the statistical analysis. You should give reasons for the choices of design and analysis that you make.