Statistical Analysis of Data Using SPSS

Introduction and dataset
The aim of this coursework is to investigate and predict the onset of diabetes based on
various diagnostic measurements.

The dataset was originally compiled by researchers at the Johns Hopkins University
School of Medicine, from a larger database owned by the National Institute of Diabetes
and Digestive and Kidney Diseases. All patients were females at least 21 years old of
Pima Indian heritage. Note that Pima Indians have one of the highest rates of diabetes
in the world.

This dataset includes 392 observations, taken at the individual level, and is available in the
diabetes_dataset.xlsx file in the Statistical Data Analysis Coursework folder on NOW.
The key indicator of diabetes (response variable), as defined by the World Health
Organization, is a plasma glucose concentration greater than 200 mg/dl two hours
following ingestion of a 75 g carbohydrate solution (variable Glucose).

Glucose Pregnancies BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunction Age Outcome
56 2 56 28 45 24.2 0.332 22 0
68 2 62 13 15 20.1 0.257 23 0
68 2 70 32 66 25 0.187 25 0
68 10 106 23 49 35.5 0.285 47 0
71 1 48 18 76 20.4 0.323 22 0
71 1 78 50 45 33.2 0.422 21 0
74 0 52 10 36 27.8 0.269 22 0
74 3 68 28 45 29.7 0.293 23 0
74 8 70 40 49 35.3 0.705 39 0
75 2 64 24 55 29.7 0.37 33 0
77 1 56 30 56 33.3 1.251 24 0
77 5 82 41 42 35.8 0.156 35 0
78 3 50 32 88 31 0.248 26 1
78 0 88 29 40 36.9 0.434 21 0
79 1 80 25 37 25.4 0.583 22 0
79 1 60 42 48 43.5 0.678 23 0
80 1 74 11 60 30 0.527 22 0
80 3 82 31 70 34.2 1.292 27 1
81 1 72 18 40 26.6 0.283 24 0
81 3 86 16 66 27.5 0.306 22 0
81 2 72 15 76 30.1 0.547 25 0
81 1 74 41 57 46.3 1.096 32 0
81 7 78 40 48 46.7 0.261 42 0
82 1 64 13 95 21.2 0.415 23 0
82 2 52 22 115 28.5 1.699 25 0
83 7 78 26 71 29.3 0.767 36 0
83 2 66 23 50 32.2 0.497 22 0
83 3 58 31 18 34.3 0.336 25 0
83 2 65 28 66 36.8 0.629 24 0
84 2 50 23 76 30.4 0.968 21 0
84 3 68 30 106 31.9 0.591 25 0
84 0 64 22 66 35.8 0.545 21 0
84 1 64 23 115 36.9 0.471 28 0
84 0 82 31 125 38.2 0.233 23 0
84 4 90 23 56 39.5 0.159 25 0
85 4 58 22 49 27.8 0.306 28 0
86 5 68 28 71 30.2 0.364 24 0
86 1 66 52 65 41.3 0.917 29 0
87 2 58 16 52 32.7 0.166 25 0
87 1 78 27 32 34.6 0.101 22 0
87 1 60 37 75 37.2 0.509 22 0
87 1 68 34 77 37.6 0.401 24 0
88 5 66 21 23 24.4 0.342 30 0
88 3 58 11 54 24.8 0.267 22 0
88 2 58 26 16 28.4 0.766 22 0
88 2 74 19 53 29 0.229 22 0
88 1 62 24 44 29.9 0.422 23 0
88 1 78 29 76 32 0.365 29 0
88 12 74 40 54 35.3 0.378 48 0
88 1 30 42 99 55 0.496 26 1
89 1 24 19 25 27.8 0.559 21 0
89 1 66 23 94 28.1 0.167 21 0
89 3 74 16 85 30.4 0.551 38 0
89 1 76 34 37 31.2 0.192 23 0
90 2 80 14 55 24.4 0.249 24 0
90 1 62 18 59 25.1 1.268 25 0
90 1 62 12 43 27.2 0.58 24 0
90 4 88 47 54 37.7 0.362 29 0
91 1 54 25 100 25.2 0.234 23 0
91 4 70 32 88 33.1 0.446 22 0
91 0 68 32 210 39.9 0.381 25 0
92 1 62 25 41 19.5 0.482 25 0
92 12 62 7 258 27.6 0.926 44 1
92 6 62 32 126 32 0.085 46 0
93 0 60 25 92 28.7 0.532 22 0
93 6 50 30 64 28.7 0.356 23 0
93 2 64 32 160 38 0.674 23 1
93 0 100 39 72 43.4 1.021 35 0
94 2 68 18 76 26 0.561 21 0
94 2 76 18 66 31.6 0.649 23 0
94 7 64 25 79 33.3 0.738 41 0
94 0 70 27 115 43.5 0.347 21 0
95 1 66 13 38 19.6 0.334 25 0
95 1 60 18 58 23.9 0.26 22 0
95 1 74 21 73 25.9 0.673 36 0
95 2 54 14 88 26.1 0.748 22 0
95 1 82 25 180 35 0.233 43 1
95 0 80 45 92 36.5 0.33 26 0
95 0 85 25 36 37.4 0.247 24 1
95 0 64 39 105 44.6 0.366 22 0
96 4 56 17 49 20.8 0.34 26 0
96 2 68 13 49 21.1 0.647 26 0
96 3 56 34 115 24.7 0.944 39 0
96 1 64 27 87 33.2 0.289 21 0
96 5 74 18 67 33.6 0.997 43 0
97 1 64 19 82 18.2 0.299 21 0
97 1 66 15 140 23.2 0.487 22 0
97 0 64 36 100 36.8 0.6 25 0
97 7 76 32 91 40.9 0.871 32 1
98 0 82 15 84 25.2 0.299 22 0
98 6 58 33 190 34 0.43 43 0
98 2 60 17 120 34.7 0.198 22 0
99 3 80 11 64 19.3 0.284 30 0
99 2 70 16 44 20.4 0.235 27 0
99 3 62 19 74 21.8 0.279 26 0
99 4 76 15 51 23.2 0.223 21 0
99 2 52 15 94 24.6 0.637 21 0
99 3 54 19 86 25.6 0.154 24 0
99 6 60 19 54 26.9 0.497 32 0
99 5 54 28 83 34 0.499 30 0
99 2 60 17 160 36.6 0.453 21 0
99 1 72 30 18 38.6 0.412 21 0
100 1 74 12 46 19.5 0.149 28 0
100 1 66 15 56 23.6 0.666 26 0
100 1 72 12 70 25.3 0.658 28 0
100 12 84 33 105 30 0.488 46 0
100 0 70 26 50 30.8 0.597 21 0
100 3 68 23 81 31.6 0.949 28 0
100 1 66 29 196 32 0.444 42 0
100 2 66 20 90 32.9 0.867 28 1
100 14 78 25 184 36.6 0.412 46 1
100 2 54 28 105 37.8 0.498 24 0
100 2 68 25 71 38.5 0.324 26 0
100 8 74 40 215 39.4 0.661 43 1
100 2 70 52 57 40.5 0.677 25 0
100 0 88 60 110 46.8 0.962 31 0
101 2 58 35 90 21.8 0.155 22 0
101 2 58 17 265 24.2 0.614 23 0
101 1 50 15 36 24.2 0.526 26 0
101 10 76 48 180 32.9 0.171 63 0
102 0 86 17 105 29.3 0.695 27 0
102 3 44 20 94 30.8 0.4 26 0
102 0 78 40 90 34.5 0.238 24 0
102 7 74 40 105 37.2 0.204 45 0
102 0 64 46 78 40.6 0.496 21 0
102 2 86 36 120 45.5 0.127 23 1
103 1 80 11 82 19.4 0.491 22 0
103 4 60 33 192 24 0.966 33 0
103 3 72 30 152 27.6 0.73 27 0
103 6 72 32 190 37.7 0.324 55 0
103 1 30 38 83 43.3 0.183 33 0
104 0 64 23 116 27.8 0.454 23 0
104 6 74 18 156 29.9 0.722 41 1
104 0 64 37 64 33.6 0.51 22 1
105 6 70 32 68 30.8 0.122 37 0
105 2 80 45 191 33.7 0.711 29 1
105 2 58 40 94 34.9 0.225 25 0
105 5 72 29 325 36.9 0.159 28 0
105 0 64 41 142 41.5 0.173 22 0
106 2 56 27 165 29 0.426 22 0
106 2 64 35 119 30.5 1.4 34 0
106 3 54 21 158 30.9 0.292 24 0
106 1 70 28 135 34.2 0.142 22 0
106 0 70 37 148 39.4 0.605 22 0
107 3 62 13 48 22.9 0.678 23 1
107 1 72 30 82 30.8 0.821 24 0
107 2 74 30 100 33.6 0.404 23 0
107 0 62 30 74 36.6 0.757 25 1
108 6 44 20 130 24 0.813 35 0
108 2 62 32 56 25.2 0.128 21 0
108 2 62 10 278 25.3 0.881 22 0
108 2 52 26 63 32.5 0.318 22 0
108 1 60 46 178 35.5 0.415 24 0
108 5 72 43 75 36.1 0.263 33 0
109 1 38 18 120 23.1 0.407 26 0
109 1 56 21 135 25.2 0.833 23 0
109 1 60 8 182 25.4 0.947 21 0
109 8 76 39 114 27.9 0.64 31 1
109 1 58 18 116 28.5 0.219 22 0
109 4 64 44 99 34.8 0.905 26 1
109 5 62 41 129 35.8 0.514 25 1
110 4 76 20 100 28.4 0.118 27 0
110 2 74 29 125 32.4 0.698 27 0
111 1 62 13 182 24 0.138 23 0
111 3 90 12 78 28.4 0.495 29 0
111 3 58 31 44 29.5 0.43 22 0
111 4 72 47 207 37.1 1.39 56 1
112 2 68 22 94 34.1 0.315 26 0
112 9 82 32 175 34.2 0.26 36 1
112 1 72 30 176 34.4 0.528 25 0
112 1 80 45 132 34.8 0.217 24 0
112 2 86 42 160 38.4 0.246 28 0
112 2 78 50 140 39.4 0.175 24 0
113 3 50 10 85 29.5 0.626 25 0
114 7 76 17 110 23.8 0.466 31 0
114 1 66 36 200 38.1 0.289 21 0
114 0 80 34 285 44.2 0.167 27 0
115 1 70 30 96 34.6 0.529 32 1
115 3 66 39 140 38.1 0.15 28 0
116 4 72 12 87 22.1 0.463 37 0
116 3 74 15 105 26.3 0.107 24 0
116 1 78 29 180 36.1 0.496 25 0
117 2 90 19 71 25.2 0.313 21 0
117 0 66 31 188 30.8 0.493 22 0
117 4 64 27 120 33.2 0.23 24 0
117 1 60 23 106 33.8 0.466 27 0
117 1 88 24 145 34.5 0.403 40 1
117 5 86 30 105 39.1 0.251 42 0
117 0 80 31 53 45.2 0.089 24 0
118 1 58 36 94 33.3 0.261 23 0
118 0 84 47 230 45.8 0.551 31 1
119 1 54 13 50 22.3 0.205 24 0
119 6 50 22 176 27.1 1.318 33 1
119 0 64 18 92 34.9 0.725 23 0
119 1 44 47 63 35.5 0.28 25 0
119 1 88 41 170 45.3 0.507 26 0
119 1 86 39 220 45.6 0.808 29 1
120 9 72 22 56 20.8 0.733 48 0
120 0 74 18 63 30.5 0.285 26 0
120 1 80 48 200 38.9 1.162 41 0
120 2 76 37 105 39.7 0.215 29 0
120 11 80 37 150 42.3 0.785 48 1
120 3 70 30 135 42.9 0.452 30 0
121 5 72 23 112 26.2 0.245 30 0
121 0 66 30 165 34.3 0.203 33 1
121 1 78 39 74 39 0.261 28 0
121 2 70 32 95 39.1 0.886 23 0
122 2 60 18 106 29.8 0.717 22 0
122 1 64 32 156 35.1 0.692 30 1
122 2 76 27 200 35.9 0.483 26 0
122 2 52 43 158 36.2 0.816 28 0
122 1 90 51 220 49.7 0.325 31 1
123 4 80 15 176 32 0.443 34 0
123 9 70 44 94 33.1 0.374 40 0
123 6 72 45 230 33.6 0.733 34 0
123 5 74 40 77 34.1 0.269 28 0
123 2 48 32 165 42.1 0.52 26 0
123 3 100 35 240 57.3 0.88 22 0
124 0 56 13 105 21.8 0.452 21 0
124 7 70 33 215 25.5 0.161 37 0
124 8 76 24 600 28.7 0.687 52 1
124 2 68 28 205 32.9 0.875 30 1
124 3 80 33 130 33.2 0.305 26 0
124 9 70 33 402 35.4 0.282 34 0
125 1 70 24 110 24.3 0.221 25 0
125 4 70 18 122 28.9 1.144 45 1
125 6 68 30 120 30 0.464 32 0
125 10 70 26 115 31.1 0.205 41 1
125 1 50 40 167 33.3 0.962 28 1
125 2 60 20 140 33.8 0.088 31 0
126 8 74 38 75 25.9 0.162 39 0
126 0 86 27 120 27.4 0.515 21 0
126 1 56 29 152 28.7 0.801 21 0
126 5 78 27 22 29.6 0.439 40 0
126 0 84 29 215 30.7 0.52 24 0
126 8 88 36 108 38.5 0.349 49 0
126 3 88 41 235 39.3 0.704 27 0
127 2 58 24 275 27.7 1.6 25 0
127 2 46 21 335 34.4 0.176 22 0
127 4 88 11 155 34.5 0.598 28 0
127 0 80 37 210 36.3 0.804 23 0
128 1 82 17 183 27.5 0.115 22 0
128 0 68 19 180 30.5 1.391 25 1
128 1 98 41 58 32 1.321 33 1
128 3 72 25 190 32.4 0.549 27 1
128 1 88 39 110 36.5 1.057 37 1
128 1 48 45 194 40.5 0.613 24 1
128 2 78 37 182 43.3 1.224 31 1
129 6 90 7 326 19.6 0.582 60 0
129 3 64 29 115 26.4 0.219 28 1
129 4 60 12 231 27.5 0.527 31 0
129 2 74 26 205 33.2 0.591 25 0
129 4 86 20 270 35.1 0.231 23 0
129 10 76 28 122 35.9 0.28 39 0
129 3 92 49 155 36.4 0.968 32 1
129 7 68 49 125 38.5 0.439 43 1
129 0 110 46 130 67.1 0.319 26 1
130 1 70 13 105 25.9 0.472 22 0
130 3 78 23 79 28.4 0.323 34 1
130 1 60 23 170 28.6 0.692 21 0
131 1 64 14 415 23.7 0.389 21 0
131 4 68 21 166 33.1 0.16 28 0
133 7 88 15 155 32.4 0.262 37 0
133 1 102 28 140 32.8 0.234 45 1
134 9 74 33 60 25.9 0.46 81 0
134 0 58 20 291 26.4 0.352 21 0
134 6 70 23 130 35.4 0.542 29 1
134 6 80 37 370 46.2 0.238 46 1
135 0 94 46 145 40.6 0.284 26 0
135 0 68 42 250 42.3 0.365 24 1
136 7 74 26 135 26 0.647 51 0
136 11 84 35 130 28.3 0.26 42 1
136 5 84 41 88 35 0.286 35 1
136 15 70 32 110 37.1 0.153 43 1
136 1 74 50 204 37.4 0.399 24 0
137 0 68 14 148 24.8 0.143 21 0
137 0 40 35 168 43.1 2.288 33 1
138 0 60 35 167 34.6 0.534 21 1
138 11 74 26 144 36.1 0.557 50 1
139 0 62 17 210 22.1 0.207 21 0
139 5 64 35 140 28.6 0.411 26 0
139 1 46 19 83 28.7 0.654 22 0
139 5 80 35 160 31.6 0.361 25 1
139 1 62 41 480 40.7 0.536 21 0
140 1 74 26 180 24.1 0.828 23 0
140 12 82 43 325 39.2 0.528 58 1
140 0 65 26 130 42.6 0.431 24 1
141 2 58 34 128 25.4 0.699 24 0
142 2 82 18 64 24.7 0.761 21 0
142 7 60 33 190 28.8 0.687 61 0
142 7 90 24 480 30.4 0.128 43 1
143 1 74 22 61 26.2 0.256 21 0
143 1 86 30 330 30.1 0.892 23 0
143 11 94 33 146 36.6 0.254 51 1
143 1 84 23 310 42.4 1.076 22 0
144 4 58 28 140 29.5 0.287 37 0
144 2 58 33 135 31.6 0.422 25 1
144 5 82 26 285 32 0.452 58 1
144 6 72 27 228 33.9 0.255 40 0
144 1 82 46 180 46.1 0.335 46 1
145 13 82 19 110 22.2 0.245 57 0
145 9 88 34 165 30.3 0.771 53 1
145 9 80 46 130 37.9 0.637 40 1
146 2 70 38 360 28 0.337 29 1
146 4 85 27 100 28.9 0.189 27 0
146 2 76 35 194 38.2 0.329 29 0
147 4 74 25 293 34.9 0.385 30 0
148 4 60 27 318 30.9 0.15 29 1
148 10 84 48 237 37.6 1.001 51 1
149 1 68 29 127 29.3 0.349 42 1
150 7 66 42 342 34.7 0.718 42 0
150 7 78 29 126 35.2 0.692 54 1
151 6 62 31 120 35.5 0.692 28 0
151 12 70 40 271 41.8 0.742 38 1
151 8 78 32 210 42.9 0.516 36 1
152 13 90 33 29 26.8 0.731 43 1
152 9 78 34 171 34.2 0.893 33 1
152 0 82 39 272 41.5 0.27 27 0
153 1 82 42 485 40.6 0.687 23 0
153 13 88 37 140 40.6 1.174 39 0
154 6 74 32 193 29.3 0.839 39 0
154 9 78 30 100 30.9 0.164 45 0
154 4 72 29 126 31.3 0.338 37 0
154 4 62 31 284 32.8 0.237 23 0
154 6 78 41 140 46.1 0.571 27 0
155 2 74 17 96 26.6 0.433 27 1
155 11 76 28 150 33.3 1.353 51 1
155 8 62 26 495 34 0.543 46 1
155 2 52 27 540 38.7 0.24 25 1
155 5 84 44 545 38.7 0.619 34 0
156 9 86 28 155 34.3 1.189 42 1
157 1 72 21 168 25.6 0.123 24 0
157 2 74 35 440 39.4 0.134 30 0
158 3 64 13 387 31.2 0.295 24 0
158 3 76 36 245 31.6 0.851 28 1
158 3 70 30 328 35.5 0.344 35 1
158 5 84 41 210 39.4 0.395 29 1
160 7 54 32 175 30.5 0.588 39 1
161 10 68 23 132 25.5 0.326 47 1
162 0 76 56 100 53.2 0.759 25 1
163 3 70 18 105 31.6 0.268 28 1
163 17 72 41 114 40.9 0.817 47 1
164 1 82 43 67 32.8 0.341 50 0
165 6 68 26 168 33.6 0.631 49 0
165 0 76 43 255 47.9 0.259 26 0
165 0 90 33 680 52.3 0.427 23 0
166 5 72 19 175 25.8 0.587 51 1
167 1 74 17 144 23.4 0.447 33 1
167 8 106 46 231 37.6 0.165 43 1
168 7 88 42 321 38.2 0.787 40 1
169 3 74 19 125 29.9 0.268 31 1
170 3 64 37 225 34.5 0.356 30 1
171 3 72 33 135 33.3 0.199 24 1
171 9 110 24 240 45.4 0.721 54 1
172 1 68 49 579 42.4 0.702 28 1
173 4 70 14 168 29.7 0.361 33 1
173 3 78 39 185 33.8 0.97 31 1
173 3 84 33 474 35.7 0.258 22 1
173 3 82 48 465 38.4 2.137 25 1
173 0 78 32 265 46.5 1.159 58 0
174 3 58 22 194 32.9 0.593 36 1
174 2 88 37 120 44.5 0.646 24 1
176 3 86 27 156 33.3 1.154 52 1
176 8 90 34 300 33.7 0.467 58 1
177 0 60 29 478 34.6 1.072 21 1
179 8 72 42 130 32.7 0.719 36 1
179 0 50 36 159 37.8 0.455 22 1
180 3 64 25 70 34 0.271 26 0
180 0 90 26 90 36.5 0.314 35 1
180 0 78 63 14 59.4 2.42 25 1
181 8 68 36 495 30.1 0.615 60 1
181 1 64 30 180 34.1 0.328 38 1
181 7 84 21 192 35.9 0.586 51 1
181 1 78 42 293 40 1.258 22 1
181 0 88 44 510 43.3 0.222 26 1
184 4 78 39 277 37 0.264 31 1
186 8 90 35 225 34.5 0.423 37 1
187 7 50 33 392 33.9 0.826 34 1
187 3 70 22 200 36.4 0.408 36 1
187 7 68 39 304 37.7 0.254 41 1
187 5 76 27 207 43.6 1.034 53 1
188 0 82 14 185 32 0.682 22 1
189 1 60 23 846 30.1 0.398 59 1
189 5 64 33 325 31.2 0.583 29 1
191 3 68 15 130 30.9 0.299 34 0
193 1 50 16 375 25.9 0.655 24 0
195 7 70 33 145 25.1 0.163 55 1
196 1 76 36 249 36.5 0.875 29 1
196 8 76 29 280 37.5 0.605 57 1
197 2 70 45 543 30.5 0.158 53 1
197 4 70 39 744 36.7 2.329 31 0
198 0 66 32 274 41.3 0.502 28 1

From the tabulated figures:

(1) Generate two random numbers between 2 and 7 and provide SPSS output.
(1 mark)

(2) Using SPSS, delete the columns corresponding to your generated numbers (e.g. if
one of the generated numbers is 5, then delete column C5, etc.). Describe how you did
this and provide the sequence of actions (e.g. Calc->Descriptive Stats->…).
(2 marks)

(3) Using SPSS, select a random sample of 300 observations (n = 300) from your
dataset. Provide the sequence of actions you used to do this.
(1 mark)
Your unique dataset will now consist of 300 rows and seven columns including
Glucose, Age and Outcome.
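
Steps (1) to (3) can be sanity-checked outside SPSS with a short Python sketch such as the one below, run here on the first five rows of the table. It is an illustration only and does not replace the SPSS outputs requested above. The function name `make_unique_dataset`, the fixed seed, the choice of two distinct numbers, and the reading of "column k" as the k-th column of the table (so that columns 2 to 7 run from Pregnancies to DiabetesPedigreeFunction) are all assumptions of this sketch.

```python
import random

COLUMNS = ["Glucose", "Pregnancies", "BloodPressure", "SkinThickness",
           "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome"]

# First five data rows of the table, used purely as a small illustration.
ROWS = [
    [56, 2, 56, 28, 45, 24.2, 0.332, 22, 0],
    [68, 2, 62, 13, 15, 20.1, 0.257, 23, 0],
    [68, 2, 70, 32, 66, 25.0, 0.187, 25, 0],
    [68, 10, 106, 23, 49, 35.5, 0.285, 47, 0],
    [71, 1, 48, 18, 76, 20.4, 0.323, 22, 0],
]


def make_unique_dataset(rows, columns, n_sample, seed=None):
    """Drop two randomly chosen columns (positions 2..7) and sample rows."""
    rng = random.Random(seed)
    # Two distinct 1-based column positions between 2 and 7 (assumption: distinct).
    drop = sorted(rng.sample(range(2, 8), 2))
    keep = [i for i in range(len(columns)) if (i + 1) not in drop]
    kept_columns = [columns[i] for i in keep]
    # Simple random sample of rows, without replacement.
    sample = rng.sample(rows, n_sample)
    reduced = [[row[i] for i in keep] for row in sample]
    return drop, kept_columns, reduced


drop, kept_columns, reduced = make_unique_dataset(ROWS, COLUMNS, n_sample=3, seed=1)
print("Dropped column positions:", drop)
print("Remaining columns:", kept_columns)
```

With the full dataset loaded into `ROWS`, calling the function with `n_sample=300` would give 300 rows and seven columns, and Glucose, Age and Outcome are never dropped because they sit in positions 1, 8 and 9.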

Investigating your unique dataset

(4) For your unique dataset, summarise the information about your observations and
graphically present the frequency distributions of all variables remaining in your unique
dataset, including Glucose but excluding Outcome. Comment on any unusual
observations and decide how to deal with them.
(6 marks)

(5) Using SPSS, define a new variable, Age_Group, by combining observations
for participants younger than 30 into group 1 and all others (aged 30 and older) into
group 2. Provide either a description or a screenshot of how you did this.
(3 marks)
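
The grouping rule in (5) can be written as a one-line function, which may help when checking the recoded values. This is a minimal Python sketch, not SPSS syntax. The function name `age_group` is an assumption, and the ages are taken from a few rows of the table.

```python
def age_group(age):
    """Return 1 for participants younger than 30, otherwise 2."""
    return 1 if age < 30 else 2


# Ages taken from a few rows of the table above.
ages = [22, 47, 29, 30, 81]
groups = [age_group(a) for a in ages]
print(groups)  # [1, 2, 1, 2, 2]
```

Note that age 30 itself falls into group 2, which is the boundary case worth checking in the SPSS output.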

(6) Investigate whether there is a significant difference in mean/median Glucose
concentration between age groups. Formulate the null and alternative hypotheses;
choose, justify and perform an appropriate statistical test using SPSS; provide all
SPSS outputs; write your conclusions.
(10 marks)
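
As one possible cross-check for (6), the sketch below computes Welch's t statistic and its approximate degrees of freedom in plain Python. Welch's test is only one candidate. Whether a t-test or a rank-based test such as Mann-Whitney is appropriate is for you to justify from your own data. The function name `welch_t` is an assumption, and the two short samples are Glucose values from a handful of table rows, far too few to support any conclusion.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(sample_b)
    se2_a, se2_b = var_a / n_a, var_b / n_b
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    t = diff / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Glucose values from a few table rows, split by Age < 30 and Age >= 30.
# Illustrative only: the real analysis uses all 300 sampled observations.
glucose_under_30 = [56, 68, 68, 71, 71, 74, 74]
glucose_30_plus = [68, 74, 75, 77, 83, 88, 88]

t, df = welch_t(glucose_under_30, glucose_30_plus)
print(f"t = {t:.3f}, df = {df:.1f}")
```

The p-value is then read from the t distribution with `df` degrees of freedom, which SPSS reports directly in its independent-samples output.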

(7) Investigate whether the proportion of participants with Glucose concentration greater
than 100 mg/dl differs between the age groups that you defined previously. Formulate
the null and alternative hypotheses; choose, justify and perform an appropriate
statistical test using SPSS; provide all SPSS outputs; write your conclusions.
(10 marks)
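
One common way to compare two proportions, as in (7), is a Pearson chi-square test on the 2x2 table of age group against Glucose above or below 100 mg/dl. The Python sketch below shows the arithmetic under that assumption, without a continuity correction. The function name `chi_square_2x2` and the counts are hypothetical and are not taken from the dataset.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts, NOT taken from the dataset.
# Rows: age group 1, age group 2. Columns: Glucose > 100, Glucose <= 100.
chi2, p_value = chi_square_2x2(90, 80, 95, 35)
print(f"chi-square = {chi2:.3f}, p = {p_value:.5f}")
```

SPSS Crosstabs reports the same statistic alongside a continuity-corrected version and Fisher's exact test, so the value here should match the uncorrected "Pearson Chi-Square" row.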

(8) Using SPSS, produce a table of correlation coefficients. Justify the choice of
correlation coefficient, investigate the resulting table and comment on the most interesting
relationships between the chosen variables. Do not use the Glucose and Outcome variables in
this analysis.
(4 marks)
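
The difference between the two usual choices of coefficient in (8) can be seen in a few lines of Python: Pearson's r measures linear association, while Spearman's rho is Pearson's r applied to ranks. The sketch below is an illustration under assumptions. The function names are invented for this example, and the BMI and SkinThickness values come from the first eight rows of the table only.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson's r computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# BMI and SkinThickness from the first eight rows of the table (illustration only).
bmi = [24.2, 20.1, 25.0, 35.5, 20.4, 33.2, 27.8, 29.7]
skin = [28, 13, 32, 23, 18, 50, 10, 28]
print(f"Pearson r = {pearson(bmi, skin):.3f}, Spearman rho = {spearman(bmi, skin):.3f}")
```

A large gap between the two coefficients for a pair of variables is a hint of outliers or a monotone but non-linear relationship, which is worth mentioning when justifying the choice.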

(9) Using simple linear regression, model Glucose concentration using one variable of
your choice from your unique dataset. Comment on the
significance of the intercept and slope.
(4 marks)
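
The quantities behind the intercept and slope significance in (9) can be reproduced by hand. The Python sketch below fits an ordinary least-squares line and returns the standard errors from which the t statistics are formed. The function name `simple_ols` and the choice of BMI as predictor are assumptions made for illustration, and only the first eight table rows are used.

```python
import math


def simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x, with standard errors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    s2 = sum(r * r for r in residuals) / (n - 2)  # residual variance
    se_slope = math.sqrt(s2 / sxx)
    se_intercept = math.sqrt(s2 * (1 / n + mx ** 2 / sxx))
    return slope, intercept, se_slope, se_intercept


# Glucose (response) and BMI (predictor) from the first eight table rows.
bmi = [24.2, 20.1, 25.0, 35.5, 20.4, 33.2, 27.8, 29.7]
glucose = [56, 68, 68, 68, 71, 71, 74, 74]

slope, intercept, se_slope, se_intercept = simple_ols(bmi, glucose)
print(f"slope = {slope:.3f} (t = {slope / se_slope:.2f})")
print(f"intercept = {intercept:.3f} (t = {intercept / se_intercept:.2f})")
```

Each t statistic is compared with a t distribution on n - 2 degrees of freedom, which corresponds to the "t" and "Sig." columns of the SPSS Coefficients table.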

(10) Fit a multiple regression model with Glucose as the response variable and the other
five variables, excluding Outcome, as predictors. Treat the variable Pregnancies as
interval-scale data. Identify the insignificant predictors in the model and explain why they
are insignificant.
(4 marks)

(11) Cluster your 300 observations into 10 groups using one of the linkage methods and
similarity measures from the corresponding drop-down menus. Give a brief (half a page)
description of the linkage method and similarity measure chosen. Show a dendrogram
with cases labelled by Outcome. Comment on the results obtained. Provide all
SPSS outputs.
(6 marks)
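
To make the idea of a linkage method in (11) concrete, the Python sketch below implements naive agglomerative clustering with single linkage and Euclidean distance, which is just one of the available combinations. The function name `single_linkage`, the use of only Glucose and BMI, the six rows chosen, and stopping at two clusters are all assumptions for illustration. In practice the variables would usually be standardised first, because they are measured on different scales.

```python
import math


def single_linkage(points, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(c1, c2):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters


# (Glucose, BMI) for the first three and last three rows of the table.
points = [(56, 24.2), (68, 20.1), (68, 25.0),
          (197, 30.5), (197, 36.7), (198, 41.3)]
outcome = [0, 0, 0, 1, 0, 1]  # Outcome values for the same rows

clusters = single_linkage(points, n_clusters=2)
for members in clusters:
    print(sorted(members), [outcome[i] for i in members])
```

Recording the distance at each merge, rather than stopping at a fixed number of clusters, gives the heights that a dendrogram displays.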

(12) It is known that the incidence of diabetes in the UK is 0.6. In a small northern
village of 100 people, isolated from the mainland for six months per year, the pharmacy
wants to know how many insulin shots to order. We want to know the
probability that between A and B people will develop the disease during this period. To
perform the analysis, generate two random numbers between 0 and 100 using SPSS
and paste the outputs into your report. Denote by A the smaller and by B the
larger of these two generated numbers. Calculate the probability that
between A and B people develop the disease and state how many shots should be ordered.
(9 marks)
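
If the number of villagers who develop the disease in (12) is modelled as Binomial(100, 0.6), the required probability is a sum of binomial terms. That model is an assumption of the sketch below, as is treating "between A and B" as including both endpoints. The function names and the values of A and B are hypothetical placeholders for the numbers you generate in SPSS.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def prob_between(a, b, n=100, p=0.6):
    """P(a <= X <= b) for X ~ Binomial(n, p), both endpoints inclusive."""
    return sum(binomial_pmf(k, n, p) for k in range(a, b + 1))


A, B = 55, 65  # hypothetical values standing in for the SPSS-generated numbers
probability = prob_between(A, B)
expected_cases = 100 * 0.6  # mean of the binomial distribution, n * p

print(f"P({A} <= X <= {B}) = {probability:.4f}")
print(f"Expected number of cases = {expected_cases:.0f}")
```

The expected count n * p is a natural starting point for the order size, and an upper quantile of the same distribution could be argued for if running out of shots is costly.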