Exercise 4: Solutions 5th part

Load the data into R.

dat <- read.csv("NHANES1.csv")

# convert all integer-coded columns to factors, except age
integer_info <- sapply(dat, is.integer)
integer_info[which(names(integer_info) == "age")] <- FALSE  # age stays a numeric integer
dat[integer_info] <- lapply(dat[integer_info], as.factor)

Task 4.1

Use a chi-square test to test whether the presence of chronic bronchitis and the current smoking status are independent.

table(dat$cbronch_now, dat$smokstat)

       
         1  2  3
  FALSE 37  8 28
  TRUE  32  3 43

chisq.test(dat$cbronch_now, dat$smokstat)


    Pearson's Chi-squared test

data:  dat$cbronch_now and dat$smokstat
X-squared = 5.6447, df = 2, p-value = 0.05947
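To see what `chisq.test()` computes, the sketch below rebuilds Pearson's statistic by hand from the counts printed in the table above, so it runs without `NHANES1.csv`. Under independence, each expected count is row total × column total / grand total; the smallest expected count here is about 5.3, so the usual rule of thumb (all expected counts at least 5) is met.

```r
# Counts copied from the table above: rows = cbronch_now (FALSE, TRUE),
# columns = smokstat (1, 2, 3)
obs <- matrix(c(37, 32, 8, 3, 28, 43), nrow = 2,
              dimnames = list(cbronch_now = c("FALSE", "TRUE"),
                              smokstat = c("1", "2", "3")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson statistic and its p-value with (r - 1) * (c - 1) degrees of freedom
x2  <- sum((obs - expected)^2 / expected)
dof <- (nrow(obs) - 1) * (ncol(obs) - 1)
p   <- pchisq(x2, df = dof, lower.tail = FALSE)

# Cross-check against the built-in test (no continuity correction is
# applied to tables larger than 2 x 2)
res <- chisq.test(obs)
c(manual = x2, builtin = unname(res$statistic), p = p, min_expected = min(expected))
```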

Task 4.2

Use Fisher's exact test to test whether sex and the presence of any kind of liver disease are independent.

table(dat$livdis_now, dat$male)

       
        FALSE TRUE
  FALSE    33   38
  TRUE     50   60

fisher.test(dat$livdis_now, dat$male)


    Fisher's Exact Test for Count Data

data:  dat$livdis_now and dat$male
p-value = 1
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.547707 1.979023
sample estimates:
odds ratio 
  1.041893
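`fisher.test()` reports the conditional maximum-likelihood estimate of the odds ratio, which is close to, but not identical with, the simple cross-product ratio \(ad/bc\). A minimal sketch comparing the two on the counts printed in the table above:

```r
# Counts copied from the table above: rows = livdis_now, columns = male
tab <- matrix(c(33, 50, 38, 60), nrow = 2,
              dimnames = list(livdis_now = c("FALSE", "TRUE"),
                              male = c("FALSE", "TRUE")))

# Simple (unconditional) sample odds ratio: (a * d) / (b * c)
or_sample <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])

# Fisher's exact test conditions on the margins and reports the conditional MLE
res <- fisher.test(tab)
c(sample_or = or_sample, conditional_mle = unname(res$estimate), p = res$p.value)
```

Both estimates are about 1.04, and the confidence interval printed above comfortably contains 1.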

Task 4.3

Perform a sign test on both hdl and log(hdl) to test the hypothesis that the median HDL cholesterol level is 1.30. Is the median significantly different from 1.30? Do you obtain the same results with hdl and log(hdl)?

library(BSDA)

SIGN.test(dat$hdl, md = 1.30)


    One-sample Sign-Test

data:  dat$hdl
s = 2180, p-value = 0.3286
alternative hypothesis: true median is not equal to 1.3
95 percent confidence interval:
 1.29 1.32
sample estimates:
median of x 
       1.29 

Achieved and Interpolated Confidence Intervals: 

                  Conf.Level L.E.pt U.E.pt
Lower Achieved CI     0.9475   1.29   1.32
Interpolated CI       0.9500   1.29   1.32
Upper Achieved CI     0.9511   1.29   1.32

SIGN.test(log(dat$hdl), md = log(1.30))


    One-sample Sign-Test

data:  log(dat$hdl)
s = 2180, p-value = 0.3286
alternative hypothesis: true median is not equal to 0.2623643
95 percent confidence interval:
 0.2546422 0.2776317
sample estimates:
median of x 
  0.2546422 

Achieved and Interpolated Confidence Intervals: 

                  Conf.Level L.E.pt U.E.pt
Lower Achieved CI     0.9475 0.2546 0.2776
Interpolated CI       0.9500 0.2546 0.2776
Upper Achieved CI     0.9511 0.2546 0.2776
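The sign test only uses whether each observation lies above or below the hypothesised median, so a strictly increasing transformation such as the logarithm leaves the statistic and the p-value unchanged; only the confidence limits are transformed (exp(0.2546422) ≈ 1.29 and exp(0.2776317) ≈ 1.32). A minimal sketch with base R's `binom.test()` on hypothetical toy values (not the NHANES data); `sign_test()` is a helper defined here for illustration, not the BSDA implementation.

```r
# Minimal sign test: count observations above the hypothesised median m0,
# drop ties, and compare that count with a Binomial(n, 0.5) distribution.
sign_test <- function(x, m0) {
  d <- x - m0
  d <- d[d != 0]                # ties with m0 carry no sign information
  s <- sum(d > 0)               # statistic: number of positive differences
  list(s = s, n = length(d),
       p.value = binom.test(s, length(d), p = 0.5)$p.value)
}

# Hypothetical toy values (not the NHANES data)
x <- c(0.90, 1.05, 1.10, 1.25, 1.30, 1.35, 1.42, 1.50, 1.60, 1.80)

raw <- sign_test(x, 1.30)
lg  <- sign_test(log(x), log(1.30))
str(raw)
str(lg)
```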

Task 4.4

Use a Mann-Whitney test to test the null hypothesis \(H_0\): male and female weights follow the same distribution.

library(coin)

wilcox_test(weight ~ as.factor(male), data = dat)


    Asymptotic Wilcoxon-Mann-Whitney Test

data:  weight by as.factor(male) (FALSE, TRUE)
Z = -18.398, p-value < 2.2e-16
alternative hypothesis: true mu is not equal to 0
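The Mann-Whitney statistic is a rank sum in disguise: rank all observations together, add up the ranks of one group and subtract the smallest value that sum can take. Equivalently, it counts the pairs in which an observation from the first group exceeds one from the second. A sketch on hypothetical toy weights (not the NHANES data), checked against base R's `wilcox.test()`, which reports this statistic as `W`; `coin::wilcox_test()` above reports a standardized `Z` instead.

```r
# Hypothetical toy weights in kg (not the NHANES data), chosen without ties
female <- c(58.2, 61.5, 64.0, 70.3, 55.1, 66.8)
male   <- c(72.4, 80.1, 68.9, 77.5, 85.0, 74.2, 69.7)

# Rank all observations together, then sum the ranks of the first group
r <- rank(c(female, male))
rank_sum <- sum(r[seq_along(female)])

# Mann-Whitney U for the first group: rank sum minus its minimum possible value
n1 <- length(female)
U  <- rank_sum - n1 * (n1 + 1) / 2

# Same quantity as a pair count: how often a female weight exceeds a male one
U_pairs <- sum(outer(female, male, ">"))

res <- wilcox.test(female, male)
c(U = U, U_pairs = U_pairs, W = unname(res$statistic))
```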

Task 4.5

It has been shown that there is a ‘social gradient’ in health: the richer you are, the more likely you are to be in good health. Plot general self-rated health against relative income to get an impression of whether our data confirm this. Which kind of plot is reasonable? Consider using a mosaic plot, e.g. with the function `mosaicplot()`.

mosaicplot(srhgnrl ~ increl, data = dat,
           main = "self-rated health vs. relative income",
           xlab = "self-rated health", ylab = "relative income")
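A mosaic plot is a picture of conditional proportions. For a two-way table passed directly to `mosaicplot()`, the default splits the horizontal axis by the first dimension (column widths proportional to its marginal shares) and then splits each column vertically by the second dimension (tile heights equal to the conditional proportions). A sketch computing those quantities for a hypothetical table (not the NHANES data):

```r
# Hypothetical 3 x 2 table of counts (not the NHANES data)
tab <- matrix(c(30, 20, 10, 10, 20, 30), nrow = 3,
              dimnames = list(health = c("good", "fair", "poor"),
                              income = c("high", "low")))

# First dimension (health): tile-column widths follow its marginal shares
widths <- prop.table(rowSums(tab))

# Second dimension (income): within each column, tile heights are the
# conditional proportions of income given health
heights <- prop.table(tab, margin = 1)

widths
heights

# Draw the plot only in an interactive session
if (interactive()) mosaicplot(tab, main = "Hypothetical example")
```

Heights that shift from column to column are what a gradient looks like; if the two variables were independent, every column would be split at the same heights.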

Task 4.6

Test the relation for statistical significance using an appropriate test.

chisq.test(dat$srhgnrl, dat$increl)


    Pearson's Chi-squared test

data:  dat$srhgnrl and dat$increl
X-squared = 335.89, df = 12, p-value < 2.2e-16
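Pearson's chi-square test treats both variables as nominal, so it tests against any pattern of association. If `srhgnrl` and `increl` are ordered codes (an assumption; the coding is not documented here), a rank correlation test targets a monotone gradient directly. A sketch with `cor.test(method = "spearman")` on hypothetical ordinal codes (not the NHANES data):

```r
# Hypothetical ordinal codes for 12 people (not the NHANES data):
# health 1 (best) to 5 (worst), income 1 (lowest) to 3 (highest)
health <- c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5)
income <- c(3, 3, 3, 2, 3, 2, 2, 1, 1, 2, 1, 1)

# exact = FALSE: with tied ranks R uses the asymptotic approximation
res <- cor.test(health, income, method = "spearman", exact = FALSE)
c(rho = unname(res$estimate), p = res$p.value)
```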

Task 4.7

Categorize the variable `bmi` into an underweight (BMI<18.5), normal weight (18.5≤BMI<25), overweight (25≤BMI<30) and obese (BMI≥30) group. Turn the variable into a factor.
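The four groups described in the task can also be formed with a single call to base R's `cut()`. A minimal sketch on hypothetical BMI values (not the NHANES data) that include the cut-points themselves:

```r
# Hypothetical BMI values (not the NHANES data), including the cut-points
bmi <- c(17.9, 18.5, 22.0, 24.99, 25, 29.9, 30, 41.2, NA)

# right = FALSE makes the intervals closed on the left:
# [-Inf, 18.5), [18.5, 25), [25, 30), [30, Inf)
bmi_cat <- cut(bmi,
               breaks = c(-Inf, 18.5, 25, 30, Inf),
               right  = FALSE,
               labels = c("Underweight", "Normal weight", "Overweight", "Obese"))

table(bmi_cat, useNA = "ifany")
```

Unlike `as.factor()` on character labels, which sorts the levels alphabetically, `cut()` keeps the levels in their natural order from underweight to obese.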

# fill a character vector group by group, then convert it to a factor
bmi_cat <- NULL
bmi_cat[dat$bmi < 18.5] <- "Underweight"
bmi_cat[dat$bmi >= 18.5 & dat$bmi < 25] <- "Normal weight"
bmi_cat[dat$bmi >= 25 & dat$bmi < 30] <- "Overweight"
bmi_cat[dat$bmi >= 30] <- "Obese"
bmi_cat <- as.factor(bmi_cat)
dat$bmi_cat <- bmi_cat

Task 4.8

What is the proportion of overweight or obese people according to the categorized BMI? What is the proportion of people ever diagnosed with being overweight (variable `ovrwght_ever`)? How many overweight people were actually ever diagnosed with being overweight?

prop.table(table(dat$ovrwght_ever))


    FALSE      TRUE 
0.6848109 0.3151891

prop.table(table(dat$ovrwght_ever, dat$bmi_cat), margin = 2)

       
        Normal weight     Obese Overweight Underweight
  FALSE     0.9671907 0.3321033  0.7632275   1.0000000
  TRUE      0.0328093 0.6678967  0.2367725   0.0000000
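The code above answers the last two questions. For the first one, the share of overweight or obese people is the sum of the two corresponding proportions in `prop.table(table(dat$bmi_cat))`. A sketch on a hypothetical factor (not the NHANES data), which also shows why missing values should be dropped first:

```r
# Hypothetical categorized BMI values (not the NHANES data), one missing
bmi_cat <- factor(c("Normal weight", "Obese", "Overweight", "Obese",
                    "Underweight", "Normal weight", "Overweight",
                    "Overweight", NA))

# Option 1: sum the two group shares (table() drops NA by default)
p <- prop.table(table(bmi_cat))
share_from_table <- sum(p[c("Overweight", "Obese")])

# Option 2: drop NA explicitly; %in% would otherwise count NA as "not overweight"
known <- bmi_cat[!is.na(bmi_cat)]
share_from_mean <- mean(known %in% c("Overweight", "Obese"))

c(share_from_table, share_from_mean)
```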

Task 4.9

Is there a difference in diabetes prevalence between obese people diagnosed with overweight and those who were never diagnosed? What about self-rated health? How do you explain the results?

(tab <- table(dat$diab_lft[dat$bmi_cat == "Obese"], dat$ovrwght_ever[dat$bmi_cat == "Obese"]))

   
    FALSE TRUE
  1   473  694
  2     0    7
  3    48  253

fisher.test(tab)


    Fisher's Exact Test for Count Data

data:  tab
p-value < 2.2e-16
alternative hypothesis: two.sided
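Fisher's exact test is a sensible choice here because the `diab_lft` table has a zero cell and an expected count well below 5 in its second row, which makes the chi-square approximation doubtful. A sketch checking this, and computing the diabetes distribution within each diagnosis group, from the counts printed above (the meaning of the codes 1, 2 and 3 is not documented here, so they are kept as-is):

```r
# Counts copied from the table above (obese participants only):
# rows = diab_lft codes, columns = ovrwght_ever
tab <- matrix(c(473, 0, 48, 694, 7, 253), nrow = 3,
              dimnames = list(diab_lft = c("1", "2", "3"),
                              ovrwght_ever = c("FALSE", "TRUE")))

# Smallest expected count under independence (rule of thumb: at least 5)
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
min(expected)

# Distribution of diab_lft within each diagnosis group (columns sum to 1)
col_prop <- prop.table(tab, margin = 2)
round(col_prop, 3)
```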

(tab <- table(dat$srhgnrl[dat$bmi_cat == "Obese"], dat$ovrwght_ever[dat$bmi_cat == "Obese"]))

   
    FALSE TRUE
  1    39   34
  2    98  188
  3   237  435
  4    98  283
  5     9   65

prop.table(tab, margin=1)

   
        FALSE      TRUE
  1 0.5342466 0.4657534
  2 0.3426573 0.6573427
  3 0.3526786 0.6473214
  4 0.2572178 0.7427822
  5 0.1216216 0.8783784

# Pearson's chi-square test of independence (not a trend test: it ignores the order of srhgnrl)
chisq.test(tab)


    Pearson's Chi-squared test

data:  tab
X-squared = 39.326, df = 4, p-value = 5.966e-08
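A test that does use the ordering of `srhgnrl` is the chi-squared test for trend in proportions, `prop.trend.test()` in base R. A sketch using the counts printed in the table above, assuming the codes 1 to 5 are ordered and equally spaced (scores 1 to 5):

```r
# Counts copied from the table above (obese participants only):
# number ever diagnosed with overweight, and group totals, per srhgnrl code 1..5
diagnosed <- c(34, 188, 435, 283, 65)
total     <- c(39 + 34, 98 + 188, 237 + 435, 98 + 283, 9 + 65)

# Proportions diagnosed per category (the TRUE column of prop.table above)
round(diagnosed / total, 4)

# Chi-squared test for a linear trend in proportions, 1 degree of freedom
res <- prop.trend.test(diagnosed, total)
res
```

Unlike the 4-df test above, this concentrates the evidence on a single monotone trend across the five categories.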
