P8483
Application of Epidemiologic Research Methods
Homework 7
Due Monday, March 31 at 11:59 pm on CourseWorks.
No late assignments accepted. Any assignment uploaded past the due date will not be graded.
NOTE: ONLY .SAS files will be accepted for upload.
Instructions for all homework assignments.
· Each student must submit their own SAS EDITOR FILE to CourseWorks by the deadline specified on the front page of the homework assignment.
· Please title your SAS EDITOR FILE as follows:
o Example: “MRL2013_HW7.sas”
· At the top of your SAS EDITOR FILE, please put your first name, last name, and UNI in a /* comment block */.
· Points will be deducted if the SAS code handed in does not run without errors from beginning to end.
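As a minimal sketch of the header described above, the top of the file might look like the block below. The name and UNI are hypothetical placeholders, not real values; substitute your own.

```sas
/*
   First name: Jane          (placeholder)
   Last name:  Doe           (placeholder)
   UNI:        jd0000        (placeholder)
   Course:     P8483, Homework 7
*/
```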
Introduction
PLEASE USE THE SAS DATASET “Homework_7.sas7bdat”, available in RAW DATA/Homework on SAS On Demand.
For this assignment, you are interested in the effect of sedentary behavior on cognitive function in older adults, using NHANES 2001-2002 data. You hypothesize that gender, education level, and body mass index are potential confounders of the association between sedentary behavior and cognitive function. Three of the variable definitions (from NHANES) are below. Refer to the NHANES documentation for other definitions.
CFDRIGHT
Number correct (range 0 to 100) on a cognitive functioning score
DMDEDUC (educational attainment)
1 = less than high school, 2 = high school diploma (including GED), 3 = more than high school, 7 = refused, 9 = don’t know, . = missing
PAD480
Over the past 30 days, on a typical day how much time altogether did {you/SP} spend on a typical day sitting and watching TV or videos or using a computer outside of work?
0 = less than 1 hour
1 = 1 hour; 2 = 2 hours; 3 = 3 hours; 4 = 4 hours
5 = 5 hours or more
6 = none
77 = refused; 99 = don’t know; . = missing
Assignment
1. Create a copy of the dataset “homework_7” excluding patients with “refused”, “don’t know”, or “missing” responses to the educational attainment measure (variable DMDEDUC). Store it in the WORK library.
· You should have 1,551 observations. Please let us know in the comments that you confirmed this (and tell us how you confirmed it).
· Produce a histogram of your outcome variable (CFDRIGHT) and comment on whether you think it is approximately normally distributed.
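One possible starting skeleton for Question 1 is sketched below; it is not the only acceptable approach. The libref `hw`, the folder path, and the output name `work.hw7` are assumptions made for illustration, so point the LIBNAME at wherever “Homework_7.sas7bdat” actually lives in your SAS On Demand session.

```sas
/* Assumption: placeholder path; replace with the real location of RAW DATA/Homework. */
libname hw "/path/to/RAW DATA/Homework" access=readonly;

/* work.hw7 is a hypothetical name for the analytic copy. */
data work.hw7;
    set hw.homework_7;
    /* Subsetting IF: keep valid education codes only (drops 7, 9, and missing). */
    if dmdeduc in (1, 2, 3);
run;

/* The log note for the DATA step and the "Observations" field in
   PROC CONTENTS both report how many rows were kept. */
proc contents data=work.hw7;
run;

/* Histogram of the outcome with a fitted normal curve overlaid. */
proc univariate data=work.hw7;
    var cfdright;
    histogram cfdright / normal;
run;
```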
2. You decide to use the variable PAD480, which is a self-reported measure of time spent in front of the television or computer, to measure a construct of “sedentary behavior.” Create a categorical variable named SEDB that meets the following criteria:
SEDB = 0 if participant reports one hour or fewer of time spent on a typical day sitting and watching TV or videos or using a computer outside of work
SEDB = 1 if participant reports more than one hour of time spent on a typical day sitting and watching TV or videos or using a computer outside of work
Create and apply an appropriate format to this new variable. Check your work! Use PROC FREQ to confirm (with code!) that all PAD480 variable values were applied to the correct category of new variable SEDB, and no variable values were mistakenly missed.
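A sketch of one way to build and check SEDB follows. It assumes the Question 1 dataset is called `work.hw7` (a hypothetical name) and that “none” (code 6) belongs with “one hour or fewer”; the format name `sedbf` is also made up for illustration. Confirm the handling of codes 77 and 99 against your own reading of the variable definition.

```sas
/* Hypothetical format name; labels follow the SEDB definition above. */
proc format;
    value sedbf 0 = "1 hour or less"
                1 = "More than 1 hour";
run;

data work.hw7;
    set work.hw7;
    /* 6 = none, 0 = less than 1 hour, 1 = 1 hour */
    if pad480 in (0, 1, 6) then sedb = 0;
    /* 2 to 4 hours, and 5 = 5 hours or more */
    else if pad480 in (2, 3, 4, 5) then sedb = 1;
    /* refused (77), don't know (99), and missing stay missing */
    else sedb = .;
    format sedb sedbf.;
run;

/* Cross-tabulate old against new; MISSING shows missing values as their own row/column. */
proc freq data=work.hw7;
    tables pad480*sedb / missing norow nocol nopercent;
run;
```

In the resulting table, every PAD480 value should fall in exactly one SEDB column, which is the check this question asks for.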
3. Use PROC TTEST to report the mean difference (and APPROPRIATE 95% CI; check the “equality of variance” test to know which 95% CI to report) in cognitive functioning score (variable CFDRIGHT) between those with different levels of the binary variable SEDB you just created. Interpret the mean difference and 95% CI in a sentence. If you were using an arbitrary 2-sided alpha of 0.05 as a cutoff against which to declare “statistical significance,” what would you conclude?
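The basic PROC TTEST call for a two-group comparison can be sketched as below, again assuming a dataset named `work.hw7` (hypothetical) that already contains SEDB.

```sas
proc ttest data=work.hw7 alpha=0.05;
    class sedb;      /* two-level grouping variable */
    var cfdright;    /* continuous outcome */
run;
```

The output lists a Pooled row (equal variances assumed) and a Satterthwaite row (unequal variances), along with an “Equality of Variances” table based on the folded F test. Note that the reported difference is the first class level minus the second.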
4. Use PROC REG or PROC GLM to do the same thing. Interpret the mean difference and 95% CI in a sentence. If you were using an arbitrary 2-sided alpha of 0.05 as a cutoff against which to declare “statistical significance,” what would you conclude?
5. How much of the total variability in CFDRIGHT is explained by its linear relationship with the binary sedentary behavior variable (SEDB)? Report where you got this information from.
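A minimal PROC REG version of the crude model might look like the following sketch (dataset name `work.hw7` is an assumption). The CLB option requests confidence limits for the parameter estimates.

```sas
proc reg data=work.hw7;
    model cfdright = sedb / clb;   /* CLB: 95% CIs for the betas by default */
run;
quit;
```

With SEDB coded 0/1, the slope is the mean for SEDB = 1 minus the mean for SEDB = 0, so its sign may be the opposite of the difference PROC TTEST prints.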
6. Compare your crude estimate from Question 4 with a “fully adjusted” measure of the association between SEDB and CFDRIGHT after adjusting for age (variable RIDAGEYR) and educational attainment (variable DMDEDUC, treated as an unordered categorical variable) as sources of potential confounding. In a short paragraph, compare the crude estimate (and 95% CI) to the fully adjusted estimate (and 95% CI), tell me whether age and educational attainment are confounding the association between SEDB and CFDRIGHT, and tell me whether there is still a relationship between SEDB and CFDRIGHT after adjustment for age and DMDEDUC. Show your work and tell me where you got the answer. Be sure to report all relevant regression parameter estimates and 95% CIs around these parameter estimates.
7. How much of the total variability in CFDRIGHT is explained by its linear relationship with SEDB, RIDAGEYR, and DMDEDUC? Report where you got this information from.
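One way to fit the adjusted model with education as an unordered categorical variable is PROC GLM with a CLASS statement, sketched below under the assumption that the analytic dataset is named `work.hw7`. Creating indicator variables by hand and using PROC REG would be an equally reasonable route.

```sas
proc glm data=work.hw7;
    class dmdeduc;   /* unordered categories; GLM builds the indicators */
    model cfdright = sedb ridageyr dmdeduc / solution clparm;
run;
quit;
```

SOLUTION prints the parameter estimates and CLPARM adds their confidence limits. By default PROC GLM uses the last sorted CLASS level as the reference category, so the DMDEDUC estimates are contrasts against that level.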
8. From a causal perspective, can you interpret the partial regression parameter estimate for variable “DMDEDUC” in the model you used in Question 6? If so, interpret it. If not, tell me why you can’t interpret it.
9. Repeat Question 6 with a different operationalization of your sedentary behavior variable. This time, create an ordinal categorical variable with the lowest order corresponding to a response of “none” to variable PAD480, the next-lowest corresponding to “less than one hour,” etc., all the way to the highest category of “5 hours or more.” Report and interpret the crude estimate and 95% CI, and the adjusted estimate and 95% CI. What do you conclude about the association between this operationalization of sedentary behavior and CFDRIGHT score after adjusting for age and education? Which model do you prefer (the adjusted Question 6 model or the adjusted Question 9 model)? Why?
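The reordering in Question 9 can be sketched as a simple recode. The variable name `sedord` and the dataset names `work.hw7` and `work.hw7_q9` are hypothetical, and the 0 to 6 scoring is one reasonable choice rather than a required one.

```sas
data work.hw7_q9;
    set work.hw7;
    /* "none" (code 6) becomes the lowest category */
    if pad480 = 6 then sedord = 0;
    /* codes 0-5 shift up by one: <1 hour = 1, 1 hour = 2, ..., 5+ hours = 6 */
    else if 0 <= pad480 <= 5 then sedord = pad480 + 1;
    /* refused (77), don't know (99), and missing stay missing */
    else sedord = .;
run;

/* Confirm that each PAD480 code landed in the intended ordinal category. */
proc freq data=work.hw7_q9;
    tables pad480*sedord / missing norow nocol nopercent;
run;
```

How `sedord` enters the regression model (as a single linear term or as a set of categories) is part of what this question asks you to decide and justify.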