MTH303 Computer-Based Coursework
Announcement & Submission Rules
• You are required to complete the MTH303 coursework using the provided “WORD TEMPLATE FOR THE MTH303 COURSEWORK”.
• Please submit a single PDF file exported from the template; the maximum file size is 5MB.
• This coursework must be completed independently. Academic integrity applies. You must name the file with your student number as follows: Studentnumber.pdf.
• How to present your work
– For every sub-question, include:
(i) Question number (e.g., “Task 2.3”).
(ii) R code (if asked) used to answer the question. A screenshot of the code is acceptable.
(iii) Outputs/plots/tables (if required) that the code produces.
(iv) Discussion/arguments (if asked).
– Keep the four parts together under the corresponding sub-question so the workflow is self-contained and easy to follow.
– Label figures/tables (e.g., “Fig. 2.3a”) and refer to them in your discussion.
• The deadline for submission of the coursework is Sunday 7th December at 11:59 PM.
Background
ABC Hospital is investigating factors that drive inpatient Length of Stay (LOS) and all-cause readmission. The analytics team extracted an encounter-level dataset and saved it as readmission.csv. Your role is to conduct a clear, defensible analysis and communicate insights for decision-making.
General Guidance
• Core tasks focus on multiple linear regression (MLR) for LOS and a logistic GLM for readmission.
• You may use methods or variations beyond those shown in lectures if you believe they improve the analysis; ensure they are well-motivated and clearly explained. Note that there is no requirement to go beyond the course coverage.
• Keep your work reproducible; figures and tables must be labelled and referenced in text.
Data Dictionary
| Column | Allowed values / type |
|---|---|
| LOS | Integer days (1–30) |
| Readmission.Status | {0, 1} |
| Age | Integer years (18–95) |
| Gender | {F, M} |
| Race | {White, Black, Hispanic, Others} |
| ER | Non-negative integer count |
| HCC.Riskscore | Positive continuous |
| DRG.Class | {MED, SURG, UNGROUP} |
| DRG.Complication | {MedicalNoC, MedicalMCC.CC, SurgNoC, SurgMCC.CC, Other} |
Variable descriptions
• LOS: Length of stay in days for the index admission.
• Readmission.Status: Binary outcome for all-cause readmission after discharge (1=yes, 0=no).
• Age: Patient age in years.
• Gender/Race: Recorded administrative categories.
• ER: Number of emergency-room visits prior to the index admission.
• HCC.Riskscore: A clinical risk severity score; larger values indicate sicker patients.
• DRG.Class: Coarse clinical grouping of the case based on diagnosis and procedures. MED = medical (non-surgical) admissions; SURG = surgical cases; UNGROUP = special/uncategorised cases not falling cleanly into MED or SURG.
• DRG.Complication: Severity flag within DRG. MedicalNoC/SurgNoC = no notable complications for medical/surgical cases; MedicalMCC.CC/SurgMCC.CC = has (major) complications/comorbidities; Other = miscellaneous/rare codes.
Part A: Multiple Linear Regression for LOS
Task 1 (10 pts): Visualisation & transformation
1.1 (5 pts) Plot a histogram of LOS and comment on its skewness. If skewed, choose a simple transformation, justify briefly, and use the transformed version for all modeling in Part A.
1.2 (5 pts) Choose one categorical predictor and draw a suitable plot of transformed LOS by groups; comment briefly.
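The workflow in Task 1 can be sketched roughly as follows. This is a minimal illustration on simulated data, not on readmission.csv: the data frame toy, the helper skewness, and the choice of a log transformation are all assumptions made for the example, and another simple transformation may be equally defensible.

```r
# Simulated stand-in for the coursework data (NOT the real readmission.csv).
set.seed(1)
n <- 500
toy <- data.frame(
  LOS = pmin(30, pmax(1, round(rlnorm(n, meanlog = 1.5, sdlog = 0.6)))),
  DRG.Class = factor(sample(c("MED", "SURG", "UNGROUP"), n, replace = TRUE))
)

# Hypothetical helper: sample skewness as the mean cubed z-score
# (positive values indicate a right-skewed distribution).
skewness <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^3)
}

# LOS is at least 1 day, so the log is always defined.
toy$logLOS <- log(toy$LOS)

# Histograms before and after the transformation, plus a grouped boxplot.
op <- par(mfrow = c(1, 3))
hist(toy$LOS, main = "LOS", xlab = "Days")
hist(toy$logLOS, main = "log(LOS)", xlab = "log(days)")
boxplot(logLOS ~ DRG.Class, data = toy, main = "log(LOS) by DRG.Class")
par(op)

skew_raw <- skewness(toy$LOS)
skew_log <- skewness(toy$logLOS)
c(raw = skew_raw, log = skew_log)
```

Comparing the two skewness values alongside the histograms gives a simple numerical justification for the transformation.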
Task 2: Modeling and checks
2.1 (4 pts) Baseline model (no interactions). Fit on the transformed response using
Age + ER + HCC.Riskscore + Gender + Race + DRG.Class + DRG.Complication
and name the model m0.
2.2 (6 pts) Using only the summary of m0, drop variables that are not significant at the 5% level. Refit and report the summary of the reduced model m_red. Discuss briefly whether the goodness-of-fit of m_red improves on that of m0.
2.3 (12 pts) Generate diagnostic plots for the reduced model m_red and comment on whether the basic assumptions appear reasonable.
2.4 (12 pts) Check for unusual observations. Flag and list any outliers using $|r_i| > 3$ for the standardised residuals $r_i$, any high-leverage points using $h_{ii} > 4\bar{h}$, where $\bar{h} = (p + 1)/n$, and any influential points using a benchmark of your choice.
2.5 (4 pts) Assess multicollinearity for the reduced model m_red with VIF, using 5 as the threshold. Comment on the potential impact of multicollinearity on inference.
2.6 (8 pts) Apply a selection method (e.g., AIC/BIC stepwise; your choice) to the baseline model m0 and name the selected model m_sel. Compare the selected model m_sel with the reduced model m_red from 2.2 using appropriate criteria.
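The benchmarks in 2.4 and the stepwise selection in 2.6 might be coded along the following lines. This is a sketch on simulated data with a reduced set of predictors; the data frame toy, the coefficient values used to simulate it, and the Cook's distance cut-off of 4/n are assumptions made for illustration only. Here p is taken to be the number of non-intercept coefficients, so that p + 1 equals the number of fitted coefficients.

```r
# Simulated stand-in for the coursework data (NOT the real readmission.csv).
set.seed(2)
n <- 300
toy <- data.frame(
  Age = sample(18:95, n, replace = TRUE),
  ER = rpois(n, 1),
  HCC.Riskscore = rgamma(n, shape = 2, rate = 1),
  Gender = factor(sample(c("F", "M"), n, replace = TRUE))
)
toy$logLOS <- 1 + 0.01 * toy$Age + 0.2 * toy$HCC.Riskscore + rnorm(n, sd = 0.5)

m0 <- lm(logLOS ~ Age + ER + HCC.Riskscore + Gender, data = toy)

r <- rstandard(m0)           # standardised residuals r_i
h <- hatvalues(m0)           # leverages h_ii
p <- length(coef(m0)) - 1    # non-intercept coefficients
h_bar <- (p + 1) / n         # mean leverage

outliers <- which(abs(r) > 3)
high_lev <- which(h > 4 * h_bar)
# Assumed benchmark for influence: Cook's distance above 4/n.
influential <- which(cooks.distance(m0) > 4 / n)

# Stepwise selection from the baseline model.
# The default penalty k = 2 gives AIC; k = log(n) gives BIC.
m_aic <- step(m0, direction = "both", trace = 0)
m_bic <- step(m0, direction = "both", k = log(n), trace = 0)

# Compare candidate models on a common criterion.
AIC(m0, m_aic, m_bic)
```

The same extractor functions apply unchanged to whichever reduced or selected model is being examined.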
Part B: GLM for Readmission
3.1 (6 pts) Fit a baseline Generalized Linear Model, named g0, with binomial family and logit link using
Age + ER + HCC.Riskscore + Gender + Race + DRG.Class + DRG.Complication
Report the summary of g0.
3.2 (6 pts) Using the summary only, remove variables that are not significant at the 10% level. Refit and report the updated model drop_g. Compare the goodness-of-fit of g0 and drop_g.
3.3 (10 pts) Choose appropriate residuals for the updated model and make residual plots to check and justify the appropriateness of the random component.
3.4 (6 pts) Detect outliers using a benchmark value of 2.5, and drop these points (assume we can remove them directly). Refit the model on the reduced dataset and name it fin_g.
3.5 (6 pts) Create a new observation of your choice (show the full data.frame) and compute its predicted probability of readmission.
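The mechanics of Part B can be sketched as below, again on simulated data with a reduced set of predictors. The data frame toy, the model names g_full, g_small and g_refit, the use of deviance residuals, and the values in new_obs are all assumptions made for this example; the coursework leaves the choice of residuals and of the new observation to you.

```r
# Simulated stand-in for the coursework data (NOT the real readmission.csv).
set.seed(3)
n <- 400
toy <- data.frame(
  Age = sample(18:95, n, replace = TRUE),
  ER = rpois(n, 1),
  HCC.Riskscore = rgamma(n, shape = 2, rate = 1),
  Gender = factor(sample(c("F", "M"), n, replace = TRUE))
)
eta <- -3 + 0.02 * toy$Age + 0.4 * toy$HCC.Riskscore
toy$Readmission.Status <- rbinom(n, 1, plogis(eta))

# Logistic GLM: binomial family with logit link.
g_full <- glm(Readmission.Status ~ Age + ER + HCC.Riskscore + Gender,
              family = binomial(link = "logit"), data = toy)
g_small <- update(g_full, . ~ . - ER - Gender)

# Goodness-of-fit comparison for nested models: deviance test and AIC.
anova(g_small, g_full, test = "Chisq")
AIC(g_full, g_small)

# Deviance residuals (one possible choice) against fitted probabilities.
d <- residuals(g_small, type = "deviance")
plot(fitted(g_small), d, xlab = "Fitted probability", ylab = "Deviance residual")

# Drop observations whose absolute residual exceeds 2.5, then refit.
keep <- abs(d) <= 2.5
g_refit <- glm(formula(g_small), family = binomial(link = "logit"),
               data = toy[keep, ])

# Predicted probability for a hypothetical new patient.
new_obs <- data.frame(Age = 70, HCC.Riskscore = 2.5)
p_hat <- predict(g_refit, newdata = new_obs, type = "response")
p_hat
```

Setting type = "response" returns the probability directly; the default type = "link" would return the linear predictor on the logit scale instead.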
Part C (10 pts): Brief summary
In 200 words or fewer, summarise what you did, what you found, and one implication for hospital practice. Mention one limitation.
Reminder: Submission Checklist
• Use the Word template; export a single PDF ≤ 5MB.
• Labelled figures/tables with references in text.
• Name the file with your student number, i.e., Studentnumber.pdf (such as “12345678.pdf”).