
FIT3154 Assignment 2

Due Date: 11:55PM AEST, Friday 24/10/2025

Introduction

There are a total of two questions worth 16 + 21 = 37 marks in this assignment. Please note that working and/or justification must be shown for all questions that require it.

This assignment is worth a total of 20% of your final mark, subject to hurdles and any other matters (e.g., late penalties, special consideration, etc.) as specified in the FIT3154 Unit Guide or elsewhere in the FIT3154 Moodle site (including Faculty of I.T. and Monash University policies).

Students are reminded of the Academic Integrity Awareness Training Tutorial Activity and, in particular, of Monash University's policies on academic integrity. In submitting this assignment, you acknowledge your awareness of Monash University's policies on academic integrity and that work is done and submitted in accordance with these policies.

Submission Instructions: Please follow these submission instructions:

1. No files are to be submitted via e-mail. Submissions are to be made via Moodle.

2. You may not use generative A.I. when answering this assignment.

3. Please submit:

(a) One PDF file containing non-code answers to all the questions that require written answers. This file should also include all your plots, and must be clearly written and formatted. Please ensure that your assignment answers the questions in the order specified in the assignment.

(b) A single R file containing R code answers. Please use comments to clearly indicate which bits of code are answering which questions.

Please read these submission instructions carefully and take care to submit the correct files in the correct places. GLHF!

Question 1 (16 marks)

In this question we will again examine Bayesian inference of the negative binomial distribution, but this time using a different prior distribution. Recall that the negative binomial distribution is a distribution used to model counts, and has the probability mass function

p(y | µ, r) = [Γ(y + r) / (Γ(r) y!)] (r / (µ + r))^r (µ / (µ + r))^y,  y = 0, 1, 2, . . . ,   (1)

where µ > 0 is the mean parameter of the distribution, r > 0 is the shape parameter (note that r can be a non-integer) and Γ(·) is the gamma function. In Assignment 1 we used a gamma prior distribution. This time we will use a special heavy-tailed prior distribution which we will call the squared half-Cauchy; this has the probability density

π(µ | s) = √s / (π √µ (µ + s)),  µ > 0,   (2)

where s > 0 is a scale parameter that can be used to encode our prior information. If a RV X follows a squared half-Cauchy with scale s, we say X ∼ C²+(0, s). Our hierarchy is then

where s can be thought of as setting the value of the “best guess” of µ before we see the data. For reference, the analysis in Assignment 1 used a gamma prior distribution with a choice of hyperparameters that encoded a prior best guess for µ of 14.5. The posterior mean was E [µ | y] ≈ 15.95 and the 95% credible interval was ≈ (15.06, 16.87).

Questions

Download the file nb.fit.shc.R. This contains the function nb.sample.shc(y,s,n.samples), which draws n.samples samples from the posterior distribution for the hierarchy (3-5), where y is the data and s is the hyperparameter for the squared half-Cauchy prior. Use this to generate 100,000 samples of µ and r for the COVID recovery data in covid.ass1.2025.csv.

1. Plot the histogram of the samples, and report on the posterior mean and the 95% credible intervals; compare these to the posterior mean and intervals found using the gamma prior from Assignment 1 (summarised above). [3 marks]
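A minimal sketch of how this might look in R; the column name Recovery.Time and the assumption that nb.sample.shc() returns a list with components mu and r are both illustrative, not confirmed by the handout:

source("nb.fit.shc.R")
y <- read.csv("covid.ass1.2025.csv")$Recovery.Time   # hypothetical column name

post <- nb.sample.shc(y, s = 14, n.samples = 1e5)    # s = 14: the prior best guess used in Q1.5

hist(post$mu, breaks = 50, xlab = "mu", main = "Posterior samples of mu")
mean(post$mu)                                        # posterior mean of mu
quantile(post$mu, c(0.025, 0.975))                   # 95% credible interval for mu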

The second part of this question will look at finding the maximum a posteriori (MAP) estimator for µ, which is found by maximising the posterior, or equivalently minimising the negative log-posterior, i.e.,

µ̂_MAP = arg min over µ > 0 of { − log p(y | µ, r) − log π(µ | s) },   (6)

where we will assume r is known or given. Please answer the following questions:

2. Write down the formula for the negative log-posterior, i.e., − log [p(y | µ, r) π(µ | s)], up to constants not dependent on µ. Make sure to simplify the expression. [2 marks]

3. Prove that maximising the posterior for µ is equivalent to solving the quadratic equation

(n r + 3/2) µ² + (n r s − n r ȳ + 3r/2 + s/2) µ − r s (n ȳ − 1/2) = 0,   (7)

where n is the number of observations and ȳ = (y1 + · · · + yn)/n is the sample mean. [2 marks]

4. Write a function nb.map(y,s,r) which takes a vector of data y, the prior hyperparameter s and shape r, and returns the MAP estimator by solving the quadratic (7) for the solution that minimises the negative log-posterior (hint: refer to Studio 9 for tips). [1 mark]
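As a cross-check on the quadratic-based nb.map(), one can also minimise the negative log-posterior numerically; a minimal sketch, assuming the prior density (2) above (the function name is illustrative):

nb.map.check <- function(y, s, r) {
  n <- length(y); ybar <- mean(y)
  # negative log-posterior in mu, up to additive constants
  nlp <- function(mu) {
    (n*r + n*ybar)*log(mu + r) - n*ybar*log(mu) + 0.5*log(mu) + log(mu + s)
  }
  # one-dimensional minimisation over a wide bracket
  optimise(nlp, interval = c(1e-8, 1e6))$minimum
}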

5. Using your nb.map() function and the data in covid.ass1.2025.csv, compute the MAP estimator using s = 14 (our prior best guess); for the value of r, use the posterior mean of r obtained from the samples you generated as part of Q1.1. Compare this estimate to the posterior mean using the squared half-Cauchy found in Question 1.1. Discuss why you think they are similar/different. [1 mark]

6. Using your function nb.map(y,s,r), explore the sensitivity of the MAP estimator for our COVID recovery data when using choices of prior best guess that are quite different from our guess based on the SARS recovery time. In particular, try the values

s = exp(seq(log(0.01),log(1e4),length.out=100))

Plot the MAP estimates found using these values of s against the values of s. How do these estimates compare to using s = 14? What does this say about the squared half-Cauchy prior?

(hint: use the log-scale for the x-axis of your plot) [2 marks]
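A minimal sketch of this experiment, with y as loaded in Q1.1 and r.hat an assumed variable holding the posterior mean of r from Q1.1:

s.grid <- exp(seq(log(0.01), log(1e4), length.out = 100))
map.s <- sapply(s.grid, function(s) nb.map(y, s, r.hat))
plot(s.grid, map.s, log = "x", type = "l",
     xlab = "s (log scale)", ylab = "MAP estimate of mu")
abline(v = 14, lty = 2)   # our prior best guess, for reference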

7. Find the asymptotic formulae for the MAP estimate of µ, i.e., the solution to (7) when (i) s → 0, and (ii) s → ∞. Comment on these formulae. [2 marks]

As discussed in Assignment 1, it is sometimes more natural to use the alternative parameterisation v = log µ for the negative binomial distribution.

8. Transform the squared half-Cauchy prior (2) on µ to a prior on v using the transformation of random variables technique from Lecture 5. [2 marks]
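As a reminder of the technique this question relies on: if v = g(µ) is a monotone transformation with inverse µ = g⁻¹(v), then the transformed density is

π_v(v) = π_µ(g⁻¹(v)) |d g⁻¹(v)/dv|,

and for v = log µ the inverse is µ = e^v, so the Jacobian factor is e^v.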

9. Plot this prior distribution for the choice of hyperparameters s = 10, s = 10² and s = 10³. How does s affect the prior on v? [1 mark]

Question 2: Negative Binomial Regression (21 marks)

In this question we are going to examine an extension of the negative binomial distribution to a regression setting. Let y = (y1, . . . , yn) be n non-negative integer observations that we would like to model, and let xi = (xi,1, . . . , xi,p) be a vector of p predictors associated with the i-th data point. Negative binomial regression models extend the usual negative binomial distribution in the following way:

yi | xi ∼ NegBin(µi, r),  log µi = β0 + β1 xi,1 + · · · + βp xi,p,   (8)

where β = (β1, . . . , βp) are the coefficients relating the predictors to the target, β0 is the intercept and the negative binomial distribution has probability mass function given by (1). In this way the negative binomial regression model extends the regular linear regression model to modelling non-negative integer data. The effects of the predictors are additive on the log scale, and therefore multiplicative on the original scale.

Questions: Part I

The first part of this question will examine using the negative binomial regression model in the context of generalized additive model (GAM), using the mgcv package. Download the file daily.cases.2025. This file contains the daily recorded COVID-19 case numbers for a 10 week period starting on Monday, 6th of June. The target, Cases, contains the daily case numbers, the predictor Day is the number of days since 6th of June, and Day.Of.Week is a categorical variable indicating what day of the week each day in the dataset is.

1. Using the gam() function, fit a negative binomial regression using maximum likelihood to the daily case data, using Cases as the target (hint: use the family=nb(link=log,theta=NULL) option to fit a negative binomial). We expect a nonlinear trend over time, so please model the Day predictor as a smooth function (hint: see Studio 8). We are also interested in seeing if there is an effect on reported daily case numbers based on the day of the week, so include the Day.Of.Week predictor in your model. Use summary() to inspect the fitted model; from these statistics please answer the following (a sketch of one possible gam() call is given after part (b)):

(a) Do you think there is evidence for a non-linear trend over time? Do you think there is evidence for an effect based on the day of the week? Justify these answers. [2 marks]

(b) How does being a Saturday affect case numbers, and how does being a Wednesday affect case numbers? [2 marks]
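A minimal sketch of one possible call; the data frame name daily and the file extension are assumptions, and method = "ML" matches the maximum likelihood fit asked for:

library(mgcv)

daily <- read.csv("daily.cases.2025.csv")   # adjust the file name if needed
daily$Day.Of.Week <- factor(daily$Day.Of.Week)

fit.gam <- gam(Cases ~ s(Day) + Day.Of.Week,
               family = nb(link = "log", theta = NULL),
               method = "ML", data = daily)
summary(fit.gam)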

2. We could fit an additive model using an identity link instead of a log-link, i.e., using

µi = β0 + β1 xi,1 + · · · + βp xi,p,

and clipping any µi < 0 to µi = 0. Fit a GAM using gam() with an identity link using the option family=nb(link=identity,theta=NULL). Using summary() and the statistics produced by gam(), compare this model to the GAM using a negative binomial distribution with log-link fitted above. Which one do you think fits the data better and why? [1 mark]
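A sketch of the corresponding identity-link fit, under the same assumptions as the Q2.1 sketch:

fit.gam.id <- gam(Cases ~ s(Day) + Day.Of.Week,
                  family = nb(link = "identity", theta = NULL),
                  method = "ML", data = daily)
summary(fit.gam.id)
AIC(fit.gam, fit.gam.id)   # one of several statistics you could compare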

3. On the same plot, please plot the observed daily case numbers against Day, and then plot: (i) the predictions for each day based on your negative binomial GAM using the log-link, and (ii) the predictions for each day based on your negative binomial GAM using the identity link. Discuss the difference between the two predictions. Does the log-link seem to have an advantage at modelling the series, and if so, why? [2 marks]
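One way this plot might be produced, continuing the sketches above:

mu.log <- predict(fit.gam, type = "response")
mu.id  <- pmax(predict(fit.gam.id, type = "response"), 0)   # clip negative means to zero

plot(daily$Day, daily$Cases, pch = 16, xlab = "Day", ylab = "Daily cases")
lines(daily$Day, mu.log, col = "blue")
lines(daily$Day, mu.id, col = "red")
legend("topleft", legend = c("log-link", "identity link"),
       col = c("blue", "red"), lty = 1)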

Questions: Part II

There is no simple closed-form formula for the maximum likelihood estimates for a negative binomial regression. The second part will require you to write code to fit a negative binomial regression using maximum likelihood and gradient descent. To get you started I have provided a script called nbreg.gd.R which contains several functions: nbreg.gd(), my.standardise() and df2matrix().

The main function nbreg.gd() is a skeleton that takes a target vector y and predictor matrix X, standardises the predictors (as per Lecture 3) using my.standardise() and then has space for you to fill in with your gradient descent code. Once your code has finished running, the function unstandardises the coefficients back to the original scale. The function df2matrix() takes a data frame and a formula and returns an appropriate matrix of predictors X and vector of targets y. Answer the following questions:

4. Write down the expression for the negative log-likelihood of data y = (y1, . . . , yn) under the negative binomial regression model (8) with predictor matrix X of size n × p, coefficient vector β, intercept β0 and shape r. [2 marks]

5. Find the expressions for the gradient of the negative log-likelihood with respect to the intercept β0 and the coefficients β1, . . . , βp. [3 marks]

6. Write an R function that takes a vector of data y, a matrix of predictors X, an intercept beta0, a vector of coefficients beta and a shape r, and returns: (i) the negative log-likelihood and (ii) the gradients with respect to β0 and the coefficients β1, . . . , βp (hint: the lgamma(x) function implements log Γ(x)). [2 marks]
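A minimal sketch of such a function, assuming the pmf (1) and the log-link model (8); the name nbreg.nll.grad() is illustrative:

nbreg.nll.grad <- function(y, X, beta0, beta, r) {
  eta <- beta0 + as.vector(X %*% beta)
  mu  <- exp(eta)
  # negative log-likelihood under (1) with mu_i = exp(eta_i)
  nll <- -sum(lgamma(y + r) - lgamma(r) - lfactorial(y)
              + r*log(r) + y*log(mu) - (r + y)*log(mu + r))
  # derivative of the i-th negative log-likelihood term w.r.t. eta_i;
  # the chain rule then gives the gradients for beta0 and the betas
  d.eta <- (r + y)*mu/(mu + r) - y
  list(nll = nll,
       grad.beta0 = sum(d.eta),
       grad.beta = as.vector(t(X) %*% d.eta))
}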

7. Implement gradient descent to find the maximum likelihood estimates of the intercept β0 and coefficients β1, . . . , βp, given a shape r. Use the skeleton in nbreg.gd.R as a basis to get started (a sketch of one possible loop is given after the details below). The function takes as arguments

• An X matrix of predictors and y vector of targets

• A value for the shape r

• An initial value for the learning rate κ.

• The size of the largest absolute value of the gradient used for termination (max.grad).

The function does some initialisation of the estimates of β0 and β for you by using least-squares on the log-transformed targets. These are good starting points for the search. You will need to write the code that finds the maximum likelihood estimates β̂0 and β̂1, . . . , β̂p for the X and y that were passed, using the gradient descent algorithm. Once it has finished, it should return a list containing:

• The ML estimate of the intercept β0.

• The ML estimates of the coefficients β1, . . . , βp

• The negative log-likelihood at your final estimates

• The gradient at your final estimates

• The negative log-likelihood recorded at every iteration of your gradient descent loop.

Some further details

• Your learning rate, κ, must be adaptive (hint: see Lecture 8).

• Your algorithm should terminate when the largest absolute value of the gradients is less than the max.grad argument passed to your function.

• When writing your code, you should consider computational efficiency and ensure that it is cleanly written and commented.

To help you test your code I have provided a test dataset in the file nbreg.test.csv. This has three predictors and a target (y). Use r=2; if your code is working, then it should return maximum likelihood estimates close to 0.99, 1.94 and −3.00 for the three predictors and an intercept close to 2.03. [4 marks]
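For orientation, here is a minimal sketch of how the missing loop inside nbreg.gd() might look; it assumes the nbreg.nll.grad() helper sketched in Q2.6 and uses simple backtracking as the adaptive learning rate (one choice among several from Lecture 8):

# after the least-squares initialisation of beta0 and beta inside nbreg.gd():
nll.trace <- c()
repeat {
  cur <- nbreg.nll.grad(y, X, beta0, beta, r)
  nll.trace <- c(nll.trace, cur$nll)
  g <- c(cur$grad.beta0, cur$grad.beta)
  if (max(abs(g)) < max.grad) break          # termination rule from above
  # shrink kappa until the step no longer increases the negative log-likelihood
  repeat {
    new0 <- beta0 - kappa*cur$grad.beta0
    newb <- beta - kappa*cur$grad.beta
    if (nbreg.nll.grad(y, X, new0, newb, r)$nll <= cur$nll || kappa < 1e-15) break
    kappa <- kappa/2
  }
  beta0 <- new0; beta <- newb
  kappa <- 1.1*kappa                          # let the step size grow back when progress is easy
}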

8. Now, let us test our code properly. We can build our own additive model by including polynomial terms in our model. Fit a negative binomial regression to the data from Question 2.1, including both Day.Of.Week and ten polynomial terms for Day (i.e., I(Day) + I(Day^2) + . . .). You can use the estimate of r found in Q2.1 as the value of r when using your code (access this using fit.gam$family$getTheta(F)).

Report the fitted coefficients, and produce predictions for all the days for your fitted negative binomial regression model using your maximum likelihood estimates of the coefficients and the definition of the negative binomial regression model (8). Plot these along with the observed daily case numbers. How similar do they look to the predictions made by the GAM? [2 marks]
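A sketch of how this might be wired together; the interfaces of df2matrix() and nbreg.gd() are assumptions based on the descriptions above, and poly(Day, 10, raw = TRUE) is one compact way of writing the I(Day) + I(Day^2) + . . . terms:

r.hat <- fit.gam$family$getTheta(FALSE)      # estimate of r from the Q2.1 GAM

dd  <- df2matrix(daily, Cases ~ Day.Of.Week + poly(Day, 10, raw = TRUE))
fit <- nbreg.gd(dd$y, dd$X, r = r.hat, kappa = 1e-4, max.grad = 1e-4)

# predictions via the definition (8): mu_i = exp(beta0 + x_i'beta)
mu.hat <- exp(fit$beta0 + as.vector(dd$X %*% fit$beta))
plot(daily$Day, daily$Cases, pch = 16, xlab = "Day", ylab = "Daily cases")
lines(daily$Day, mu.hat, col = "blue")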

9. Imagine we wanted to use ridge regression instead of maximum likelihood to fit our negative binomial regression model. How would the objective function and the gradients change to accommodate a ridge penalty on the coefficients β? [1 mark]
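For reference, under the common convention that the intercept is left unpenalised, a ridge penalty adds λ(β1² + · · · + βp²) to the negative log-likelihood, so in a sketch like the Q2.6 helper only the coefficient gradient changes:

grad.beta <- as.vector(t(X) %*% d.eta) + 2*lambda*beta   # lambda >= 0 is the ridge penalty parameter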

At the end of this assignment, you should reflect on how much you have learned since you started this unit. You have now learned how to use the Bayesian approach to statistical inference, you have ideas on how to analyse non-Gaussian data using regression models, and you have even managed to write a program to fit such a model using maximum likelihood!


