BUSS6002 Individual Assignment
Semester 2, 2025
Instructions
• Due: at 23:59 on Thursday, 30 October, 2025 (week 12).
• You must submit a written report (in PDF, under Canvas-Assignment-Individual Assignment (PDF)) with the following filename format, replacing STUDENTID with your own student ID: BUSS6002 STUDENTID.pdf.
• You must also submit a Jupyter Notebook file (.ipynb, under Canvas-Assignment-Individual Assignment (ipynb)) with the following filename format, replacing STUDENTID with your own student ID: BUSS6002 STUDENTID.ipynb.
• There is a limit of 6 A4 pages for your report (including equations, tables, captions and references).
• Your report should have an appropriate title (of your own choice).
• Do not include a cover page.
• All plots, computational tasks, and results must be completed using Python.
• Each section of your report must be clearly labelled with a heading.
• Do not include any Python code as part of your report.
• All figures must be appropriately sized and have readable axis labels and legends (where applicable).
• The submitted .ipynb file must contain all the code used in the development of your report.
• The submitted .ipynb file must be free of any errors, and the results must be reproducible.
• You may submit multiple times but only your last submission will be marked.
• A late penalty applies if you submit your assignment late without a successful special consideration (or simple extension). See the Unit Outline for more details.
• Late submission. In accordance with University policy, these penalties apply when written work is submitted after 11:59pm on the due date:
— Deduction of 5% of the maximum mark for each calendar day after the due date.
— After ten calendar days late, a mark of zero will be awarded.
• Generative AI tools (such as ChatGPT) may be used for this assignment, but you should add a statement at the end of your report specifying how generative AI was used, e.g., "Generative AI was used only for editing the final report text." See the UoS outline section “Use of generative artificial intelligence (AI)” for detailed instructions.
Description
The Chicago Board Options Exchange (CBOE) Gold Exchange-Traded Fund (ETF) Volatility Index (GVZ) measures the market's expectations of near-term volatility in gold prices, derived from option prices on the Standard & Poor's Depositary Receipts Gold Shares ETF. Predicting the GVZ is useful because it helps investors anticipate fluctuations in gold market volatility and manage portfolio risk more effectively. In this assignment, you will conduct a study that compares the predictive performance of four families of basis functions (piece-wise constant, piece-wise linear, radial, and Laplace) within a linear basis function (LBF) model designed to predict the GVZ index value. The objective is to determine which family of basis functions is most suitable for modelling the relationship between time and gold market volatility (measured by GVZ).
You are provided with the GVZ dataset, sourced from the Federal Reserve Economic Data, Federal Reserve Bank of St. Louis. The dataset contains daily observations of GVZ values (GVZ) from 2008 to 2025, along with the Year-Month and Month Index for which the values are recorded. You will be working with the Month Index (as the independent variable x in regression) and the GVZ (as the dependent/response variable y in regression). The Year-Month is also provided so that you can match each month index value to the calendar month of a relevant historical event, such as the 2008 Global Financial Crisis, which should help you interpret the economic implications of the GVZ index. A scatter plot of the dataset is shown in Figure 1.
Figure 1: GVZ levels from June 2008 to Sep 2025.
The specific LBF model being considered in your study is given by
$$y = \phi(x)^\top \beta + \varepsilon,$$
where $y$ is the GVZ index value, $x$ is the month index, and $\varepsilon$ is random noise; $\phi(x)$ denotes the vector of basis function values; the parameter vector to be estimated is $\beta$. Four families of basis functions are considered for computing $\phi(x)$.
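As a minimal sketch of how β might be estimated once φ(x) has been evaluated at every observation, the snippet below assumes ordinary least squares, which the text above does not prescribe. The helper names `fit_lbf` and `predict_lbf` and the toy data are hypothetical. `np.linalg.lstsq` is used rather than an explicit matrix inverse because it still returns a solution when the design matrix is rank-deficient.

```python
import numpy as np

def fit_lbf(Phi, y):
    """Least-squares estimate of beta in y = Phi @ beta + noise (assumed estimator)."""
    beta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return beta

def predict_lbf(Phi, beta):
    """Predicted responses for a design matrix Phi."""
    return Phi @ beta

# Toy illustration (not GVZ data): phi(x) = [1, x] and a noise-free line.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 + 0.5 * x
Phi = np.column_stack([np.ones_like(x), x])  # one row of phi(x)^T per observation
beta_hat = fit_lbf(Phi, y)
y_hat = predict_lbf(Phi, beta_hat)
```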
Piece-wise constant basis function
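The first family is the set of piece-wise constant basis functions $\phi(x) := [1, \gamma_1(x), \ldots, \gamma_k(x)]^\top$, with
$$\gamma_i(x) := I(x > t_i),$$
where $I(x > t_i)$ is an indicator function defined by
$$I(x > t_i) := \begin{cases} 1, & x > t_i, \\ 0, & \text{otherwise.} \end{cases}$$
The break points $\{t_i\}_{i=1}^{k}$ are calculated according to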
(1)
where $x_{\min}$ and $x_{\max}$ denote the smallest and largest observed values of $x$, respectively.
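The piece-wise constant design matrix can be sketched as follows. The function name `piecewise_constant_basis` is hypothetical, and the break points are passed in as an argument so that the sketch does not depend on any particular formula for them. The example inputs are arbitrary values, not GVZ data.

```python
import numpy as np

def piecewise_constant_basis(x, t):
    """Rows are phi(x)^T = [1, I(x > t_1), ..., I(x > t_k)]."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)  # shape (n, 1)
    t = np.asarray(t, dtype=float).reshape(1, -1)  # shape (1, k)
    steps = (x > t).astype(float)                  # broadcasts to shape (n, k)
    return np.hstack([np.ones_like(x), steps])

# Arbitrary inputs with two break points supplied by hand.
Phi_const = piecewise_constant_basis([1, 5, 10], t=[3.0, 7.0])
```

Each column after the intercept switches from 0 to 1 once x passes the corresponding break point, so the fitted function is a step function.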
Piece-wise linear basis function
The second family is the set of piece-wise linear basis functions $\phi(x) := [1, x, \lambda_1(x), \ldots, \lambda_k(x)]^\top$, with
$$\lambda_i(x) := (x - t_i)\, I(x > t_i),$$
where $t_i$ is given by Equation (1).
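A corresponding sketch for the piece-wise linear family is given below. The name `piecewise_linear_basis` is hypothetical and the break points are again supplied by the caller. The term (x - t_i) I(x > t_i) is computed as max(x - t_i, 0), which takes the same value for every x.

```python
import numpy as np

def piecewise_linear_basis(x, t):
    """Rows are phi(x)^T = [1, x, (x - t_1) I(x > t_1), ..., (x - t_k) I(x > t_k)]."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)  # shape (n, 1)
    t = np.asarray(t, dtype=float).reshape(1, -1)  # shape (1, k)
    hinges = np.maximum(x - t, 0.0)                # equals (x - t) * I(x > t)
    return np.hstack([np.ones_like(x), x, hinges])

# Arbitrary inputs with two break points supplied by hand.
Phi_lin = piecewise_linear_basis([1, 5, 10], t=[3.0, 7.0])
```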
Radial basis function
The third family is the set of radial basis functions $\phi(x) := [1, \rho_1(x), \ldots, \rho_k(x)]^\top$, with
where $t_i$ is given by Equation (1).
Laplace basis function
The final family is the set of Laplace basis functions $\phi(x) := [1, \tau_1(x), \ldots, \tau_k(x)]^\top$, with
where $t_i$ is given by Equation (1).
Before comparing the four basis function families, you must set the number of components k for all models. This hyperparameter value for each basis function family should be selected using a validation set, by minimising the validation mean squared error (MSE).
You should select the optimal values of $k$ by exhaustively searching through an equally spaced grid from 1 to 30, with a spacing of 1: $K := \{1, 2, \ldots, 30\}$.
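The grid search described above could be sketched as below for a single basis family. Several pieces are assumptions made only for illustration: `assumed_breakpoints` places k equally spaced interior points between the smallest and largest training x and is a stand-in, not Equation (1); β is fitted by least squares; and the data are synthetic, not the GVZ series. All function names are hypothetical.

```python
import numpy as np

def assumed_breakpoints(x_min, x_max, k):
    # ASSUMPTION: k equally spaced interior points; replace with Equation (1).
    return x_min + (x_max - x_min) * np.arange(1, k + 1) / (k + 1)

def linear_hinge_basis(x, t):
    # Piece-wise linear family: [1, x, (x - t_1) I(x > t_1), ...].
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    return np.hstack([np.ones_like(x), x, np.maximum(x - t.reshape(1, -1), 0.0)])

def select_k(x_tr, y_tr, x_va, y_va, basis, grid=range(1, 31)):
    """Return the k in the grid with the smallest validation MSE, plus all MSEs."""
    val_mse = {}
    for k in grid:
        t = assumed_breakpoints(x_tr.min(), x_tr.max(), k)
        beta, *_ = np.linalg.lstsq(basis(x_tr, t), y_tr, rcond=None)
        resid = y_va - basis(x_va, t) @ beta
        val_mse[k] = float(np.mean(resid ** 2))
    best_k = min(val_mse, key=val_mse.get)  # ties resolve to the smallest k
    return best_k, val_mse

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
x_all = np.arange(1, 181, dtype=float)
y_all = 20 + 5 * np.sin(x_all / 15) + rng.normal(0, 1, x_all.size)
best_k, val_mse = select_k(x_all[:150], y_all[:150], x_all[150:], y_all[150:],
                           linear_hinge_basis)
```

Passing the basis as a function argument lets the same loop be reused for each of the four families, so that every family is tuned under identical conditions.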
Once the optimal values of the hyperparameters are chosen for all basis function families, you will be able to compare the predictive performance between the four using a test set (i.e., by comparing the test MSE between the four optimally selected models).
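The comparison step amounts to computing one test MSE per family and ranking them. A minimal sketch follows, with a hypothetical `mse` helper and made-up prediction values that are not GVZ results and are shown for only two of the four families.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between observed and predicted responses."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Made-up numbers for illustration; real predictions come from the tuned models.
y_test = np.array([15.0, 16.0, 18.0, 17.0])
predictions = {
    "piece-wise constant": np.array([15.0, 15.0, 17.0, 17.0]),
    "piece-wise linear": np.array([15.5, 16.0, 17.5, 17.0]),
}
test_mse = {name: mse(y_test, pred) for name, pred in predictions.items()}
best_family = min(test_mse, key=test_mse.get)
```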
With respect to the train-validation-test split, you should use the data points with month index 1-150 as the train set; 151-180 as the validation set; and 181-208 as the test set.
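The split above can be expressed with boolean masks on the month index. In the sketch below, the function name `split_by_month_index` is hypothetical and the arrays are synthetic placeholders with two observations per month, standing in for the daily GVZ records.

```python
import numpy as np

def split_by_month_index(x, y):
    """Split into train (1-150), validation (151-180) and test (181-208) sets."""
    x = np.asarray(x)
    y = np.asarray(y)
    train = (x >= 1) & (x <= 150)
    val = (x >= 151) & (x <= 180)
    test = (x >= 181) & (x <= 208)
    return (x[train], y[train]), (x[val], y[val]), (x[test], y[test])

# Placeholder data: two observations for each month index from 1 to 208.
x_demo = np.repeat(np.arange(1, 209), 2)
y_demo = np.zeros(x_demo.size)
(x_tr, y_tr), (x_va, y_va), (x_te, y_te) = split_by_month_index(x_demo, y_demo)
```

Because the split is by month index rather than by row count, every daily observation from the same month lands in the same set.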
Report Structure
Your report must contain the following four sections. The number of pages for each section is indicative only and not compulsory.
Report Title
1 Introduction (approximately 0.5 pages)
— Provide a brief project background so that the reader of your report can understand the general problem that you are solving.
— Motivate your research question.
— State the aim of your project.
— Provide a short summary of each of the rest of the sections in your report (e.g., “The report proceeds as follows: Section 2 presents . . . . Section 3 shows”).
2 Methodology (approximately 2.5 pages)
— Define and describe the LBF model.
— Define and describe the four choices of basis function families being investigated.
— Describe how the parameter vector β is estimated given the value of the hyperparameter k. Discuss any potential numerical issues associated with the estimation procedure.
— Describe how the hyperparameter value can be determined automatically from data (as opposed to manually setting the hyperparameter to an arbitrary value).
— Describe how the performance of the four families of basis functions is compared given the optimal hyperparameter value.
3 Empirical Study (approximately 2.5 pages)
— Describe the datasets used in your study and discuss your observations for the data.
— Present (in a table) the selected hyperparameter value for each basis function family.
— Describe and discuss the table of selected hyperparameters.
— Visually present (using plots) the predicted response values for each basis function family in the test set.
— Describe and discuss the plots of predicted values.
— Present (in a table) the test MSE values for each basis function family.
— Describe and discuss the table of test MSE values.
— Report the GVZ forecasts of Oct 2025, Nov 2025, Dec 2025, given by the model with the smallest test MSE. Include a brief description of how these forecasts are obtained. After completing the unit, at the end of 2025 you could compare your forecasts to the actual observations and see how accurate they are.
4 Conclusion (approximately 0.5 pages)
— Discuss your overall findings / insights.
— Discuss any limitations of your study.
— Suggest potential directions of extending your study.
Marking Rubric
This assignment is worth 30% of the unit’s marks. The assessment is designed to test your computational skills in implementing algorithms and conducting empirical experiments, as well as your communication skills in writing a concise and coherent report presenting your approach and results. The mark allocation across assessment items is given in Table 1.
| Assessment Item | Goal | Marks |
|---|---|---|
| Section 1 | Introduction | 4 |
| Section 2 | Methodology | 10 |
| Section 3 | Empirical Study | 16 |
| Section 4 | Conclusion | 3 |
| Overall Presentation | Clear, concise, coherent, and correct | 5 |
| Jupyter Notebook | Reproducible results | 2 |
| Total | | 40 |
Table 1: Assessment Items and Mark Allocation