Problem Set 6
AEM 4110/5111 – Introduction to Econometrics
Fall 2025
Note: Part II (seasonality) is optional
Instructions
• This problem set is due by 12/10 at 11:59pm.
• Submit your answers via Canvas in the assignments section of the course.
• Submit a zipped folder containing the following documents. Name the zipped folder “PS6” followed by your last name.
1. A write-up in PDF format with your answers to the questions below and the full names of all your group members.
2. A do-file with the Stata code you used for your answers. Add comments to the do-file indicating which section corresponds to each answer in your write-up.
3. For questions that require a table, you can create the table in Excel or LaTeX.
4. Important! Please write each answer on a separate page and clearly label it with the corresponding question number (for example, Question I.1.a, Question I.1.b, etc.).
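Laying out the do-file as one commented block per question makes it easy to match the code to the write-up. The sketch below is only one possible layout, not a required template; the log file name PS6_log.txt is a hypothetical placeholder.

```stata
* ==========================================================
* Problem Set 6, AEM 4110/5111
* Group members: <full names>
* ==========================================================
clear all
capture log close
* Hypothetical log file name; a text log records commands and output
log using "PS6_log.txt", text replace

* ---------------- Question I.1.a ----------------
* (commands for I.1.a)

* ---------------- Question I.1.b ----------------
* (commands for I.1.b)

* ... one commented block per question, in write-up order ...

log close
```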
Set Up
Goal: In this problem set, you will apply time series methods to analyze U.S. Consumer Price Index (CPI) data. Specifically, you will work with month-over-month CPI growth rates from January 1980 to March 2024. The main series has been deseasonalized so that you can focus on its core autoregressive and moving average structure; you will examine the original seasonal patterns in Part II.
Data: The file cpi_data.dta contains the following variables:
• cpi_deseason: Deseasonalized month-over-month CPI growth rate (in percentage points)
• cpi_raw: Original (non-deseasonalized) month-over-month CPI growth rate
• date: Date variable (monthly, from 1980m1 to 2024m3)
• t: Time index (1, 2, 3, ..., 531)
The cpi_raw series comes from FRED, the economic data website of the Federal Reserve Bank of St. Louis.
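Before starting, it can help to confirm that the file matches this description. This is a minimal sketch assuming the file is saved as cpi_data.dta in the current working directory; the assertions simply encode the facts listed above (531 monthly observations, with t running 1, 2, ..., 531).

```stata
* Assumed file name: cpi_data.dta in the current working directory
use "cpi_data.dta", clear
describe

* Sort by date so that row order matches the time index
sort date

* Checks implied by the variable list above
assert _N == 531
assert t == _n

* First and last observations
list date t in 1
list date t in L
summarize cpi_deseason cpi_raw
```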
Part I: Model Selection
In this part, you will work exclusively with the deseasonalized CPI growth series (cpi_deseason).
1 Let’s start with some data exploration
a) Load the dataset and declare it as time series data using the tsset command.
b) Create a time series plot of cpi_deseason using the tsline command. Include this graph in your submission.
c) Briefly describe the visual properties of the series. Does it appear to fluctuate around a constant mean? Are there any obvious trends or structural breaks?
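A minimal sketch of Question 1 a) and b), assuming the file name cpi_data.dta and that date holds monthly dates. The horizontal line at the sample mean is an optional addition, not part of the assignment, that may make it easier to judge whether the series fluctuates around a constant mean; the exported file name is hypothetical.

```stata
* Assumed file name: cpi_data.dta
use "cpi_data.dta", clear

* a) Declare monthly time series (the monthly option applies a %tm display format)
tsset date, monthly

* b) Time series plot with a horizontal reference line at the sample mean
quietly summarize cpi_deseason
tsline cpi_deseason, yline(`r(mean)')

* Hypothetical output file name
graph export "q1b_tsline.png", replace
```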
2 Next, we want to make sure that the series is stationary.
a) Conduct an Augmented Dickey-Fuller (ADF) test for cpi_deseason using the dfuller command, as we saw in class.
b) Report the test statistic, and the 1%, 5%, and 10% critical values.
c) State the null and alternative hypotheses for the ADF test. Based on your results, do you reject or fail to reject the null hypothesis at the 5% significance level?
d) Interpret your conclusion in plain language: is the series stationary or non-stationary? Why do we want a stationary series?
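A sketch of the ADF command for Question 2. The lag length lags(4) is an illustrative assumption only; use the specification from class. Without a lags() option, dfuller runs the plain (non-augmented) Dickey-Fuller test. The test statistic and the 1%, 5%, and 10% critical values appear in the output table, and dfuller also stores the statistic and the MacKinnon approximate p-value in r(Zt) and r(p).

```stata
* Assumed file name: cpi_data.dta
use "cpi_data.dta", clear
tsset date, monthly

* ADF test with 4 lagged differences (illustrative lag choice)
dfuller cpi_deseason, lags(4)

* Test statistic and MacKinnon approximate p-value stored by dfuller
display "ADF statistic: " r(Zt) "   p-value: " r(p)
```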
3 Let’s now use the ACF and PACF to guide our choice of model.
a) Generate and display the autocorrelation function (ACF) for the first 24 lags using the ac command. Include the graph in your submission.
b) Examine the ACF plot. Which lags have autocorrelations that are statistically significant (i.e., outside the confidence bands)? Describe the overall pattern (e.g., does it decay gradually or cut off sharply?).
c) Generate and display the partial autocorrelation function (PACF) for the first 24 lags using the pac command. Include the graph in your submission.
d) Examine the PACF plot. Which lags have partial autocorrelations that are statistically significant? Describe the pattern.
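A sketch of the commands for Question 3, under the same file-name assumption. By default, ac draws 95% confidence bands based on Bartlett's formula, and pac uses 1/sqrt(n) standard errors for its bands. corrgram is an optional addition that prints the same autocorrelations and partial autocorrelations as a table; the exported file names are hypothetical.

```stata
* Assumed file name: cpi_data.dta
use "cpi_data.dta", clear
tsset date, monthly

* a) ACF for the first 24 lags (95% Bartlett bands by default)
ac cpi_deseason, lags(24)
graph export "q3a_acf.png", replace

* c) PACF for the first 24 lags (95% bands from 1/sqrt(n) standard errors)
pac cpi_deseason, lags(24)
graph export "q3c_pacf.png", replace

* Optional: the same AC and PAC values as a table
corrgram cpi_deseason, lags(24)
```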
4 We are now going to use the arima command to estimate several ARMA models and use information criteria to select the model that best fits the data.
Estimate the following six ARMA specifications for cpi_deseason:
i) AR(1): arima cpi_deseason, ar(1)
ii) AR(2): arima cpi_deseason, ar(1/2)
iii) MA(1): arima cpi_deseason, ma(1)
iv) MA(2): arima cpi_deseason, ma(1/2)
v) ARMA(1,1): arima cpi_deseason, ar(1) ma(1)
vi) ARMA(2,2): arima cpi_deseason, ar(1/2) ma(1/2)
For each model:
a) Report the estimated coefficients for the AR and MA terms, along with their standard errors and p-values in a table for each model.
b) Briefly comment on the signs of the significant coefficients. Do they align with the patterns you observed in the ACF and PACF?
c) After each regression, use the estat ic post-estimation command to obtain the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) for each model.
d) After compiling the table for all 6 models: Which model is selected as “best” according to AIC? Which model is selected as “best” according to BIC? Do the two criteria agree?
Note:
• Run estat ic immediately after each arima estimation to capture the AIC and BIC values before proceeding to the next model.
• For part c), compile the AIC and BIC values for all six models into one comprehensive comparison table to facilitate model selection.
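One way to organize Question 4, assuming the file name cpi_data.dta: estimate each model, run estat ic right after it as the note asks, and store the fit under a name (the names ar1 through arma22 are hypothetical). estimates table then collects coefficients, standard errors, and p-values side by side, and estimates stats reports the AIC and BIC of all stored models in one comparison table.

```stata
* Assumed file name: cpi_data.dta
use "cpi_data.dta", clear
tsset date, monthly

* i) AR(1)
arima cpi_deseason, ar(1)
estat ic
estimates store ar1

* ii) AR(2)
arima cpi_deseason, ar(1/2)
estat ic
estimates store ar2

* iii) MA(1)
arima cpi_deseason, ma(1)
estat ic
estimates store ma1

* iv) MA(2)
arima cpi_deseason, ma(1/2)
estat ic
estimates store ma2

* v) ARMA(1,1)
arima cpi_deseason, ar(1) ma(1)
estat ic
estimates store arma11

* vi) ARMA(2,2)
arima cpi_deseason, ar(1/2) ma(1/2)
estat ic
estimates store arma22

* Coefficients with standard errors and p-values, one column per model
estimates table ar1 ar2 ma1 ma2 arma11 arma22, b se p

* AIC and BIC for all six models in a single table
estimates stats ar1 ar2 ma1 ma2 arma11 arma22
```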
5 In class we discussed that the residuals of a “good model” should not contain any information that helps us predict future values of the series. In other words, we want the residuals to be white noise. Let’s verify that this is the case for the model you selected in Question 4 (if AIC and BIC disagree, use the model selected by BIC). For this model:
a) After estimation, predict the residuals using predict resid, residuals.
b) Plot a histogram of the residuals using histogram resid, bins(40). Do the residuals appear approximately normally distributed? Is the distribution centered around 0?
c) [For AEM 5111 only] Generate the ACF and PACF of the residuals for the first 20 lags using ac resid, lags(20) and pac resid, lags(20). Include the graphs in your submission.
d) [For AEM 5111 only] Based on the ACF and PACF of the residuals, do you think that the residuals are i.i.d. N(0, σ²)? Why or why not? What does this suggest about the adequacy of the model?
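A sketch of the residual checks for Question 5, under the same file-name assumption. The ar(1) specification below is a placeholder, not a suggested answer: replace it with the model you selected in Question 4. The normal option overlays a normal density on the histogram, and wntestq (an optional extra, not required by the question) reports a portmanteau Q test of the null hypothesis that the residuals are white noise up to lag 20.

```stata
* Assumed file name: cpi_data.dta
use "cpi_data.dta", clear
tsset date, monthly

* PLACEHOLDER model: replace ar(1) with your selected specification
arima cpi_deseason, ar(1)
predict resid, residuals

* b) Histogram of residuals with a normal density overlaid
histogram resid, bins(40) normal
summarize resid

* c) [AEM 5111] Correlograms of the residuals, first 20 lags
ac resid, lags(20)
pac resid, lags(20)

* Optional: portmanteau Q test for white noise up to lag 20
wntestq resid, lags(20)
```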
Optional Part II: Seasonality
In Part I, you worked with deseasonalized data to focus on the core ARMA structure. In this part, you will examine the original, non-deseasonalized CPI growth series (cpi_raw) to understand why deseasonalization was necessary and how it can be performed.
1 Let’s start with some visual exploration of the series cpi_raw.
a) Create a time series plot of cpi_raw using tsline. Compare it visually to the plot of cpi_deseason from Question 1 in Part I. Do you notice any recurring patterns or cycles in the raw series?
b) Generate the ACF for cpi_raw with 24 lags. Include the graph in your submission.
c) Generate the PACF for cpi_raw with 24 lags. Include the graph in your submission.
d) Looking at the ACF for cpi_raw, at which lag(s) do you see particularly large spikes that were not present (or were much smaller) in the deseasonalized data? What does this tell you about the seasonality of the data?
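A sketch of Part II, Question 1, under the same file-name assumption. Listing both series in one tsline command puts them on common axes for the comparison in a); the name() option keeps each correlogram in memory so it can be redisplayed or combined later (the graph names are hypothetical).

```stata
* Assumed file name: cpi_data.dta
use "cpi_data.dta", clear
tsset date, monthly

* a) Raw and deseasonalized series on common axes
tsline cpi_raw cpi_deseason

* b) and c) Correlograms of the raw series, first 24 lags
ac cpi_raw, lags(24) name(acf_raw, replace)
pac cpi_raw, lags(24) name(pacf_raw, replace)
```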
2 [AEM 5111 only] Let’s now see how we can deseasonalize the data. The simplest way to remove seasonality is to regress the series on month-of-year dummy variables and keep the residuals. This method explicitly estimates and removes the average “effect” of each calendar month. The residuals contain the variation in the series that is not explained by the monthly pattern.
a) First, create a month variable:
```stata
gen month = month(dofm(date))
```
This extracts the month (1 = January, 2 = February, ..., 12 = December) from the date.
b) Run the regression regress cpi_raw i.month. This is equivalent to regressing cpi_raw on 11 dummy variables and a constant.
c) Explain what each coefficient in this regression represents. For example, what does the coefficient on 2.month (February) tell you?
d) Generate the residuals from this regression:
```stata
predict cpi_des, residuals
```
These residuals represent the deseasonalized series: they are what remains of cpi_raw after removing the average effect of each month.
e) Plot the ACF and PACF of the cpi_des variable. What do you notice? How do they compare to the graphs you obtained from the raw series?
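The steps in Question 2 can be collected into one runnable sequence. This sketch assumes the file name cpi_data.dta and assumes the regression in b) is regress cpi_raw i.month; graph combine places the raw and deseasonalized correlograms side by side for part e). The graph names are hypothetical.

```stata
* Assumed file name: cpi_data.dta
use "cpi_data.dta", clear
tsset date, monthly

* a) Month of year: 1 = January, ..., 12 = December
gen month = month(dofm(date))
* Spot check of the conversion: displays 2 (February)
display month(dofm(tm(1980m2)))

* b) Assumed regression: month dummies plus a constant
*    (month = 1, January, is the default base category)
regress cpi_raw i.month

* d) Residuals = cpi_raw minus the fitted average for its calendar month
predict cpi_des, residuals

* e) Raw vs. dummy-deseasonalized correlograms, side by side
ac cpi_raw, lags(24) name(acf_raw, replace)
ac cpi_des, lags(24) name(acf_des, replace)
graph combine acf_raw acf_des, cols(2)
pac cpi_raw, lags(24) name(pacf_raw, replace)
pac cpi_des, lags(24) name(pacf_des, replace)
graph combine pacf_raw pacf_des, cols(2)
```

With a full set of month dummies and a constant, the fitted value for each observation is the sample mean of cpi_raw for its calendar month, so the residuals have mean zero by construction.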
Optional Questions
a) Why would it be a problem to estimate the ARMA model on the raw series, i.e., without accounting for seasonality?
b) In the previous question, we removed the predictable effect of each month on the cpi_raw variable. Imagine instead that you are interested in removing the effect of each quarter of the year. How would you do it? Please describe how you would modify the above procedure.
Notes on Deseasonalization Method: The series cpi_deseason was obtained using a different deseasonalization procedure called X-13ARIMA-SEATS. This method was developed by the U.S. Census Bureau and is used by professionals. You can read more about it on the U.S. Census Bureau website.
Using dummies to deseasonalize the series is very simple, but it has a drawback: it assumes that the effect of each month is constant over the years.