Analytics Challenge: Store24
Instructions
Please complete the following problems and submit a file named challenge__store24A.R using the Store24A data (Store24A_data.xls) available on CourseSite.
Remember:
• Do not rename external data files or edit them in any way. In other words, don’t modify Store24A_data.xls. If you do, your code won’t work properly on my version of that data set.
• Do not use global (absolute) paths in your script. The directory structure of your machine is not the same as the one on Gradescope’s virtual machines, so your script won’t look in the right place. Data will always be stored in the base directory unless otherwise noted, so I suggest setting up your code to run that way. One way to do this is to use R projects or R Markdown files. You could also use setwd() interactively in the console, but if you put setwd() in your script, do not forget to remove it or comment it out before you submit.
• Do not destroy or overwrite any variables in your program. I check them only after I have run your entire program from start to finish.
• Check to make sure you do not have any syntax errors.
– Tip: before submitting, clear all the objects from your workspace and then source your file. This will often uncover bugs.
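The path advice above can be sketched as follows. This is only an illustration under assumptions: the object name store24 is hypothetical (the assignment does not prescribe one), and the data file is assumed to sit in the working directory next to the script.

```r
# Relative path: just the file name, with no directory component.
data_file <- "Store24A_data.xls"

# Read the data only if the file is present in the working directory.
# `store24` is a hypothetical object name chosen for this sketch.
if (file.exists(data_file)) {
  store24 <- readxl::read_excel(data_file)
} else {
  message("Could not find ", data_file, " in ", getwd())
}

# Clean-run check, typed in the console rather than saved in the script:
#   rm(list = ls())
#   source("challenge__store24A.R")
```

Because the path has no directory component, the same line should work on any machine where the data file sits in the base directory.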
Packages
You have access to the following packages. Do not load any other packages or your code may not function. If you feel you need another package, let me know why and I may add it.
library(dplyr)
library(tidyr)
library(ggplot2)
library(stringr)
library(readxl)
library(modelsummary)
Question 1
• Run a regression with profit as the dependent variable and manager tenure as the only independent variable
• I will check that there is an object called “lm_ManagerTenure” and that the fitted coefficients are correct
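For reference, a regression with one independent variable has the following general shape in R. This sketch deliberately uses R's built-in mtcars data rather than the Store24A data, and fit_example is a hypothetical object name, so it shows only the pattern and not the answer.

```r
# Illustration on the built-in mtcars data, not on Store24A.
# Formula syntax: dependent_variable ~ independent_variable
fit_example <- lm(mpg ~ wt, data = mtcars)

# Fitted coefficients: the intercept and the slope on wt.
coef(fit_example)
```

The object on the left of the assignment is what gets checked, so its name has to match the one requested in the question exactly.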
Question 2
• Run a regression with profit as the dependent variable and crew tenure as the only independent variable
• I will check that there is an object called “lm_CrewTenure” and that the fitted coefficients are correct
Question 3
• Run a regression with profit as the dependent variable and both manager and crew tenure as the only two independent variables
• I will check that there is an object called “lm_BothTenure” and that the fitted coefficients are correct
Question 4
• Run a regression with profit as the dependent variable and all control variables in the Store24A data (MTenure, CTenure, Pop, Comp, Visibility, PedCount, Res, Hours24)
• I will check that there is an object called “lm_AllControl” and that the fitted coefficients are correct
Question 5
• Run a regression with profit as the dependent variable and, in addition to all control variables in the Store24A data (MTenure, CTenure, Pop, Comp, Visibility, PedCount, Res, Hours24), include a squared term for manager tenure. Do not create a new variable for MTenure^2. Instead, add ‘I(MTenure^2)’ to your formula.
• I will check that there is an object called “lm_AllControl.M2” and that the fitted coefficient for ‘I(MTenure^2)’ is correct
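The reason for the I() wrapper is that ^ has a special meaning inside an R formula, so a bare squared term is not treated as arithmetic. The sketch below shows the difference on the built-in mtcars data; the object names are hypothetical and the variables are not those of Store24A.

```r
# Illustration on the built-in mtcars data, not on Store24A.

# With I(): hp^2 is evaluated as arithmetic, so a squared term is added.
fit_squared <- lm(mpg ~ wt + hp + I(hp^2), data = mtcars)
names(coef(fit_squared))

# Without I(): ^ is read as a formula operator, hp^2 collapses to hp,
# and no squared term enters the model.
fit_no_wrapper <- lm(mpg ~ wt + hp + hp^2, data = mtcars)
names(coef(fit_no_wrapper))
```

The coefficient for the squared term is labelled with the full expression, here I(hp^2), which is why the expression in the formula has to be written exactly as requested.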
Question 6
• Add all of your regressions to the same model summary table. You can use the code below. Feel free to tweak the parameters to add or remove information.
# Collect the five fitted models in a named list; the names become the column headers.
model_list <- list("lm_ManagerTenure" = lm_ManagerTenure,
                   "lm_CrewTenure" = lm_CrewTenure,
                   "lm_BothTenure" = lm_BothTenure,
                   "lm_AllControl" = lm_AllControl,
                   "lm_AllControl.M2" = lm_AllControl.M2
)
# Show each estimate with significance stars and its confidence interval.
model_summary_table <-
  modelsummary::modelsummary(model_list,
                             estimate = "{estimate}{stars}",
                             statistic = "conf.int",
                             fmt = list("estimate" = 1, "std.error" = 1, "r.squared" = 3, "fmt" = 1, "conf.int" = 1))
model_summary_table
• What variables add the most to the predictive power? Hint: Look at the R² values. You may also want to run additional models with fewer or additional parameters.
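One possible way to compare R² across models without reading them off the table is to pull the values from summary(). The sketch below uses the built-in mtcars data and hypothetical object names; it is an illustration of the pattern, not the intended answer.

```r
# Illustration on the built-in mtcars data, not on Store24A.
fits <- list(
  wt_only = lm(mpg ~ wt, data = mtcars),
  wt_hp   = lm(mpg ~ wt + hp, data = mtcars)
)

# Extract R-squared and adjusted R-squared from each fitted model.
r2     <- sapply(fits, function(m) summary(m)$r.squared)
adj_r2 <- sapply(fits, function(m) summary(m)$adj.r.squared)

# One row per model, for side-by-side comparison.
data.frame(r2 = r2, adj_r2 = adj_r2)
```

Plain R² never decreases when a regressor is added to a nested model, so the size of the increase, or the adjusted R², is the more informative quantity when judging what a variable adds.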