#### Ames Housing Market Case Study Assignment
Note: the `include = FALSE` option on the package-loading chunk prevents that code from appearing in your knitted document, while still allowing the code to execute.
In this assignment we will “put it all together” for linear regression by completing a small predictive analytics case study. The dataset we are using in this assignment comes from the “AmesHousing” package. This dataset includes housing sales data from Ames, Iowa from 2006 to 2010. The response variable in the dataset is the “Sale_Price” variable. This variable tells us the price at which each house in the dataset sold.
DELIVERABLE: A knitted Word document of all of your work on this assignment. Note: Please remove as much extraneous R code and output as you can. You can always do this by editing the knitted document directly in Word.
Read in the dataset with the following command.
ames = make_ames()
To significantly simplify the problem, we will select a subset of the variables. Use the code below to select these variables. Note the explicit call to dplyr::select: the MASS package is loaded and also defines a function named select, so the package prefix makes sure the dplyr version is used.
ames = ames %>% dplyr::select(Sale_Price, Lot_Area, Bldg_Type, House_Style, Overall_Cond, Year_Built, Exter_Cond, Heating_QC, Central_Air, Gr_Liv_Area, Full_Bath, Half_Bath, Bedroom_AbvGr, Paved_Drive, Fence)
Additionally, a few of the categorical variables have empty categories (factor levels with no observations). We can drop these empty categories with the code below:
ames = ames %>% droplevels()
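To see what an “empty category” means, here is a minimal sketch on a toy data frame. The `toy` data and its column names are invented for illustration and are not part of the Ames data. `droplevels()` removes factor levels that have no observations:

```r
# Toy data (hypothetical, not from the Ames dataset): the factor declares
# three levels, but "Poor" never occurs in the rows.
toy <- data.frame(
  price   = c(200, 150, 210),
  quality = factor(c("Good", "Fair", "Good"),
                   levels = c("Poor", "Fair", "Good"))
)

levels(toy$quality)        # "Poor" "Fair" "Good"
table(toy$quality)         # "Poor" has a count of 0

# droplevels() on a data frame drops unused levels from every factor column.
toy_clean <- droplevels(toy)
levels(toy_clean$quality)  # "Fair" "Good"
```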
No further data cleaning or preparation is necessary: there is no missing data, and factor conversions and recoding have already been done.
**Q1** Split the dataset into training and testing sets. Your training set should have 80% of the data with the remaining 20% in the testing set. All further work (until the last task) should take place using the training set.
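One possible way to make an 80/20 split, sketched with base R only. The helper name `split_train_test`, the seed value, and the use of `sample()` are assumptions for illustration; if the lectures use a different splitting function, follow the lectures.

```r
# Hypothetical helper: randomly assign a proportion `prop` of the rows to
# the training set and the remaining rows to the testing set.
# The seed makes the random split reproducible.
split_train_test <- function(df, prop = 0.8, seed = 123) {
  set.seed(seed)
  n_train    <- floor(prop * nrow(df))
  train_rows <- sample(seq_len(nrow(df)), size = n_train)
  list(
    train = df[train_rows, , drop = FALSE],
    test  = df[-train_rows, , drop = FALSE]
  )
}
```

With the assignment data this could be called as `parts <- split_train_test(ames)`, after which `parts$train` and `parts$test` hold the two sets.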
**Q2** Plot each of the variables in the dataset versus the “Sale_Price” variable. Briefly comment on the relationship between each variable and “Sale_Price”. I strongly recommend that you use the “grid.arrange” function (as we have done in the lectures) to reduce the amount of space taken up by the plots.
**Q3** For the numeric variables, create a correlation matrix (Hint: The ggcorr function will automatically exclude any non-numeric variables when building the correlation plot). Which variables appear to be most strongly correlated with “Sale_Price”?
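As a numeric complement to the correlation plot, the correlations with the response can also be ranked directly. This is a minimal base-R sketch; the helper name `cor_with_response` is hypothetical and is not a function from any package used in the course.

```r
# Hypothetical helper: Pearson correlation of every numeric column with the
# response, sorted from strongest to weakest by absolute value.
cor_with_response <- function(df, response) {
  # Keep numeric columns only, mirroring what a correlation plot would use.
  numeric_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  predictors   <- setdiff(numeric_cols, response)
  cors <- vapply(
    predictors,
    function(v) cor(df[[v]], df[[response]]),
    numeric(1)
  )
  cors[order(abs(cors), decreasing = TRUE)]
}
```

Assuming your training set is stored in a data frame called `train`, the call would be `cor_with_response(train, "Sale_Price")`.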
**Q4** Based on the plots and the correlation matrix, what variables appear to be strong candidates to be predictors of “Sale_Price”? Identify at least THREE variables that seem to be good candidates.
**Q5** Build three separate linear regression models with the three best variables that you identified above. These should be single-predictor models (i.e., a different variable in each model to predict “Sale_Price”). Comment on the quality of these models.
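A compact way to compare several single-predictor models is to fit each one with `lm()` and collect the adjusted R-squared values. The sketch below assumes that approach; the helper name `single_predictor_fits` is hypothetical. Looking at `summary()` for each model individually is still needed to comment on coefficients and p-values.

```r
# Hypothetical helper: fit one simple linear regression per predictor and
# return the adjusted R-squared of each model as a named numeric vector.
single_predictor_fits <- function(df, response, predictors) {
  vapply(predictors, function(p) {
    fit <- lm(as.formula(paste(response, "~", p)), data = df)
    summary(fit)$adj.r.squared
  }, numeric(1))
}
```

For example, `single_predictor_fits(train, "Sale_Price", c("Gr_Liv_Area", "Year_Built", "Full_Bath"))`, where `train` is assumed to be your training set and the three variable names are placeholders rather than a recommended answer.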
**Q6** Now use backward stepwise regression to build a model. Your model should start with all of the possible predictors. You may find it useful to run “options(scipen = 999)” before fitting your model; this prevents the results from being displayed in hard-to-read scientific notation. Be sure to look at your “allmod” output (the full starting model) before proceeding. Note that the allmod output will be long because of the large number of levels in the various categorical variables.
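The general shape of a backward stepwise fit can be sketched as follows. This sketch uses base R's `step()`, which removes predictors one at a time based on AIC; the course may use a different function (for example `stepAIC` from MASS), so treat the function choice and the helper name `backward_model` as assumptions.

```r
options(scipen = 999)  # avoid scientific notation in printed output

# Hypothetical helper: fit the full model, then run backward elimination.
backward_model <- function(df, response) {
  # Full starting model: the response against every other column in df.
  allmod <- lm(as.formula(paste(response, "~ .")), data = df)
  # Backward elimination by AIC; trace = 0 suppresses the step-by-step log.
  backmod <- step(allmod, direction = "backward", trace = 0)
  list(allmod = allmod, backmod = backmod)
}
```

Assuming the training set is called `train`, `fits <- backward_model(train, "Sale_Price")` would return both models, and `summary(fits$allmod)` and `summary(fits$backmod)` would show their output.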
**Q7** From your backward stepwise model results, discuss which variables appear to be important predictors of “Sale_Price”. Recall that categorical variables are typically viewed as statistically significant if at least one of their levels (categories) is significant.
**Q8** You do not need to do any formal analysis, but describe how multicollinearity could be a concern in this model.
**Q9** Describe how this model might be used.
**Q10** What is the performance of this model (considering the adjusted R-squared value) on the training and testing sets?
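`summary()` reports adjusted R-squared only for the data the model was fitted on, so a value for the testing set has to be computed from predictions. The sketch below assumes the usual formula, 1 - (1 - R²)(n - 1)/(n - p - 1), with the total sum of squares taken around the mean of the evaluated set; other conventions exist, and the helper name `adj_r2` is hypothetical. `predict()` can also raise an error if the new data contains a factor level that was absent from the training data.

```r
# Hypothetical helper: adjusted R-squared of a fitted lm on any data set
# that contains the response and all predictors.
adj_r2 <- function(model, data, response) {
  actual    <- data[[response]]
  predicted <- predict(model, newdata = data)
  sse <- sum((actual - predicted)^2)      # residual sum of squares
  sst <- sum((actual - mean(actual))^2)   # total sum of squares
  r2  <- 1 - sse / sst
  n <- length(actual)
  p <- model$rank - 1  # estimated coefficients, excluding the intercept
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}
```

Called on the training data, this reproduces the value from `summary()`; called on the testing data, it gives the comparison figure that Q10 asks about.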
Before submitting your work, please carefully read the required deliverable at the beginning of the document.