Chapter 9 Linear Regression: Interpretation and Prediction
PRELIMINARY AND INCOMPLETE
9.1 Case Study: Predicting Home Price in Windsor’s Prime Locale
Introduction:
Windsor Estates, a prominent real estate agency, often receives queries from prospective homeowners about the price of houses based on specific features. The agency decided to delve deeper into historical sales data to predict home prices more accurately using multiple attributes.
Objective:
- Interpret the coefficients of a comprehensive regression model built from historical housing data.
- Predict the price of a home based on a hypothetical scenario, employing the derived model.
Step 1: Loading the Data and Setting up the Environment
# Requires the pacman package; p_load() loads Ecdat, installing it first if needed
pacman::p_load(Ecdat)
data("Housing")
# Load necessary libraries
pacman::p_load(olsrr)
Step 2: Fit the Comprehensive Multiple Linear Regression Model
# Fit a model using all predictors
comprehensive_model <- lm(price ~ ., data=Housing)
model_summary <- summary(comprehensive_model)
model_summary
##
## Call:
## lm(formula = price ~ ., data = Housing)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -41389  -9307   -591   7353  74875
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4038.3504  3409.4713  -1.184 0.236762
## lotsize         3.5463     0.3503  10.124  < 2e-16 ***
## bedrooms     1832.0035  1047.0002   1.750 0.080733 .
## bathrms     14335.5585  1489.9209   9.622  < 2e-16 ***
## stories      6556.9457   925.2899   7.086 4.37e-12 ***
## drivewayyes  6687.7789  2045.2458   3.270 0.001145 **
## recroomyes   4511.2838  1899.9577   2.374 0.017929 *
## fullbaseyes  5452.3855  1588.0239   3.433 0.000642 ***
## gashwyes    12831.4063  3217.5971   3.988 7.60e-05 ***
## aircoyes    12632.8904  1555.0211   8.124 3.15e-15 ***
## garagepl     4244.8290   840.5442   5.050 6.07e-07 ***
## prefareayes  9369.5132  1669.0907   5.614 3.19e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15420 on 534 degrees of freedom
## Multiple R-squared: 0.6731, Adjusted R-squared: 0.6664
## F-statistic: 99.97 on 11 and 534 DF, p-value: < 2.2e-16
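Several figures in this summary can be cross-checked by hand. The sketch below uses only base R and the rounded values printed above, so it matches the output only up to rounding. It recomputes the t value, the two-sided p-value, and an approximate 95% confidence interval for bedrooms, along with the adjusted R-squared. The variable names are illustrative, and the sample size of 546 is inferred from the 534 residual degrees of freedom plus the 12 estimated coefficients.

```r
# Values copied from the printed summary (rounded, so results are approximate)
est_bedrooms <- 1832.0035          # Estimate for bedrooms
se_bedrooms  <- 1047.0002          # Std. Error for bedrooms
df_resid     <- 534                # residual degrees of freedom
n_coef       <- 12                 # intercept + 11 slope coefficients
n_obs        <- df_resid + n_coef  # 546 observations (inferred)

# t value = estimate / standard error
t_bedrooms <- est_bedrooms / se_bedrooms

# Two-sided p-value from the t distribution with 534 degrees of freedom
p_bedrooms <- 2 * pt(-abs(t_bedrooms), df = df_resid)

# Approximate 95% confidence interval: estimate +/- critical value * SE
ci_bedrooms <- est_bedrooms + c(-1, 1) * qt(0.975, df = df_resid) * se_bedrooms

# Adjusted R-squared from the multiple R-squared
r2     <- 0.6731
adj_r2 <- 1 - (1 - r2) * (n_obs - 1) / df_resid

round(t_bedrooms, 3)
round(p_bedrooms, 4)
round(ci_bedrooms, 1)
round(adj_r2, 4)
```

The interval for bedrooms spans zero, which agrees with its p-value lying above 0.05. With the fitted model in memory, confint(comprehensive_model) returns the intervals for all coefficients without rounding error.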
Interpretation of Coefficients:
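Intercept: The predicted price when every numeric predictor equals zero and every yes/no feature is absent. No house has a lot size of zero, so this estimate (about -4,038, and not statistically significant) has no practical meaning on its own.
lotsize: Each additional square foot of lot size is associated with an increase of about 3.55 in price, holding other variables constant.
bedrooms, bathrms, stories, garagepl: These are numeric counts, not categories. Each additional bedroom, bathroom, story, or garage place is associated with a price increase of about 1,832, 14,336, 6,557, and 4,245 respectively, holding other variables constant. The bedrooms coefficient is significant only at the 10% level.
driveway, recroom, fullbase, gashw, airco, prefarea: These are yes/no categorical variables with "no" as the reference level. Each coefficient is the difference in expected price between a house that has the feature and one that does not, with other factors kept unchanged. All six are positive here; for example, air conditioning is associated with a price about 12,633 higher.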
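As a rough illustration of how to read these coefficients, the sketch below turns three of the printed estimates into price differences. It uses base R only; the names coefs, lot_effect, bath_effect, and airco_effect are made up for this example, and the comparison houses are hypothetical.

```r
# Coefficient estimates copied from the printed summary
coefs <- c(lotsize = 3.5463, bathrms = 14335.5585, aircoyes = 12632.8904)

# Numeric predictor: a lot that is 1,000 sqft larger, other features equal
lot_effect <- 1000 * coefs[["lotsize"]]

# Count predictor: one additional bathroom, other features equal
bath_effect <- 1 * coefs[["bathrms"]]

# Yes/no predictor: air conditioning present versus absent
airco_effect <- coefs[["aircoyes"]]

# The model has no interaction terms, so the separate effects simply add up
combined <- lot_effect + bath_effect + airco_effect

c(lot = lot_effect, bath = bath_effect, airco = airco_effect, combined = combined)
```

These are associations estimated from historical sales, holding the other predictors fixed; they should not be read as the guaranteed payoff from renovating a particular house.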
9.1.1 Step 3: Predicting the Price for a Hypothetical House
Scenario: A family is keen on buying a house in Windsor. They want a house with:
- Lot size: 5,000 sqft
- 3 bedrooms
- 2 bathrooms
- 1 story
- A driveway
- No recreational room
- A full basement
- Gas hot water heating
- No air conditioning
- A single garage place
- Located in a preferred neighborhood
Let’s predict the price for such a house using our regression model:
new_house <- data.frame(
lotsize = 5000,
bathrms = 2,
bedrooms = 3,
stories = 1,
driveway = "yes",
recroom = "no",
fullbase = "yes",
gashw = "yes",
airco = "no",
garagepl = 1,
prefarea = "yes"
)
predicted_price <- predict(comprehensive_model, newdata = new_house)
predicted_price
##        1
## 93003.15
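To see where this number comes from, the prediction can be rebuilt by hand as the sum of each coefficient times the corresponding feature value. The sketch below uses base R and the rounded coefficients printed in the summary, so it agrees with predict() only up to that rounding. The vector names coefs and house are illustrative; "yes" is coded as 1 and "no" as 0, mirroring the dummy variables R creates for the factors.

```r
# Coefficient estimates copied from the printed summary (rounded to 4 decimals)
coefs <- c(
  "(Intercept)" = -4038.3504, lotsize = 3.5463, bedrooms = 1832.0035,
  bathrms = 14335.5585, stories = 6556.9457, drivewayyes = 6687.7789,
  recroomyes = 4511.2838, fullbaseyes = 5452.3855, gashwyes = 12831.4063,
  aircoyes = 12632.8904, garagepl = 4244.8290, prefareayes = 9369.5132
)

# The hypothetical house, in the same order; "yes" is coded 1 and "no" is coded 0
house <- c(
  "(Intercept)" = 1, lotsize = 5000, bedrooms = 3,
  bathrms = 2, stories = 1, drivewayyes = 1,
  recroomyes = 0, fullbaseyes = 1, gashwyes = 1,
  aircoyes = 0, garagepl = 1, prefareayes = 1
)

# Guard against pairing a coefficient with the wrong feature
stopifnot(identical(names(coefs), names(house)))

# Predicted price = sum of coefficient * feature value
manual_price <- sum(coefs * house)
round(manual_price, 2)
```

The result should land within a few cents of the value returned by predict(). With the fitted model in memory, predict(comprehensive_model, newdata = new_house, interval = "prediction") additionally returns lower and upper bounds, which is useful here because the residual standard error of 15,420 implies considerable uncertainty around any single predicted price.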