Chapter 9 Linear Regression: Interpretation and Prediction

PRELIMINARY AND INCOMPLETE

9.1 Case Study: Predicting Home Price in Windsor’s Prime Locale

Introduction:

Windsor Estates, a prominent real estate agency, often receives questions from prospective homeowners about how specific features affect the price of a house. To answer them, the agency decided to analyze historical sales data and predict home prices from multiple attributes.

Objective:

  • Interpret the coefficients of a comprehensive regression model built from historical housing data.
  • Predict the price of a home based on a hypothetical scenario, employing the derived model.

Step 1: Loading the Data and Setting up the Environment

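# p_load() installs Ecdat if it is missing and then attaches it
# (this assumes the pacman package itself is already installed)
pacman::p_load(Ecdat)
data("Housing")

# Load olsrr, a package of tools for building and checking OLS models
pacman::p_load(olsrr)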

Step 2: Fit the Comprehensive Multiple Linear Regression Model

# Fit a model using all predictors
comprehensive_model <- lm(price ~ ., data=Housing)
model_summary <- summary(comprehensive_model)
model_summary
## 
## Call:
## lm(formula = price ~ ., data = Housing)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -41389  -9307   -591   7353  74875 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -4038.3504  3409.4713  -1.184 0.236762    
## lotsize         3.5463     0.3503  10.124  < 2e-16 ***
## bedrooms     1832.0035  1047.0002   1.750 0.080733 .  
## bathrms     14335.5585  1489.9209   9.622  < 2e-16 ***
## stories      6556.9457   925.2899   7.086 4.37e-12 ***
## drivewayyes  6687.7789  2045.2458   3.270 0.001145 ** 
## recroomyes   4511.2838  1899.9577   2.374 0.017929 *  
## fullbaseyes  5452.3855  1588.0239   3.433 0.000642 ***
## gashwyes    12831.4063  3217.5971   3.988 7.60e-05 ***
## aircoyes    12632.8904  1555.0211   8.124 3.15e-15 ***
## garagepl     4244.8290   840.5442   5.050 6.07e-07 ***
## prefareayes  9369.5132  1669.0907   5.614 3.19e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15420 on 534 degrees of freedom
## Multiple R-squared:  0.6731, Adjusted R-squared:  0.6664 
## F-statistic: 99.97 on 11 and 534 DF,  p-value: < 2.2e-16
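
As a reading aid for the summary table, the sketch below recomputes a few of its entries in base R from the printed numbers alone. The values are copied from the output above and the variable names are illustrative. The sample size of 546 is an inference from the 534 residual degrees of freedom plus the 12 estimated coefficients, not a figure stated in the output.

```r
# Values copied from the printed summary (bedrooms row and footer)
estimate  <- 1832.0035
std_error <- 1047.0002
df_resid  <- 534     # residual degrees of freedom
n_coef    <- 12      # intercept + 11 slopes
r_squared <- 0.6731

# t value = estimate / standard error
t_value <- estimate / std_error

# Two-sided p-value from the t distribution with df_resid degrees of freedom
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# Adjusted R-squared penalizes R-squared for the number of predictors
n <- df_resid + n_coef
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / df_resid

round(c(t_value = t_value, p_value = p_value, adj_r_squared = adj_r_squared), 4)
```

The results should agree with the bedrooms row and the footer of the summary up to rounding, since the inputs are themselves rounded.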

Interpretation of Coefficients:

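  • Intercept: The predicted price when every numeric predictor equals zero and every yes/no feature is absent. No house has a lot size of zero, so the intercept has no practical meaning on its own, and here it is not statistically distinguishable from zero (p ≈ 0.24).

  • lotsize: Each additional square foot of lot size is associated with an increase of about 3.55 in price, holding the other variables constant.

  • bedrooms, bathrms, stories, garagepl: These are numeric counts, so each coefficient is the change in price associated with one additional unit, holding the other variables constant: about 14,336 per bathroom, 6,557 per story, and 4,245 per garage place. The bedrooms coefficient (about 1,832) is significant only at the 10% level.

  • driveway, recroom, fullbase, gashw, airco, prefarea: These are yes/no factors. Each coefficient (shown with a "yes" suffix in the output) is the difference in predicted price between a house that has the feature and an otherwise identical house that lacks it. All six estimates are positive; for example, air conditioning is associated with about 12,633 more in price.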
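
To make the "holding other variables constant" reading concrete, here is a minimal sketch that uses only estimates copied from the summary output. The scenario (1,000 extra square feet, one extra bathroom, air conditioning added) and the variable names are hypothetical choices for illustration. The last lines compute an approximate 95% confidence interval for the lotsize coefficient from its printed standard error.

```r
# Coefficient estimates copied from the printed summary
coefs <- c(lotsize = 3.5463, bathrms = 14335.5585, aircoyes = 12632.8904)

# Hypothetical change: +1,000 sqft of lot, +1 bathroom, air conditioning added
change <- c(lotsize = 1000, bathrms = 1, aircoyes = 1)

# Contribution of each change to the predicted price, everything else fixed
effect <- coefs * change[names(coefs)]
effect
total_effect <- sum(effect)
total_effect

# Approximate 95% confidence interval for the lotsize coefficient
se_lotsize <- 0.3503
t_crit <- qt(0.975, df = 534)
ci_lotsize <- coefs[["lotsize"]] + c(-1, 1) * t_crit * se_lotsize
ci_lotsize
```

With the fitted model available, `confint(comprehensive_model)` returns the same kind of interval for every coefficient without retyping the estimates.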

9.1.1 Step 3: Predicting the Price for a Hypothetical House

Scenario: A family is keen on buying a house in Windsor. They desire a house that has:

  • Lot size: 5,000 sqft
  • 2 bathrooms
  • 3 bedrooms
  • 1 story
  • A driveway
  • No recreational room
  • A full basement
  • Gas hot water heating
  • No air conditioning
  • A single garage place
  • Located in a preferred neighborhood

Let’s predict the price for such a house using our regression model:

new_house <- data.frame(
  lotsize = 5000,
  bathrms = 2,
  bedrooms = 3,
  stories = 1,
  driveway = "yes",
  recroom = "no",
  fullbase = "yes",
  gashw = "yes",
  airco = "no",
  garagepl = 1,
  prefarea = "yes"
)

predicted_price <- predict(comprehensive_model, newdata = new_house)
predicted_price
##        1 
## 93003.15
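
As a check on what predict() does here, the sketch below rebuilds the same prediction by hand: multiply each coefficient by the corresponding value for the hypothetical house (1 for "yes", 0 for "no", and 1 for the intercept) and add up the products. The coefficients are copied from the printed summary, so the result can differ from the predict() output by a cent or two because of rounding. The names `coefs`, `x`, and `manual_price` are illustrative.

```r
# Coefficient estimates copied from the printed summary
coefs <- c(
  "(Intercept)" = -4038.3504, lotsize = 3.5463, bedrooms = 1832.0035,
  bathrms = 14335.5585, stories = 6556.9457, drivewayyes = 6687.7789,
  recroomyes = 4511.2838, fullbaseyes = 5452.3855, gashwyes = 12831.4063,
  aircoyes = 12632.8904, garagepl = 4244.8290, prefareayes = 9369.5132
)

# The hypothetical house in the same order; yes = 1, no = 0
x <- c(
  "(Intercept)" = 1, lotsize = 5000, bedrooms = 3,
  bathrms = 2, stories = 1, drivewayyes = 1,
  recroomyes = 0, fullbaseyes = 1, gashwyes = 1,
  aircoyes = 0, garagepl = 1, prefareayes = 1
)

# Guard against a mismatch between the two vectors
stopifnot(identical(names(coefs), names(x)))

# Predicted price = sum of coefficient * value
manual_price <- sum(coefs * x)
manual_price
```

The prediction is a point estimate only. Calling predict() with `interval = "prediction"` on the fitted model also returns lower and upper bounds that reflect the uncertainty for an individual house.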