11  Exploring Non-Linear Relationships

Curvature is common in business relationships: marketing spend shows diminishing returns, prices scale sublinearly with size, and operational outcomes vary nonlinearly over the day. This chapter shows how to model such patterns primarily with ordinary least squares (OLS) using transformations, low-order polynomials, and piecewise linear specifications. Advanced methods are briefly surveyed at the end.

11.1 Chapter Goals

By the end of this chapter, readers will be able to:

  1. Diagnose non-linear patterns using exploratory graphics (e.g., smoothers, binned scatterplots) and residual structure from baseline linear models.
  2. Select and justify functional forms—linear–log, log–linear, log–log, low-order polynomials, and piecewise (hinge) models—based on managerial meaning and data support.
  3. Estimate OLS models with transformations, polynomial terms, and simple hinges; compute and interpret derivatives, semi-elasticities, and elasticities in business terms.
  4. Locate and interpret turning points and slope changes, and translate these features into actionable managerial insights.
  5. Validate specifications with residual diagnostics and light cross-validation, while guarding against overfitting and extrapolation beyond observed ranges.
  6. Communicate non-linear effects clearly using prediction curves with confidence bands, and report impacts in meaningful units or percentages for decision-makers.
  7. Recognize when OLS-based forms are insufficient and articulate when a brief, conceptual comparison to advanced methods (e.g., GAMs, tree-based models) is warranted, without delving into implementation.

11.2 Datasets referenced in this chapter (open access)

  • Ames Housing (real estate pricing). Source: De Cock (2011), JSE; available via the AmesHousing package
  • Diamonds (retail pricing). Source: ggplot2::diamonds
  • Inside Airbnb (hospitality pricing). Source: insideairbnb.com
  • Advertising (marketing mix). Source: ISLR website (Advertising.csv)
  • Wage (compensation analytics). Source: ISLR2::Wage
  • mpg (automotive fuel economy). Source: ggplot2::mpg
  • Bike Sharing (demand vs temperature). Source: UCI Machine Learning Repository
  • NYC Taxi trips (transport pricing; sample recommended) and NYC TLC rate card
  • USPS Notice 123 (tiered shipping)
  • nycflights13 (operations; delays by hour). Source: nycflights13 package (CC0)
  • Solar PV learning curve (experience effects). Source: Our World in Data
Note

Data access notes
  • AmesHousing: use AmesHousing::make_ames() to obtain a cleaned, single-table version
  • Diamonds/mpg: available in ggplot2 as ggplot2::diamonds and ggplot2::mpg
  • Advertising.csv: download from the ISLR site, then read with readr::read_csv("Advertising.csv")
  • Wage: available as ISLR2::Wage
  • Inside Airbnb: choose a city/date and download listings.csv.gz from the data portal
  • UCI Bike Sharing: download day.csv and hour.csv from the repository
  • NYC TLC trips: sample a manageable subset of monthly CSVs; pair with the official fare schedule
  • USPS Notice 123: build a compact weight–price table from the online price list
  • nycflights13: nycflights13::flights provides CC0 flight records
  • Our World in Data solar: download PV price and capacity series and join into a tidy table

11.3 Setup
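
Load the packages used throughout the chapter (a minimal setup; the data packages listed in 11.2 are called with :: or data() where needed):

library(tidyverse)   # ggplot2, dplyr, readr, tibble (tribble)
library(splines)     # bs() for the spline example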

11.4 Motivation: seeing non-linearity early

Anscombe’s quartet is the classic reminder: identical summary statistics can mask very different shapes. Always visualize before modeling.

# Visual comparison of linear fit vs flexible smoother (power-law context)
ggplot(diamonds, aes(carat, price)) +
  geom_point(alpha = 0.05) +
  geom_smooth(method = "lm", se = FALSE) +
  geom_smooth(method = "loess", se = FALSE, linetype = "dashed") +
  labs(title = "Linear vs smoother: price ~ carat", subtitle = "Dashed line = loess")
Figure: price vs carat with a linear fit (solid) and a loess smoother (dashed)

Residual patterns often reveal missed curvature.

m_lin <- lm(price ~ carat, data = diamonds)
aug <- tibble(
  fitted = fitted(m_lin),
  resid  = resid(m_lin)
)
ggplot(aug, aes(fitted, resid)) +
  geom_point(alpha = 0.1) +
  geom_smooth(se = FALSE) +
  labs(title = "Residuals vs fitted", subtitle = "Structure suggests nonlinearity")

11.5 Choosing functional forms

Start with managerial meaning (absolute vs percent changes), then prefer the simplest form that removes residual structure.

  • Linear: good baseline; interpret changes in levels
  • Linear–log: diminishing returns in inputs (1% change in X → level change in Y)
  • Log–linear: unit change in X → percent change in Y
  • Log–log: constant elasticity; percent change in X → percent change in Y
  • Polynomial (quadratic): smooth curvature; turning points
  • Piecewise linear (hinge): slope changes at business-relevant thresholds
Note

Checklist

Visualize → try simple transforms → consider low-order polynomials → justify knots for piecewise models → validate

11.5.1 Reference: forms, slopes, elasticities

Let \(Y\) be the outcome and \(X\) a single predictor. Derivatives and elasticities are evaluated at a given \(X\) with predicted \(Y\).

| Model | Specification | Slope \(dY/dX\) | Elasticity \(\frac{dY}{dX}\cdot\frac{X}{Y}\) | Managerial interpretation |
|-------|---------------|-----------------|----------------------------------------------|---------------------------|
| Linear | \(Y=\beta_0+\beta_1 X+\varepsilon\) | \(\beta_1\) | \(\beta_1 X/Y\) | One-unit increase in \(X\) changes \(Y\) by \(\beta_1\) units |
| Linear–log | \(Y=\beta_0+\beta_1 \ln X+\varepsilon\) | \(\beta_1/X\) | \(\beta_1/Y\) | 1% increase in \(X\) changes \(Y\) by about \(\beta_1/100\) units |
| Log–linear | \(\ln Y=\beta_0+\beta_1 X+\varepsilon\) | \(\beta_1 Y\) | \(\beta_1 X\) | One-unit increase in \(X\) changes \(Y\) by about \(100\beta_1\)% |
| Log–log | \(\ln Y=\beta_0+\beta_1 \ln X+\varepsilon\) | \(\beta_1 Y/X\) | \(\beta_1\) | Elasticity of \(Y\) with respect to \(X\) is the constant \(\beta_1\) |
| Quadratic | \(Y=\beta_0+\beta_1 X+\beta_2 X^2+\varepsilon\) | \(\beta_1+2\beta_2 X\) | \((\beta_1+2\beta_2 X)\cdot X/Y\) | Turning point at \(X^{\star}=-\beta_1/(2\beta_2)\); a maximum if \(\beta_2<0\) |
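
To apply these formulas to a fitted model, evaluate them at a chosen \(X\) with the predicted \(Y\). A minimal sketch for the quadratic row, using ggplot2::mpg (revisited in the quadratic example below):

# Slope and elasticity of a quadratic fit at a chosen X (here displ = 3)
m  <- lm(hwy ~ displ + I(displ^2), data = ggplot2::mpg)
b1 <- coef(m)["displ"]; b2 <- coef(m)["I(displ^2)"]
x0 <- 3
y0 <- predict(m, newdata = data.frame(displ = x0))
slope      <- b1 + 2 * b2 * x0   # dY/dX at x0
elasticity <- slope * x0 / y0    # (dY/dX) * X/Y at x0
c(slope = unname(slope), elasticity = unname(elasticity))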

15 Modeling non-linearity with OLS (concepts)

  • Transformations: linear, linear–log, log–linear, log–log
  • Low-order polynomials: quadratic, cubic; center/scale to reduce collinearity
  • Piecewise linear (hinge): add \(\max(0, X-k)\) for a knot \(k\); justify with business logic or policy thresholds (a schematic sketch of all three approaches follows)
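
A schematic sketch of all three, assuming a data frame df with numeric columns x and y (hypothetical names):

# 1. Transformation (constant elasticity)
# m1 <- lm(log(y) ~ log(x), data = df)

# 2. Centered quadratic (centering reduces collinearity between x and x^2)
# df$xc <- df$x - mean(df$x)
# m2 <- lm(y ~ xc + I(xc^2), data = df)

# 3. Hinge at a business-justified knot k (slope is b1 before k, b1 + b2 after)
# k <- 1   # knot from a policy threshold or pricing rule
# df$after_k <- pmax(df$x - k, 0)
# m3 <- lm(y ~ x + after_k, data = df)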

11.7 Examples and mini-applications

11.7.1 Constant-elasticity pricing (log–log)

Diamonds (retail pricing): price vs carat

Note

Source: ggplot2::diamonds

m_loglog_diamonds <- lm(log(price) ~ log(carat), data = diamonds)
summary(m_loglog_diamonds)  # elasticity = coef on log(carat)

Call:
lm(formula = log(price) ~ log(carat), data = diamonds)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.50833 -0.16951 -0.00591  0.16637  1.33793 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 8.448661   0.001365  6190.9   <2e-16 ***
log(carat)  1.675817   0.001934   866.6   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2627 on 53938 degrees of freedom
Multiple R-squared:  0.933, Adjusted R-squared:  0.933 
F-statistic: 7.51e+05 on 1 and 53938 DF,  p-value: < 2.2e-16

Ames Housing (real estate pricing): sale price vs living area

Note

Source: De Cock (2011), JSE; available via the AmesHousing package

ames <- AmesHousing::make_ames()
m_loglog_ames <- lm(log(Sale_Price) ~ log(Gr_Liv_Area), data = ames)
summary(m_loglog_ames)

Call:
lm(formula = log(Sale_Price) ~ log(Gr_Liv_Area), data = ames)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.0778 -0.1465  0.0264  0.1740  0.8602 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)       5.43019    0.11644   46.63   <2e-16 ***
log(Gr_Liv_Area)  0.90781    0.01602   56.66   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2816 on 2928 degrees of freedom
Multiple R-squared:  0.523, Adjusted R-squared:  0.5228 
F-statistic:  3210 on 1 and 2928 DF,  p-value: < 2.2e-16
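
Read the slope in each log–log fit as an elasticity: diamond prices rise by about 1.68% for each 1% increase in carat weight, while Ames sale prices rise by about 0.91% for each 1% increase in living area.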

11.7.2 Diminishing returns to spend (linear–log)

ISLR Advertising: Sales ~ log(TV) (extend to other channels as needed)

Note

Source: ISLR website (Advertising.csv)

# The file's first column is an unnamed row index (readr names it `...1`)
adv <- read_csv("https://www.statlearning.com/s/Advertising.csv", show_col_types = FALSE)
m_linlog_adv <- lm(sales ~ log(TV), data = adv)
summary(m_linlog_adv)

Call:
lm(formula = sales ~ log(TV), data = adv)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.9318 -2.6777 -0.2758  2.1227  9.2654 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -4.2026     1.1623  -3.616  0.00038 ***
log(TV)       3.9009     0.2432  16.038  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.45 on 198 degrees of freedom
Multiple R-squared:  0.565, Adjusted R-squared:  0.5628 
F-statistic: 257.2 on 1 and 198 DF,  p-value: < 2.2e-16
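
Per the linear–log row of the reference table, a 1% increase in TV spend raises expected sales by \(\beta_1/100\) units:

# Level change in sales per 1% increase in TV spend (linear–log form)
coef(m_linlog_adv)["log(TV)"] / 100   # ~0.039 sales units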

11.7.3 Semi-elasticities in compensation (log–linear)

ISLR Wage: log(wage) ~ age + age² (optionally add education or jobclass)

Note

Source: ISLR2::Wage

data(Wage, package = "ISLR2")
m_loglin_wage <- lm(log(wage) ~ age + I(age^2), data = Wage)
summary(m_loglin_wage)

Call:
lm(formula = log(wage) ~ age + I(age^2), data = Wage)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.72742 -0.19431  0.00655  0.18995  1.13251 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.468e+00  6.810e-02   50.93   <2e-16 ***
age          5.166e-02  3.232e-03   15.98   <2e-16 ***
I(age^2)    -5.202e-04  3.685e-05  -14.12   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3325 on 2997 degrees of freedom
Multiple R-squared:  0.1069,    Adjusted R-squared:  0.1063 
F-statistic: 179.3 on 2 and 2997 DF,  p-value: < 2.2e-16
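
The quadratic turning-point formula gives the age at which predicted log(wage) peaks; with the coefficients above it lands near age 50:

# Turning point: X* = -b1 / (2 * b2)
-coef(m_loglin_wage)["age"] / (2 * coef(m_loglin_wage)["I(age^2)"])   # ~49.7 years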

11.7.4 Quadratic curvature and turning points

Automotive fuel economy: hwy ~ displ + displ²

Note

Source: ggplot2::mpg

m_quad_mpg <- lm(hwy ~ displ + I(displ^2), data = mpg)
tp_displ <- -coef(m_quad_mpg)["displ"] / (2 * coef(m_quad_mpg)["I(displ^2)"])
tp_displ
   displ 
5.367878 
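
Because the quadratic term is positive here, 5.37 is a minimum: predicted highway mileage falls with displacement until roughly 5.4 litres, and the fitted curve turns up beyond that, where data are sparse, so treat the upturn cautiously.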

Bike demand vs temperature (UCI Bike Sharing): cnt ~ temp + temp²
Download day.csv from the UCI repository and place at data/bike_day.csv

Note

Source: UCI Machine Learning Repository (Bike Sharing Dataset)

# bike_day <- read_csv("data/bike_day.csv")
# m_quad_bike <- lm(cnt ~ temp + I(temp^2), data = bike_day)
# summary(m_quad_bike)

11.7.5 Piecewise (hinge) models from pricing rules

NYC Taxi: total fare vs distance with base fee and per-mile segments. Use a sampled trips file (e.g., data/nyc_taxi_sample.csv) and the official rate card to justify the knot(s)

Note

Source: NYC Taxi & Limousine Commission trip records; NYC TLC rate card

# taxi <- read_csv("data/nyc_taxi_sample.csv")
# taxi <- taxi %>% mutate(after1 = pmax(trip_distance - 1, 0))
# m_hinge_taxi <- lm(total_amount ~ trip_distance + after1, data = taxi)
# summary(m_hinge_taxi)

USPS shipping: postage vs weight shows step/tier pricing (use hinges at ounce thresholds)

Note

Source: USPS Notice 123

# usps <- tribble(
#   ~weight_oz, ~price_usd,
#   1, 0.68,
#   2, 0.92,
#   3, 1.16,
#   3.5, 1.40
# )
# usps <- usps %>% mutate(after1 = pmax(weight_oz - 1, 0),
#                         after2 = pmax(weight_oz - 2, 0),
#                         after3 = pmax(weight_oz - 3, 0))
# # Note: four rows cannot identify five coefficients (one will be aliased);
# # add more weight tiers from Notice 123 before fitting
# m_hinge_usps <- lm(price_usd ~ weight_oz + after1 + after2 + after3, data = usps)
# summary(m_hinge_usps)

11.7.6 Smooth time-of-day curvature (splines for contrast)

Airline operations: departure delay vs hour (nycflights13). Splines provide a compact, smooth alternative to high-order polynomials

Note

Source: nycflights13 package (CC0)

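# bs() comes from the splines package (loaded in Setup)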
fl <- nycflights13::flights %>%
  filter(!is.na(dep_delay), !is.na(hour))
m_spline_fl <- lm(dep_delay ~ bs(hour, df = 4), data = fl)
summary(m_spline_fl)

Call:
lm(formula = dep_delay ~ bs(hour, df = 4), data = fl)

Residuals:
    Min      1Q  Median      3Q     Max 
 -66.09  -18.26   -9.49   -1.03 1296.23 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)         0.3436     0.4035   0.852    0.394    
bs(hour, df = 4)1   3.2272     0.7622   4.234  2.3e-05 ***
bs(hour, df = 4)2   7.0820     0.5462  12.966  < 2e-16 ***
bs(hour, df = 4)3  32.5465     0.7685  42.349  < 2e-16 ***
bs(hour, df = 4)4  16.3872     0.6992  23.436  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 39.38 on 328516 degrees of freedom
Multiple R-squared:  0.04081,   Adjusted R-squared:  0.0408 
F-statistic:  3494 on 4 and 328516 DF,  p-value: < 2.2e-16

The same recipe visualizes an effect curve with a confidence band for any fitted model; here, the quadratic mpg fit:

newx <- tibble(displ = seq(min(mpg$displ), max(mpg$displ), length.out = 200))
preds <- cbind(newx, predict(m_quad_mpg, newdata = newx, interval = "confidence"))
ggplot(preds, aes(displ, fit)) +
  geom_line() +
  geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.2) +
  labs(title = "Predicted hwy vs displacement", subtitle = "Quadratic fit with 95% CI")

11.7.7 Learning curves and experience effects

Solar PV learning curve: log(price_per_watt) ~ log(cumulative_capacity)
Download OWID series and join to create solar_df with columns price_watt, cum_capacity

Note

Source: Our World in Data

# solar_df <- read_csv("data/owid_solar_joined.csv")
# m_lc <- lm(log(price_watt) ~ log(cum_capacity), data = solar_df)
# lr <- 1 - 2^(coef(m_lc)[2])   # learning rate per doubling of capacity
# lr

11.8 Diagnostics and validation

  • Recheck residuals after transformation; look for remaining curvature and heteroscedasticity
  • Compare forms with light cross-validation (e.g., repeated splits or K-fold)
  • Avoid high-order oscillations; do not extrapolate beyond observed \(X\)

set.seed(123)
idx <- sample.int(nrow(mpg), size = floor(0.8 * nrow(mpg)))
m_train <- mpg[idx, ]; m_test <- mpg[-idx, ]

m_lin_mpg  <- lm(hwy ~ displ, data = m_train)
m_quad_mpg <- lm(hwy ~ displ + I(displ^2), data = m_train)

rmse <- function(y, yhat) sqrt(mean((y - yhat)^2))
rmse_lin  <- rmse(m_test$hwy,  predict(m_lin_mpg,  newdata = m_test))
rmse_quad <- rmse(m_test$hwy,  predict(m_quad_mpg, newdata = m_test))

tibble(model = c("linear","quadratic"), rmse = c(rmse_lin, rmse_quad))
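
For a slightly more stable comparison, a minimal base-R K-fold sketch reusing rmse() from above:

# 5-fold cross-validation comparing the two forms
set.seed(123)
k <- 5
folds <- sample(rep(1:k, length.out = nrow(mpg)))
cv_rmse <- function(form) {
  mean(sapply(1:k, function(i) {
    fit <- lm(form, data = mpg[folds != i, ])
    rmse(mpg$hwy[folds == i], predict(fit, newdata = mpg[folds == i, ]))
  }))
}
tibble(model = c("linear", "quadratic"),
       cv_rmse = c(cv_rmse(hwy ~ displ), cv_rmse(hwy ~ displ + I(displ^2))))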

11.9 Communicating effects to managers

  • Translate derivatives into plain language: where gains diminish, where risks rise
  • Report elasticities or semi-elasticities at meaningful \(X\) values
  • Plot predicted curves within supported ranges; mark turning points and slope changes
Tip

For pricing and demand, log–log elasticities communicate clearly. For costs, linear or linear–log forms often align with managerial expectations
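
A minimal sketch that turns the diamonds elasticity (fitted earlier) into a manager-ready sentence:

# Elasticity with a 95% confidence interval, reported in plain language
est <- coef(m_loglog_diamonds)["log(carat)"]
ci  <- confint(m_loglog_diamonds)["log(carat)", ]
sprintf("A 1%% increase in carat weight is associated with a %.2f%% increase in price (95%% CI: %.2f to %.2f).",
        est, ci[1], ci[2])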

11.10 Advanced methods (brief survey; implementations beyond scope)

  • Generalized Additive Models (GAM): smooth, additive curves with interpretable partial effects
  • Tree-based ensembles (random forests, gradient boosting): flexible prediction with interactions and nonlinearity
  • Local methods (k-nearest neighbors): simple non-parametric baseline
Caution

Scope

Implementation of these advanced methods is beyond the scope of this chapter; use them as benchmarks when OLS forms clearly underfit

11.11 Summary of Key Concepts

  • Curvature is common in business data; start by visualizing on the original scale
  • Choose the simplest form that removes residual structure and communicates clearly
  • Transformations provide interpretable parameters: linear–log for diminishing returns, log–linear for percent impacts, log–log for elasticities
  • Low-order polynomials capture smooth curvature; compute turning points at \(X^{\star} =-\beta_1/(2\beta_2)\) when applicable
  • Piecewise linear models align with policy thresholds and pricing tiers; justify knot placement with business logic
  • Validate with residual checks and light cross-validation; avoid extrapolation beyond observed \(X\)
  • Communicate effects using prediction curves with confidence bands and translate into percent or unit impacts for decision-makers

11.12 Glossary of Terms

  • Elasticity: the percent change in \(Y\) associated with a 1% change in \(X\); in log–log models, equals the slope coefficient
  • Semi-elasticity: the percent change in \(Y\) per unit change in \(X\) (log–linear) or the unit change in \(Y\) per 1% change in \(X\) (linear–log)
  • Turning point: the \(X\) value where a quadratic’s marginal effect \(dY/dX=\beta_1+2\beta_2 X\) equals zero, \(X^{\star} =-\beta_1/(2\beta_2)\)
  • Hinge term: a piecewise linear basis function \(\max(0, X-k)\) that allows a slope change at knot \(k\)
  • Knot: the value of \(X\) where the slope is permitted to change in a piecewise model
  • Partial residual plot: a visualization of a predictor’s adjusted relationship with the outcome after accounting for other covariates
  • Smoother: a flexible curve (e.g., loess) used in EDA to reveal non-linear patterns without specifying a parametric form
  • Overfitting: modeling noise as if it were signal; often manifests with high-order polynomials or excessive flexibility
  • Extrapolation: using the model to predict beyond the observed range of \(X\), where functional-form assumptions are least reliable