6.1 Loading the Data

We begin by loading the raw dataset from its hosted URL using read_delim(), as we did in Chapter 4:

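library(readr)  # provides read_delim()

# The file is semicolon-delimited despite its .csv extension
absenteeism <- read_delim(
  "https://ljkelly3141.github.io/datasets/bi-book/Absenteeism_at_work.csv",
  delim = ";"
)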

As we saw in Chapter 4, several column names contain spaces (e.g., Reason for absence, Service time). Because read_delim() keeps these names exactly as they appear in the file, we wrap them in backticks whenever we reference them in code: for example, `Service time` rather than Service.time. Backticks are R's standard way of quoting names that are not syntactically valid on their own.
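To make the backtick rule concrete, here is a minimal sketch on a small made-up table rather than the real download. The object name toy and all of the values are invented for illustration, and the ID column is an assumption about the dataset; only the two column names with spaces come from the text above.

```r
library(dplyr)
library(tibble)

# Toy stand-in for the absenteeism data (values are invented for illustration).
# Backticks are needed even when creating a column whose name contains a space.
toy <- tibble(
  ID = c(1, 1, 2),
  `Service time` = c(13, 13, 18),
  `Reason for absence` = c(26, 23, 7)
)

# Inside dplyr verbs, backticks mark the full name as a single column reference
long_service <- toy %>%
  filter(`Service time` > 15) %>%
  select(ID, `Service time`)

# The $ operator needs backticks as well
avg_service <- mean(toy$`Service time`)
```

Without the backticks, R would parse Service time as two separate symbols and stop with a syntax error.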

Recall that this dataset has 740 rows — one row per absence event — and 21 columns. Our goal is to transform it into a dataset with one row per employee, summarizing each employee’s absenteeism patterns.
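As a rough preview of that reshaping, the sketch below collapses a toy event-level table to one row per employee. It is only an illustration of the general idea, not the method developed in the rest of this chapter: the values are invented, the column names ID and `Absenteeism time in hours` are assumptions about the dataset, and the summary columns n_absences and total_hours are hypothetical.

```r
library(dplyr)
library(tibble)

# Toy event-level data: one row per absence event (values are invented).
# The column names ID and `Absenteeism time in hours` are assumed here.
events <- tibble(
  ID = c(1, 1, 2, 3, 3, 3),
  `Absenteeism time in hours` = c(4, 8, 2, 1, 3, 8)
)

# Collapse to one row per employee with two hypothetical summary columns
per_employee <- events %>%
  group_by(ID) %>%
  summarise(
    n_absences = n(),                                 # number of absence events
    total_hours = sum(`Absenteeism time in hours`),   # hours absent in total
    .groups = "drop"
  )
```

The result has as many rows as there are distinct ID values, which is the shape we are aiming for with the full dataset.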