6 Case Study: Loading and Cleaning the Data

In Chapter 4, we took a first look at the raw Absenteeism at Work dataset: 740 rows, one per absence event. In Chapter 5, we learned the tools for data cleaning: filtering, selecting, mutating, grouping, and summarizing. Now we put those tools to work. In this chapter, we will transform the raw dataset into a clean, employee-level data frame ready for the visualization and modeling that follow in later chapters.

Chapter Goals

Upon concluding this chapter, readers will be able to:

  1. Load a dataset and apply a multi-step cleaning pipeline using Tidyverse functions.
  2. Write a custom R function to compute the mode of a variable.
  3. Aggregate event-level data into employee-level summaries using group_by() and summarise().
  4. Recode numeric variables into meaningful categorical labels using ifelse(), factor(), and fct_recode().
  5. Save a cleaned data frame to a CSV file for use in subsequent analyses.
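
As a preview of goals 2 and 3, the following minimal sketch shows the general shape of collapsing event-level rows into one row per employee. It is an illustration under stated assumptions, not the pipeline developed in this chapter: the helper name get_mode, the toy data frame events, and its column names (id, reason, hours) are all hypothetical and are not taken from the Absenteeism at Work dataset.

```r
library(dplyr)

# Hypothetical helper: return the most frequent value of a vector.
# On ties, the value that appears first in x is returned.
get_mode <- function(x) {
  x <- x[!is.na(x)]                      # ignore missing values
  ux <- unique(x)                        # candidate values
  ux[which.max(tabulate(match(x, ux)))]  # candidate with the highest count
}

# Toy event-level data: one row per absence event.
# Column names are illustrative only.
events <- tibble(
  id     = c(1, 1, 1, 2, 2),
  reason = c(23, 23, 28, 11, 11),
  hours  = c(4, 2, 8, 3, 1)
)

# Collapse to one row per employee.
employees <- events %>%
  group_by(id) %>%
  summarise(
    n_absences  = n(),               # number of absence events
    total_hours = sum(hours),        # total hours absent
    main_reason = get_mode(reason),  # most common absence reason
    .groups = "drop"
  )

print(employees)
```

With this toy input, employees has two rows: employee 1 with 3 absences, 14 total hours, and main reason 23, and employee 2 with 2 absences, 4 total hours, and main reason 11. Base R has no built-in function for the statistical mode (mode() reports an object's storage type instead), which is why a small custom helper is needed.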