6 Case Study: Loading and Cleaning the Data
In Chapter 4, we took a first look at the raw Absenteeism at Work dataset: 740 rows, one per absence event. In Chapter 5, we learned the tools for data cleaning: filtering, selecting, mutating, grouping, and summarizing. Now we put those tools to work. In this chapter, we will transform the raw dataset into a clean, employee-level data frame ready for the visualization and modeling that follow in later chapters.
Chapter Goals
Upon concluding this chapter, readers will be able to:
- Load a dataset and apply a multi-step cleaning pipeline using Tidyverse functions.
- Write a custom R function to compute the mode of a variable.
- Aggregate event-level data into employee-level summaries using group_by() and summarise().
- Recode numeric variables into meaningful categorical labels using ifelse(), factor(), and fct_recode().
- Save a cleaned data frame to a CSV file for use in subsequent analyses.
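
As a preview of how these goals fit together, the sketch below runs a miniature version of such a pipeline on a toy data frame. It is only an illustration under stated assumptions: the column names (id, reason, social_drinker, hours), the helper get_mode(), the category labels, and the output file name are all hypothetical stand-ins, not the names used in the real dataset or later in this chapter.

```r
library(dplyr)
library(forcats)
library(readr)

# Toy event-level data: one row per absence event.
# Column names and values are invented for illustration only.
events <- tibble(
  id             = c(1, 1, 1, 2, 2, 3),
  reason         = c(23, 23, 10, 7, 7, 28),
  social_drinker = c(1, 1, 1, 0, 0, 1),
  hours          = c(4, 2, 8, 3, 1, 16)
)

# Hypothetical helper: return the most frequent value of x.
# Ties are resolved in favor of the value that appears first.
get_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Collapse events to one row per employee, then recode the 0/1 flag.
employees <- events %>%
  group_by(id) %>%
  summarise(
    n_absences     = n(),
    total_hours    = sum(hours),
    main_reason    = get_mode(reason),
    social_drinker = first(social_drinker)
  ) %>%
  mutate(
    social_drinker = factor(ifelse(social_drinker == 1, "yes", "no")),
    social_drinker = fct_recode(social_drinker, Drinker = "yes", Abstainer = "no")
  )

# Save the cleaned data frame; a temporary directory keeps the sketch side-effect free.
out_path <- file.path(tempdir(), "employees_clean.csv")
write_csv(employees, out_path)
```

The three recoding functions are chained here only to show each one in action; in practice a single call to factor() with explicit labels would often be enough.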