8.6 Homework Assignment: Data Visualization of Absenteeism Patterns

8.6.1 Objective:

The purpose of this homework assignment is to develop your skills in data cleaning and visualization by exploring absenteeism patterns based on the day of the week.

8.6.2 Data Source:

For this assignment, you will work with the “Absenteeism at Work Data Set”. The file is semicolon-delimited, so load it with the following R code, which sets sep = ";":

absenteeism <- read.csv(
    "https://ljkelly3141.github.io/datasets/bi-book/Absenteeism_at_work.csv",
    sep = ";"
)
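
Before computing anything, it helps to confirm what the column names look like once the file is loaded. The sketch below does not touch the real file: it reads two made-up rows in the same semicolon-delimited layout to show how read.csv() rewrites headers that contain spaces. The header names "Day of the week" and "Absenteeism time in hours" are assumptions about the real file; run names(absenteeism) on your own copy to check.

```r
# Two made-up rows in a semicolon-delimited layout (illustrative values only;
# the header names are an assumption about the real file).
sample_text <- "ID;Day of the week;Absenteeism time in hours
11;3;4
36;2;0"

sample_rows <- read.csv(text = sample_text, sep = ";")

# By default read.csv() makes headers syntactically valid,
# so each space in a header becomes a dot.
names(sample_rows)
str(sample_rows)
```

If the real headers match this assumption, the two columns needed for the assignment would be Day.of.the.week and Absenteeism.time.in.hours.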

8.6.3 Instructions:

  1. Prepare the Dataset: Start by calculating two key metrics for each day of the week: the total number of absences and the total absenteeism time in hours. The chapter case uses the built-in constant month.abb, but R has no built-in equivalent for days of the week, so define your own day.name constant, e.g.

day.name <- c("Sunday", "Monday", "Tuesday",
              "Wednesday", "Thursday", "Friday", "Saturday")

  2. Visualize Absenteeism by Day of the Week: Use ggplot to create a bar chart that shows how the two absenteeism metrics are distributed over the days of the week. To do this, you will need to reshape the data into a “long” format, with one row per combination of day and metric, to make it compatible with ggplot’s requirements.

  3. Use Facets for Clarity: Because the two metrics (number of absences and total hours of absenteeism) are on different scales, they are hard to compare directly on a single axis. To improve readability, use facets (subplots) to separate the two metrics into distinct panels within your plot.

  4. Standardize the Data: Standardize each metric across the days of the week so that both are on a common scale. That is, rescale each metric by subtracting its mean and dividing by its standard deviation, so that it has a mean of zero and a standard deviation of one. This makes comparisons between the two metrics across days more meaningful.

  5. Plot Standardized Data and Conclude: Plot the standardized data using ggplot. Your final task is to analyze the plots, identify any notable patterns, and draft your conclusions based on your observations. Discuss potential reasons for the patterns you observe and how they might inform workplace policies or practices.
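
The shape of the workflow described in the instructions (aggregate, reshape to long format, facet, standardize, plot again) can be sketched on a small made-up data frame. This is a minimal illustration under stated assumptions, not the expected solution: the values in toy are invented, the column names Day.of.the.week and Absenteeism.time.in.hours are assumed to match what read.csv() produces for the real file, day codes are assumed to run from 1 (Sunday) so that they index day.name directly, and the use of dplyr and tidyr is one possible choice rather than the chapter's confirmed approach.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

day.name <- c("Sunday", "Monday", "Tuesday",
              "Wednesday", "Thursday", "Friday", "Saturday")

# Made-up records, NOT the real data. Codes 2-6 are assumed to mean Monday-Friday.
toy <- data.frame(
  Day.of.the.week           = c(2, 2, 3, 4, 4, 4, 5, 6, 6),
  Absenteeism.time.in.hours = c(8, 2, 4, 1, 3, 8, 2, 16, 8)
)

# Prepare: one row per day with the absence count and the total hours.
by_day <- toy %>%
  group_by(Day.of.the.week) %>%
  summarise(
    absences = n(),
    hours    = sum(Absenteeism.time.in.hours),
    .groups  = "drop"
  ) %>%
  # Use the day code as an index into day.name; fixing the levels keeps weekday order.
  mutate(day = factor(day.name[Day.of.the.week], levels = day.name))

# Reshape: long format, one row per day-metric pair.
by_day_long <- by_day %>%
  pivot_longer(c(absences, hours), names_to = "metric", values_to = "value")

# Facet: one panel per metric, each with its own y-axis scale.
p_raw <- ggplot(by_day_long, aes(x = day, y = value)) +
  geom_col() +
  facet_wrap(~ metric, scales = "free_y")

# Standardize: z-score each metric across the days (mean 0, standard deviation 1).
by_day_std <- by_day_long %>%
  group_by(metric) %>%
  mutate(z = (value - mean(value)) / sd(value)) %>%
  ungroup()

# Plot standardized values side by side, now that both metrics share a scale.
p_std <- ggplot(by_day_std, aes(x = day, y = z, fill = metric)) +
  geom_col(position = "dodge")
```

Printing p_raw or p_std draws the corresponding chart. With the real data, toy would be replaced by absenteeism; everything else in the sketch should be checked against the actual column names and day coding before relying on it.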

8.6.4 Deliverable:

Produce a Quarto document detailing your analysis process, the visualizations, and your conclusions. Render this document to a DOCX file for submission.

Ensure your report is clear and well-organized, with all code snippets and visualizations properly included. Discuss your findings in a concise and thoughtful manner, providing insights into how these patterns might impact workplace management and strategies for reducing absenteeism.

The transition from data preparation to modeling marks a shift from “what does the data look like?” to “what can the data tell us?”