4.5 What Do We See So Far?

Even from this brief first look, several patterns and questions emerge:

  • Most absences are short: the distribution of hours is heavily right-skewed, and the bulk of events last under 8 hours.
  • A few absence reasons dominate: some ICD codes appear far more frequently than others, suggesting that certain types of health issues or appointments drive the majority of absences.
  • Day-of-week patterns may exist: the boxplots hint at variation across days, though we will need more rigorous analysis to determine whether these differences are meaningful.
  • Monthly variation is visible: some months show higher medians or more extreme outliers, suggesting seasonal effects worth exploring in later chapters.
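
Each of these impressions can be restated as a number, which makes it easier to revisit once the data has been cleaned. The sketch below is one possible way to do that using only the Python standard library. It runs on a handful of toy records invented purely for illustration, not on the chapter's dataset, and the field names (`hours`, `reason`, `weekday`), the placeholder reason labels, and the `summarize` helper are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from statistics import mean, median

# Toy records invented for illustration only; they are NOT the chapter's data.
# Field names and the reason labels "A".."D" are hypothetical placeholders.
records = [
    {"hours": 2,   "reason": "A", "weekday": "Mon"},
    {"hours": 3,   "reason": "A", "weekday": "Tue"},
    {"hours": 8,   "reason": "B", "weekday": "Mon"},
    {"hours": 1,   "reason": "A", "weekday": "Wed"},
    {"hours": 4,   "reason": "B", "weekday": "Tue"},
    {"hours": 40,  "reason": "C", "weekday": "Mon"},
    {"hours": 2,   "reason": "A", "weekday": "Wed"},
    {"hours": 120, "reason": "D", "weekday": "Tue"},
]


def summarize(rows, short_cutoff=8):
    """Turn the visual impressions into simple summary numbers."""
    hours = [r["hours"] for r in rows]

    # Share of absences strictly shorter than the cutoff (8 hours here).
    share_short = sum(h < short_cutoff for h in hours) / len(hours)

    # How concentrated the absences are in the single most common reason.
    reason_counts = Counter(r["reason"] for r in rows)
    top_reason, top_count = reason_counts.most_common(1)[0]

    # Median hours per weekday, mirroring what the boxplots display.
    by_day = defaultdict(list)
    for r in rows:
        by_day[r["weekday"]].append(r["hours"])

    return {
        "share_short": share_short,
        # A mean well above the median is a rough sign of right skew.
        "mean_hours": mean(hours),
        "median_hours": median(hours),
        "top_reason": top_reason,
        "top_reason_share": top_count / len(rows),
        "median_by_day": {day: median(h) for day, h in by_day.items()},
    }


summary = summarize(records)
```

In this toy example the mean sits far above the median because two long absences pull it upward, which is the same signature of right skew described in the first bullet. Summaries like these only describe the data; they do not tell us whether the day-to-day or month-to-month differences are meaningful.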

These are exactly the kinds of observations that motivate deeper analysis. In Chapters 5 and 6, we will clean and transform the data to prepare it for that analysis. In Chapters 7 and 8, we will build more sophisticated visualizations to explore these patterns in detail. And in Chapters 9 through 12, we will use statistical models to test whether the patterns we see reflect genuine effects rather than chance variation, and to identify which factors best predict absenteeism.