4.5 What Do We See So Far?
Even from this brief first look, several patterns and questions emerge:
- Most absences are short — the distribution of hours is heavily right-skewed, with most events under 8 hours.
- A few absence reasons dominate — some ICD codes appear far more frequently than others, suggesting that certain types of health issues or appointments drive the majority of absences.
- Day-of-week patterns may exist — the boxplots hint at variation across days, though we will need more rigorous analysis to determine whether these differences are meaningful.
- Monthly variation is visible — some months show higher medians or more extreme outliers, suggesting seasonal effects worth exploring in later chapters.
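Each of these observations can also be checked numerically rather than read off a plot. The following is a minimal sketch in Python using only the standard library; the record layout, the helper names `share_below` and `median_by`, and the values themselves are illustrative assumptions, not the actual dataset or the tooling used elsewhere in this book.

```python
from collections import Counter, defaultdict
from statistics import mean, median

# Toy records, NOT the real dataset: (reason_code, day_of_week, month, hours).
# The layout and every value here are made up purely for illustration.
records = [
    ("23", "Mon", 1, 2),
    ("23", "Tue", 1, 3),
    ("28", "Mon", 2, 1),
    ("23", "Wed", 3, 2),
    ("13", "Fri", 3, 8),
    ("28", "Thu", 7, 4),
    ("19", "Mon", 7, 40),
    ("23", "Fri", 7, 120),
]

hours = [rec[3] for rec in records]


def share_below(values, threshold):
    """Return the fraction of values strictly below the threshold."""
    return sum(v < threshold for v in values) / len(values)


def median_by(rows, key_index, value_index=3):
    """Return the median of the value column within each group of the key column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[value_index])
    return {key: median(vals) for key, vals in groups.items()}


# Right skew: a few long absences pull the mean well above the median.
print("mean hours:", mean(hours), "| median hours:", median(hours))
print("share under 8 hours:", share_below(hours, 8))

# Dominant reasons: count how often each reason code appears.
reason_counts = Counter(rec[0] for rec in records)
print("most common reasons:", reason_counts.most_common(2))

# Day-of-week and monthly variation: compare group medians.
print("median hours by day:", median_by(records, 1))
print("median hours by month:", median_by(records, 2))
```

A mean well above the median is a quick numeric signal of right skew, and comparing group medians mirrors what the boxplots show. Neither check says whether a difference is larger than chance would produce, which is why the modeling chapters are still needed.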
These are exactly the kinds of observations that motivate deeper analysis. In Chapters 5 and 6, we will clean and transform the data to prepare it for that analysis. In Chapters 7 and 8, we will build more sophisticated visualizations to explore these patterns in detail. And in Chapters 9 through 12, we will use statistical models to test whether the patterns we see reflect more than chance variation and which factors best predict absenteeism.