4.3 Summary Statistics
The describe() function from the psych package produces a compact summary of the numeric variables in the dataset:
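```r
library(psych)      # describe()
library(knitr)      # kable()
library(kableExtra) # kable_styling()
describe(work, skew = FALSE, ranges = FALSE, omit = TRUE) |>  # numeric columns only, no skew or range columns
  kable(digits = 2) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = TRUE)
```

| Variable | vars | n | mean | sd | se |
|---|---|---|---|---|---|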
| ID | 1 | 740 | 18.02 | 11.02 | 0.41 |
| Reason for absence | 2 | 740 | 19.22 | 8.43 | 0.31 |
| Month of absence | 3 | 740 | 6.32 | 3.44 | 0.13 |
| Day of the week | 4 | 740 | 3.91 | 1.42 | 0.05 |
| Seasons | 5 | 740 | 2.54 | 1.11 | 0.04 |
| Transportation expense | 6 | 740 | 221.33 | 66.95 | 2.46 |
| Distance from Residence to Work | 7 | 740 | 29.63 | 14.84 | 0.55 |
| Service time | 8 | 740 | 12.55 | 4.38 | 0.16 |
| Age | 9 | 740 | 36.45 | 6.48 | 0.24 |
| Work load Average/day | 10 | 740 | 271.49 | 39.06 | 1.44 |
| Hit target | 11 | 740 | 94.59 | 3.78 | 0.14 |
| Disciplinary failure | 12 | 740 | 0.05 | 0.23 | 0.01 |
| Education | 13 | 740 | 1.29 | 0.67 | 0.02 |
| Son | 14 | 740 | 1.02 | 1.10 | 0.04 |
| Social drinker | 15 | 740 | 0.57 | 0.50 | 0.02 |
| Social smoker | 16 | 740 | 0.07 | 0.26 | 0.01 |
| Pet | 17 | 740 | 0.75 | 1.32 | 0.05 |
| Weight | 18 | 740 | 79.04 | 12.88 | 0.47 |
| Height | 19 | 740 | 172.11 | 6.03 | 0.22 |
| Body mass index | 20 | 740 | 26.68 | 4.29 | 0.16 |
| Absenteeism time in hours | 21 | 740 | 6.92 | 13.33 | 0.49 |
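In the table above, the se column is the standard error of the mean: sd divided by the square root of n. The sketch below is a quick arithmetic check in base R. It uses the rounded sd values copied from three rows of the table, so in other rows a rounding difference in the last digit is possible.

```r
# Standard error of the mean: se = sd / sqrt(n)
# sd values copied from the table (already rounded to 2 decimals)
n <- 740
sd_vals <- c(absenteeism = 13.33, transportation = 66.95, age = 6.48)

se_vals <- sd_vals / sqrt(n)
round(se_vals, 2)  # 0.49, 2.46, 0.24, matching the se column
```

With n fixed at 740 for every variable, se is just sd scaled by a constant (about 1/27.2). That is why the se column adds little information beyond sd here.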
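For each variable, the describe() output reports the variable number (vars), the number of valid cases (n), the mean, the standard deviation (sd), and the standard error of the mean (se). Because the call sets ranges = FALSE, the median, minimum, and maximum are not part of this table. Rerunning describe() with ranges = TRUE adds them. A few key observations: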
Absenteeism time in hours: This is our target variable, the outcome we ultimately want to understand. The median is 3 hours and the mean is 6.9 hours. A mean well above the median suggests a right-skewed distribution: most absences are relatively short, but a few much longer ones pull the average up. The standard deviation (13.33 hours, nearly twice the mean) is consistent with this. The maximum is 120 hours, an unusually long absence.
Age: Employees in the dataset range from 27 to 58 years old, with a median age of 37.
Distance from Residence to Work: Commute distances range from 5 to 52 km, with a median of 26 km.
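The mean-versus-median reasoning used for absenteeism time can be illustrated with a small sketch. The vector below is hypothetical toy data, not the absenteeism dataset. It was chosen so that most values are small and a couple of large ones form a right tail.

```r
# Hypothetical toy data, NOT the absenteeism dataset:
# most "absences" are short, a few long ones form a right tail
hours <- c(1, 2, 2, 3, 3, 3, 4, 8, 16, 40)

mean(hours)    # 8.2: pulled upward by the long tail
median(hours)  # 3: middle of the sorted values, unaffected by the extremes
mean(hours) > median(hours)  # TRUE, the pattern that suggests right skew
```

On the real data, the same comparison would use the median column that describe() adds when ranges = TRUE. Mean above median is a heuristic rather than a formal test. Setting skew = TRUE in describe() reports a skewness statistic directly.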
These numbers give us a first sense of the data’s shape and scale. But numbers alone can be hard to interpret — visualizations often reveal patterns more clearly.