2.3 Understanding the Variables
The tables below describe each variable in the dataset, organized by category. Understanding these variables now will make it much easier to work with the data when we begin analysis in Chapter 4.
2.3.1 The Outcome Variable
| Variable | Description | Type |
|---|---|---|
| Absenteeism time in hours | Total hours absent for this event | Numerical |
This is our target variable: the quantity we ultimately want to understand and predict.
2.3.2 Employee Demographics
| Variable | Description | Type |
|---|---|---|
| ID | Unique identifier for each employee | Identifier |
| Age | Employee’s age in years | Numerical |
| Education | Highest level attained (1 = high school, 2 = graduate, 3 = postgraduate, 4 = master/doctorate) | Ordinal |
| Son | Number of children | Numerical (discrete) |
| Pet | Number of pets | Numerical (discrete) |
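Because Education is ordinal, its codes carry order but not necessarily equal spacing. Level 4 ranks above level 2, yet nothing guarantees that the step from high school to graduate matches the step from postgraduate to master/doctorate. The following minimal Python sketch assumes the codes arrive as plain integers and uses the standard library's `enum.IntEnum` to keep the labels and their order together. The names `Education`, `EDUCATION_LABELS`, and `describe_education` are our own illustrations, not part of the dataset.

```python
from enum import IntEnum


class Education(IntEnum):
    """Highest education level, using the codes documented in the table above."""
    HIGH_SCHOOL = 1
    GRADUATE = 2
    POSTGRADUATE = 3
    MASTER_DOCTORATE = 4


# Human-readable labels matching the table's wording (hypothetical helper).
EDUCATION_LABELS = {
    Education.HIGH_SCHOOL: "high school",
    Education.GRADUATE: "graduate",
    Education.POSTGRADUATE: "postgraduate",
    Education.MASTER_DOCTORATE: "master/doctorate",
}


def describe_education(code: int) -> str:
    """Return the label for a raw code; raises ValueError for codes outside 1-4."""
    return EDUCATION_LABELS[Education(code)]


# Comparisons respect the natural order of the levels.
print(Education(3) > Education(1))
print(describe_education(4))
```

Passing a code outside 1 to 4 raises a `ValueError`, which is a cheap way to catch miscoded rows. Keep in mind that `IntEnum` members still allow arithmetic, so averaging education codes remains possible in code even though doing so assumes equal spacing between levels.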
2.3.3 Health Indicators
| Variable | Description | Type |
|---|---|---|
| Body mass index | BMI in kg/m², derived from weight and height | Numerical |
| Weight | Employee’s weight (kg) | Numerical |
| Height | Employee’s height (cm) | Numerical |
| Social drinker | Whether the employee drinks socially (1 = yes, 0 = no) | Binary |
| Social smoker | Whether the employee smokes socially (1 = yes, 0 = no) | Binary |
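Body mass index is weight in kilograms divided by the square of height in meters. Since the table records height in centimeters, the unit conversion is easy to get wrong. The following minimal Python sketch computes BMI and checks a recorded value against it. `bmi` and `bmi_is_consistent` are hypothetical helper names, and the tolerance is our own assumption, because how the recorded BMI values were computed or rounded is not described here.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    height_m = height_cm / 100  # the table records height in centimeters
    return weight_kg / height_m ** 2


def bmi_is_consistent(recorded_bmi: float, weight_kg: float,
                      height_cm: float, tolerance: float = 1.0) -> bool:
    """Check whether a recorded BMI agrees with weight and height.

    The default tolerance of 1.0 is an assumption, chosen to allow for
    recorded values that may have been rounded to whole numbers.
    """
    return abs(bmi(weight_kg, height_cm) - recorded_bmi) <= tolerance


# Hypothetical employee: 90 kg, 172 cm.
print(round(bmi(90, 172), 1))          # 30.4
print(bmi_is_consistent(30, 90, 172))  # True
print(bmi_is_consistent(25, 90, 172))  # False
```

Because BMI is a function of Weight and Height, the three variables carry overlapping information, which is worth remembering when choosing variables for a model.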
2.3.4 Work Factors
| Variable | Description | Type |
|---|---|---|
| Transportation expense | Monthly transportation cost (Brazilian Real) | Numerical |
| Distance from residence to work | Commute distance in kilometers | Numerical |
| Service time | Years of employment at the company | Numerical |
| Work load average/day | Average daily workload (target units) | Numerical |
| Hit target | Percentage of productivity target achieved | Numerical |
| Disciplinary failure | Whether the employee has a disciplinary record (1 = yes, 0 = no) | Binary |
2.3.5 Absence Context
| Variable | Description | Type |
|---|---|---|
| Reason for absence | Coded reason: 1–21 are ICD disease categories; 22–28 are reasons outside the ICD, such as medical or dental consultations (see below) | Categorical |
| Month of absence | Month when the absence occurred (1–12) | Categorical |
| Day of the week | Day of absence (2 = Monday through 6 = Friday) | Categorical |
| Seasons | Season of absence (1 = summer, 2 = autumn, 3 = winter, 4 = spring) | Categorical |
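Because the categorical variables above are stored as integer codes, a step that is easy to forget is translating them back into labels before reporting results. The following sketch assumes each column has already been read into a Python list of integers. The two dictionaries transcribe the codings from the table, while `decode` and `frequency_table` are hypothetical helper names. Any code missing from a dictionary becomes `"unknown"` rather than being silently mislabeled.

```python
from collections import Counter

# Code tables transcribed from the descriptions above.
DAY_OF_WEEK = {2: "Monday", 3: "Tuesday", 4: "Wednesday", 5: "Thursday", 6: "Friday"}
SEASONS = {1: "summer", 2: "autumn", 3: "winter", 4: "spring"}


def decode(codes, code_table):
    """Map integer codes to labels; codes not in the table become 'unknown'."""
    return [code_table.get(code, "unknown") for code in codes]


def frequency_table(codes, code_table):
    """Count events per category, the natural summary for a categorical variable."""
    return Counter(decode(codes, code_table))


# Hypothetical day-of-week codes for six absence events (not from the dataset).
days = [2, 2, 4, 6, 3, 2]
print(decode(days, DAY_OF_WEEK))
print(frequency_table(days, DAY_OF_WEEK).most_common(1))  # [('Monday', 3)]
```

Counting is the appropriate summary here. The mean of these six day codes (about 3.2) does not correspond to any meaningful day of the week.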
A Note on Data Types
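You may have noticed that the tables above label variables with different types: numerical, categorical, ordinal, binary, and (for ID) identifier. These distinctions matter because a variable's type determines which analytical techniques are appropriate. Numerical data (like age or BMI) can be averaged and used in arithmetic. Categorical data (like season or day of the week) assigns each observation to a group; although these groups are stored as number codes, the numbers are only labels, so an "average season" is meaningless. Ordinal data (like education level) is categorical but has a natural order, although the gaps between levels are not necessarily equal. Binary data (like social drinker) is a special case of categorical data with only two possible values, here coded 1 and 0. An identifier such as ID simply distinguishes one employee from another and measures nothing. We will revisit these distinctions throughout the book as we select appropriate methods for each type of variable.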
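To make the link between type and technique concrete, the sketch below chooses a summary statistic according to each variable's type. It uses only Python's `statistics` module. The `VARIABLE_TYPES` mapping covers just a few columns for illustration, the sample values are invented rather than drawn from the dataset, and the pairing of types to statistics is one common convention rather than a rule prescribed by this book.

```python
from statistics import mean, median_low, mode

# Types transcribed from the tables above (a subset, for illustration).
VARIABLE_TYPES = {
    "ID": "identifier",
    "Age": "numerical",
    "Education": "ordinal",
    "Day of the week": "categorical",
    "Social drinker": "binary",
}


def summarize(values, var_type):
    """Return a summary statistic suited to the variable's data type.

    These pairings are one common convention, not the only reasonable choice.
    """
    if var_type == "numerical":
        return mean(values)               # arithmetic on the values is meaningful
    if var_type == "ordinal":
        return median_low(values)         # uses order only; always an observed level
    if var_type == "categorical":
        return mode(values)               # most frequent category
    if var_type == "binary":
        return sum(values) / len(values)  # proportion coded 1 (e.g. "yes")
    if var_type == "identifier":
        return len(set(values))           # number of distinct employees
    raise ValueError(f"unknown variable type: {var_type!r}")


# Hypothetical values for four absence records (not taken from the dataset).
sample = {
    "ID": [11, 36, 3, 11],
    "Age": [33, 50, 38, 33],
    "Education": [1, 3, 1, 2],
    "Day of the week": [2, 2, 4, 6],
    "Social drinker": [1, 0, 1, 1],
}

for name, values in sample.items():
    print(f"{name}: {summarize(values, VARIABLE_TYPES[name])}")
```

Python will happily average any list of integers, including day-of-week codes; recording each variable's type explicitly is what prevents that mistake. `median_low` is used for the ordinal case so that the result is always an observed level rather than a value halfway between two levels.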