2.3 Understanding the Variables

The tables below describe each variable in the dataset, organized by category. Understanding these variables now will make it much easier to work with the data when we begin analysis in Chapter 4.

2.3.1 The Outcome Variable

Variable | Description | Type
Absenteeism time in hours | Total hours absent for this event | Numerical

This is our target variable — the thing we ultimately want to understand and predict.

2.3.2 Employee Demographics

Variable | Description | Type
ID | Unique identifier for each employee | Identifier
Age | Employee’s age in years | Numerical
Education | Highest education level attained (1 = high school, 2 = graduate, 3 = postgraduate, 4 = master/doctorate) | Ordinal
Son | Number of children | Numerical (discrete)
Pet | Number of pets | Numerical (discrete)
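Because Education is ordinal, comparisons such as "graduate or higher" are meaningful, but arithmetic on the codes is not. A minimal Python sketch, assuming the codes are stored as plain integers; the `Education` class and the `at_least_graduate` helper are hypothetical names introduced for illustration:

```python
from enum import IntEnum


class Education(IntEnum):
    """Education codes from the demographics table (hypothetical member names).

    IntEnum preserves the natural order 1 < 2 < 3 < 4, so comparisons work,
    while the named members remind us these are levels, not quantities.
    """
    HIGH_SCHOOL = 1
    GRADUATE = 2
    POSTGRADUATE = 3
    MASTER_DOCTORATE = 4


def at_least_graduate(code: int) -> bool:
    """Return True if the coded level is graduate or higher.

    Education(code) raises ValueError for an undocumented code such as 5.
    """
    return Education(code) >= Education.GRADUATE


print(Education(3).name)     # POSTGRADUATE
print(at_least_graduate(1))  # False
print(at_least_graduate(4))  # True
```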

2.3.3 Health Indicators

Variable | Description | Type
Body mass index | BMI calculated from height and weight (kg/m²) | Numerical
Weight | Employee’s weight (kg) | Numerical
Height | Employee’s height (cm) | Numerical
Social drinker | Whether the employee drinks socially (1 = yes, 0 = no) | Binary
Social smoker | Whether the employee smokes socially (1 = yes, 0 = no) | Binary
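The BMI column can be cross-checked against Weight and Height using the standard formula BMI = weight (kg) / height (m)², converting height from centimeters first. A minimal sketch; `bmi` and `bmi_consistent` are hypothetical helpers, the example values are made up rather than taken from the dataset, and the 1.0 tolerance is an assumption intended to absorb rounding in recorded values:

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight in kilograms divided by squared height in meters."""
    height_m = height_cm / 100  # Height is recorded in centimeters
    return weight_kg / height_m ** 2


def bmi_consistent(recorded_bmi: float, weight_kg: float, height_cm: float,
                   tol: float = 1.0) -> bool:
    """Return True if a recorded BMI is within tol of the value computed
    from weight and height. tol is an assumed rounding allowance."""
    return abs(bmi(weight_kg, height_cm) - recorded_bmi) <= tol


# Hypothetical values for illustration, not rows from the dataset
print(round(bmi(90, 172), 1))       # 30.4
print(bmi_consistent(30, 90, 172))  # True
print(bmi_consistent(25, 90, 172))  # False
```

A row that fails a check like this is worth inspecting before analysis, since it may point to a data-entry error in one of the three columns.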

2.3.4 Work Factors

Variable | Description | Type
Transportation expense | Monthly transportation cost (Brazilian reais) | Numerical
Distance from residence to work | Commute distance in kilometers | Numerical
Service time | Years of employment at the company | Numerical
Work load average/day | Average daily workload (target units) | Numerical
Hit target | Percentage of productivity target achieved | Numerical
Disciplinary failure | Whether the employee has a disciplinary record (1 = yes, 0 = no) | Binary

2.3.5 Absence Context

Variable | Description | Type
Reason for absence | Coded reason using ICD categories (see below) | Categorical
Month of absence | Month when the absence occurred (1–12) | Categorical
Day of the week | Day of absence (2 = Monday through 6 = Friday) | Categorical
Seasons | Season of absence (1 = summer, 2 = autumn, 3 = winter, 4 = spring) | Categorical
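Because the day and season columns store numeric codes, decoding them into labels before tabulating or plotting avoids a common slip: the day codes start at 2, so a 1 does not mean Monday. A minimal sketch built from the code lists in the table above; `DAY_OF_WEEK`, `SEASONS`, and `decode` are hypothetical names:

```python
# Code-to-label mappings taken from the absence-context table
DAY_OF_WEEK = {2: "Monday", 3: "Tuesday", 4: "Wednesday", 5: "Thursday", 6: "Friday"}
SEASONS = {1: "summer", 2: "autumn", 3: "winter", 4: "spring"}


def decode(code: int, mapping: dict) -> str:
    """Translate a numeric code to its label, rejecting undocumented codes."""
    if code not in mapping:
        raise ValueError(f"undocumented code: {code}")
    return mapping[code]


print(decode(2, DAY_OF_WEEK))  # Monday
print(decode(3, SEASONS))      # winter
```

Raising an error on unknown codes, rather than silently returning a default, surfaces unexpected values in the raw data early.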

A Note on Data Types

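You may have noticed that the variables above are labeled as different types: numerical, categorical, ordinal, and binary. These distinctions matter because a variable's type determines which analytical techniques are appropriate. Numerical data (like age or BMI) can be averaged and used in mathematical calculations. Categorical data (like season or day of the week) assigns each observation to a group. Ordinal data (like education level) is categorical but has a natural order, although the distance between levels is not defined. Binary data (like social drinker) has only two possible values. Take care with categorical variables stored as numeric codes: software will happily average the Seasons column, but a "mean season" of 2.5 has no meaning. Likewise, ID is an identifier that labels employees rather than measuring anything, so it should not be treated as a numerical variable. We will revisit these distinctions throughout the book as we select appropriate methods for each type of variable.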
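One practical consequence of these type distinctions is the choice of summary statistic. A minimal sketch using only Python's standard library; the `summarize` function and its `kind` labels are hypothetical, and the sample lists are made-up values for illustration. It uses `median_low` for ordinal data so the result is always an actual level rather than a midpoint between two levels:

```python
from collections import Counter
from statistics import mean, median_low


def summarize(values, kind):
    """Return a summary statistic suited to the variable's data type."""
    if kind == "numerical":
        return mean(values)  # averages are meaningful for measured quantities
    if kind == "ordinal":
        return median_low(values)  # respects order; always an observed level
    if kind == "binary":
        return sum(values) / len(values)  # proportion of 1s (e.g. share of social drinkers)
    if kind == "categorical":
        return Counter(values).most_common(1)[0][0]  # most frequent category (the mode)
    raise ValueError(f"unknown kind: {kind}")


print(summarize([33, 38, 28, 41], "numerical"))  # average of hypothetical ages
print(summarize([1, 2, 2, 4], "ordinal"))        # 2
print(summarize([1, 0, 1, 1], "binary"))         # 0.75
print(summarize([2, 3, 2, 6], "categorical"))    # 2
```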