9.2 The Modeling Workflow
Model building in BI follows a systematic workflow, much like the data preparation workflow introduced in Chapter 5. Each step builds on the previous one, and the process is often iterative — results from later stages may send you back to revise earlier decisions.
- Problem definition — What business question does the model need to answer? What decisions will it inform? A clear problem statement keeps the entire process aligned with business value.
- Data collection and preparation — Gather relevant data and apply the cleaning, transformation, and tidying techniques from Chapters 5 and 6. The quality of the data directly determines the quality of the model.
- Model selection — Choose a modeling technique appropriate to the problem, the data, and the audience. Simpler models are preferred when they perform comparably to complex ones (the principle of parsimony).
- Model training — Split the historical data into training and test sets, then fit the model to the training set only, adjusting its parameters to capture the underlying patterns. The test set is held back for evaluation.
- Validation and testing — Evaluate the model’s performance on held-out data that played no part in training, using metrics appropriate to the problem type (e.g., RMSE for regression, accuracy for classification).
- Deployment and monitoring — Integrate the model into business processes and continuously track its performance. Models degrade over time as the world changes, requiring periodic retraining.
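The training, validation, and monitoring steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: the data are synthetic (a made-up relationship between ad spend and revenue), and the helper names `train_test_split`, `fit_mean`, `fit_line`, `rmse`, and `needs_retraining`, along with the 1.5 tolerance, are hypothetical choices made for this example. Only the standard library is used.

```python
import math
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_mean(train):
    """Baseline model: always predict the training mean of y."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_line(train):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def rmse(model, rows):
    """Root mean squared error of a model's predictions on (x, y) rows."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in rows) / len(rows))


def needs_retraining(model, recent_rows, reference_rmse, tolerance=1.5):
    """Flag the model when its error on recent data exceeds the reference
    error by more than the tolerance factor (an assumed threshold)."""
    return rmse(model, recent_rows) > tolerance * reference_rmse


# Synthetic historical data (assumption): revenue = 50 + 3 * spend + noise
rng = random.Random(42)
data = [(x, 50.0 + 3.0 * x + rng.gauss(0, 5)) for x in range(1, 81)]

# Model training: fit on the training set only
train, test = train_test_split(data)
baseline = fit_mean(train)
linear = fit_line(train)

# Validation and testing: compare candidates on the held-out test set
baseline_rmse = rmse(baseline, test)
linear_rmse = rmse(linear, test)
print(f"baseline RMSE: {baseline_rmse:.2f}, linear RMSE: {linear_rmse:.2f}")

# Monitoring: simulate a shift in the relationship after deployment
drifted = [(x, 80.0 + 3.0 * x + rng.gauss(0, 5)) for x in range(1, 21)]
print("retrain?", needs_retraining(linear, drifted, linear_rmse))
```

Comparing against the mean-only baseline is one way to apply the principle of parsimony: a more complex model earns its place only if it clearly beats the simpler one on held-out data. The monitoring check reuses the same metric, so a rise in error on recent data is measured on the same scale as the original test result.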
This workflow provides the structure for the rest of the chapter. We begin by examining the three types of models, then work through selection, training, validation, and deployment in detail.