11.5 AI in Data Mining

AI is expanding what data mining can accomplish in two key ways.

First, AutoML platforms automate the process of trying multiple algorithms, tuning hyperparameters, and selecting the best-performing model for a given dataset (Baratchi et al. 2024). This makes data mining more accessible to analysts who are not machine learning specialists. For clustering tasks specifically, automated methods can evaluate different numbers of clusters, distance metrics, and algorithms to find natural groupings without manual trial and error (Poulakis et al. 2024).
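To make the idea of an automated clustering search concrete, here is a minimal sketch in Python (standard library only). It is not how any particular AutoML platform works; it simply loops over candidate numbers of clusters, runs a small k-means for each, and keeps the value with the highest average silhouette score. The data points, the farthest-first initialization, and the choice of silhouette as the selection criterion are all assumptions made for illustration.

```python
import math

def init_centers(points, k):
    """Farthest-first initialization: deterministic and spreads centers out."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=100):
    """Plain k-means; returns one cluster label per point."""
    centers = init_centers(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

def silhouette(points, labels):
    """Average silhouette score; singleton clusters score 0 by convention."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical toy data: three well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (5, 10), (5, 11), (6, 10), (6, 11)]

# The "automated" part: score every candidate k and keep the best one.
scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print("silhouette by k:", {k: round(s, 3) for k, s in scores.items()})
print("selected k:", best_k)
```

On this toy data the three-cluster solution scores highest, matching how the points were constructed. A real automated method would search over distance metrics and algorithms as well, and would typically use several validity measures rather than one.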

Second, large language models can assist with the interpretive side of data mining, the part that has traditionally required the most human expertise. An analyst can describe a dataset to an AI assistant and ask it to suggest appropriate mining techniques, generate the R code to execute them, and help interpret the results. This does not replace domain expertise, but it lowers the barrier to applying sophisticated techniques (Tornede et al. 2024).

Example: AI-Assisted Anomaly Detection

Prompt to Claude Code: I have employee absence data with columns for hours absent, age, BMI, and commute distance. Find any employees whose absence patterns are unusual compared to the rest of the dataset.

The AI might generate code that fits a regression model, computes residuals, and flags observations whose residuals exceed a threshold, which is essentially the same approach used in Chapter 12. The analyst’s role is to evaluate whether the flagged employees are genuinely anomalous or whether the model’s assumptions are inappropriate for this data.
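As an illustration of the residual-based approach just described, the sketch below shows one shape such generated code might take. It is written in Python with the standard library only, as a stand-in for the R code an assistant would more likely produce in this setting. Everything specific here is an assumption: the data are synthetic, the linear relationship used to generate hours absent is invented, and the cutoff of 2 residual standard errors is an arbitrary choice that an analyst would need to justify.

```python
import math

# Hypothetical synthetic data: (age, BMI, commute distance in km) per employee.
profiles = [(25, 22, 10), (30, 24, 35), (35, 27, 5), (40, 21, 20),
            (45, 30, 15), (50, 25, 40), (55, 28, 25), (28, 31, 30),
            (33, 23, 12), (47, 26, 8), (52, 29, 20), (40, 26, 20)]
noise = [0.5, -0.8, 0.3, -0.2, 0.9, -0.6, 0.1, 0.7, -0.4, -0.9, 0.4, 0.0]

# Hours absent follow an invented linear pattern plus small noise.
hours = [5 + 0.2 * a + 0.5 * b + 0.3 * c + e
         for (a, b, c), e in zip(profiles, noise)]
hours[-1] += 40  # employee 11 has an average profile but 40 extra hours

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

# Design matrix with an intercept column.
X = [[1.0, a, b, c] for a, b, c in profiles]
beta = fit_ols(X, hours)

# Residual = observed hours minus hours predicted by the fitted model.
residuals = [yi - sum(bj * xj for bj, xj in zip(beta, row))
             for row, yi in zip(X, hours)]

# Residual standard error, using n - p degrees of freedom.
dof = len(hours) - len(beta)
sigma = math.sqrt(sum(r * r for r in residuals) / dof)

THRESHOLD = 2.0  # assumed cutoff, in residual standard errors
flagged = [i for i, r in enumerate(residuals) if abs(r) / sigma > THRESHOLD]
print("flagged employee indices:", flagged)
```

Only the employee whose hours were inflated is flagged here, because the example was built that way. With real data the same code can flag employees simply because a linear model fits poorly, which is why the analyst's review of the flagged cases matters.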