11.7 Glossary of Terms

  1. Anomaly Detection: A data mining technique that identifies observations significantly different from the majority of the data. Also called outlier detection.

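  2. Apriori Algorithm: An algorithm for association rule learning that finds frequent item sets level by level. It extends frequent item sets one item at a time and prunes any candidate that contains an infrequent subset, since every subset of a frequent item set must itself be frequent.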

  3. Association Rule Learning: A technique for discovering relationships between variables in transactional data, such as which products are frequently purchased together.

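  4. Autoencoder: A neural network trained to reconstruct its input from a compressed internal representation. In anomaly detection, it is trained on normal data and flags observations with high reconstruction error, because it reconstructs unfamiliar patterns poorly.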

  5. Classification: A supervised learning technique that assigns observations to predefined categories based on patterns learned from labeled training data.

  6. Clustering: An unsupervised learning technique that groups similar observations together without predefined labels, discovering natural structure in the data.

  7. DBSCAN: A density-based clustering algorithm that groups closely packed points and identifies isolated points as outliers.

  8. Decision Tree: A classification algorithm that uses a tree-like structure of decision rules to split data into increasingly pure subsets.

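  9. Isolation Forest: An anomaly detection algorithm that builds an ensemble of random trees, each repeatedly splitting the data on a randomly chosen feature at a random value. Outliers are isolated in fewer splits than normal points, so a short average path length signals an anomaly.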

  10. K-Means: A clustering algorithm that partitions data into K groups by minimizing the within-cluster sum of squared distances between each observation and its cluster's mean (centroid).

  11. K-Nearest Neighbors (KNN): A non-parametric algorithm that classifies observations based on the majority class of their K closest neighbors in the feature space.

  12. Market Basket Analysis: An application of association rule learning that identifies products frequently purchased together in transactional data.

  13. Naive Bayes: A probabilistic classification algorithm based on Bayes’ theorem that assumes features are conditionally independent of one another given the class.

  14. Support Vector Machine (SVM): A classification algorithm that finds the hyperplane separating the classes with the largest margin, that is, the greatest distance to the nearest training points of each class.

  15. Supervised Learning: A machine learning approach where the model is trained on labeled data — the correct answer is known for each training example. Classification and regression are supervised techniques.

  16. Unsupervised Learning: A machine learning approach where the model discovers structure in unlabeled data without predefined correct answers. Clustering and association rule learning are unsupervised techniques.

  17. Z-Score: A statistical measure of how many standard deviations an observation is from the mean, used in anomaly detection to flag extreme values.
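The Apriori Algorithm and Association Rule Learning entries can be made concrete with a small, unoptimized sketch. It assumes transactions are given as sets of item names and takes support to mean the fraction of transactions containing an item set; the function names `apriori` and `confidence` are hypothetical, chosen for illustration rather than taken from any library.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support.

    Support is the fraction of transactions that contain the itemset.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    counted = {frozenset([i]): support(frozenset([i])) for i in items}
    current = {s for s, sup in counted.items() if sup >= min_support}
    frequent = {s: counted[s] for s in current}

    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a frequent itemset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        counted = {c: support(c) for c in candidates}
        current = {c for c, sup in counted.items() if sup >= min_support}
        frequent.update((c, counted[c]) for c in current)
        k += 1
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent: support(A union B) / support(A)."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
frequent = apriori(baskets, min_support=0.4)
print(sorted(sorted(s) for s in frequent))
print(confidence(frequent, {"bread"}, {"milk"}))
```

In these baskets {bread, milk} appears in 3 of 5 transactions (support 0.6) and bread in 4 of 5 (support 0.8), so the rule bread → milk has confidence 0.6 / 0.8 = 0.75, up to floating-point rounding. The pair {milk, butter} has support 0.2 and is dropped, which prunes {bread, milk, butter} without ever counting it.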
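For the DBSCAN entry, the sketch below shows how density decides which points join clusters and which are left as outliers. Density is controlled by a neighborhood radius `eps` and a minimum neighbor count `min_pts` (the algorithm's usual two parameters; the names here are assumptions). Neighbor search is brute force, so the running time is quadratic in the number of points and the code is meant only for illustration.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE (-1) marks outliers.

    A core point has at least min_pts points (itself included) within distance eps.
    """
    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point of a cluster
            continue
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable from a core point, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep growing the cluster
        cluster += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The two tight squares of points become clusters 0 and 1, while the isolated point (50, 50) has no neighbors within `eps` and is labeled as noise.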
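The K-Means entry describes the objective being minimized. A common way to approximate it is Lloyd's algorithm, which alternates an assignment step and an update step until the centroids stop moving. This sketch assumes points are tuples of numbers and initializes centroids by sampling K data points with a seeded random generator; like any Lloyd's implementation it can settle in a local optimum, so results may depend on the seed. The function name `kmeans` is hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: return (centroids, labels) for a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: the assignments can no longer change
        centroids = new_centroids
    return centroids, labels


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

On two well-separated groups like these, the first three points and the last three end up in different clusters, with centroids at the mean of each group.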
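The K-Nearest Neighbors entry can be sketched in a few lines. This version assumes Euclidean distance and a plain majority vote; `knn_predict` is a hypothetical name, and real implementations usually add faster neighbor search and distance weighting.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Sort training examples by Euclidean distance to the query and keep the k closest.
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest = by_distance[:k]
    # Counter.most_common orders equal counts by first appearance, so a tied vote
    # goes to the tied label whose member is nearest to the query.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 7), (6, 7)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train, labels, (2, 2)))      # a
print(knn_predict(train, labels, (6.5, 6.5)))  # b
```

Because nothing is fitted in advance, all the work happens at prediction time, which is why KNN is described as non-parametric.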
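To show what the independence assumption in the Naive Bayes entry buys, here is a sketch for categorical features. It scores each class as log P(class) plus the sum of log P(feature value | class), treating every feature as conditionally independent given the class. The Laplace smoothing parameter `alpha` (which keeps unseen values from zeroing out a class) and the function names are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> Counter of values
    feature_values = defaultdict(set)    # feature index -> set of observed values
    for row, y in zip(rows, labels):
        for f, v in enumerate(row):
            value_counts[(f, y)][v] += 1
            feature_values[f].add(v)
    return class_counts, value_counts, feature_values


def predict_naive_bayes(model, row, alpha=1.0):
    """Return the class maximizing log P(class) + sum over features of log P(value | class)."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)
        for f, v in enumerate(row):
            # The "naive" step: each feature contributes independently given the class.
            count = value_counts[(f, c)][v]
            score += math.log((count + alpha) / (n_c + alpha * len(feature_values[f])))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


rows = [("sunny", "weekend"), ("sunny", "weekend"), ("sunny", "weekday"),
        ("rainy", "weekday"), ("rainy", "weekday"), ("rainy", "weekend")]
play = ["yes", "yes", "yes", "no", "no", "no"]
model = train_naive_bayes(rows, play)
print(predict_naive_bayes(model, ("sunny", "weekend")))  # yes
print(predict_naive_bayes(model, ("rainy", "weekday")))  # no
```

Summing log probabilities instead of multiplying raw probabilities avoids numeric underflow when there are many features.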
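The Z-Score entry corresponds to the formula z = (x - mean) / standard deviation. The sketch below assumes the population standard deviation and a configurable cutoff; 3 is a common convention, but the choice is a judgment call. The helper names `z_scores` and `flag_outliers` are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return (x - mean) / std for each value, using the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("z-scores are undefined when all values are identical")
    return [(x - mu) / sigma for x in values]


def flag_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > threshold]


values = [10, 12, 11, 13, 12, 11, 10, 12, 11, 95]
print(flag_outliers(values, threshold=2.0))  # [95]
```

In this ten-value sample the extreme value inflates the standard deviation enough that its own z-score is just under 3, which is why the example uses a cutoff of 2; in small samples, a fixed cutoff of 3 can miss an obvious outlier.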