11.7 Glossary of Terms
Anomaly Detection: A data mining technique that identifies observations significantly different from the majority of the data. Also called outlier detection.
Apriori Algorithm: An algorithm for association rule learning that identifies frequent item sets by iteratively extending them and pruning infrequent candidates.
Association Rule Learning: A technique for discovering relationships between variables in transactional data, such as which products are frequently purchased together.
Autoencoder: A type of neural network that learns to compress its input and reconstruct it. For anomaly detection, it is trained on normal data and flags observations with high reconstruction error.
Classification: A supervised learning technique that assigns observations to predefined categories based on learned patterns from labeled training data.
Clustering: An unsupervised learning technique that groups similar observations together without predefined labels, discovering natural structure in the data.
DBSCAN: A density-based clustering algorithm that groups closely packed points and identifies isolated points as outliers.
Decision Tree: A classification algorithm that uses a tree-like structure of decision rules to split data into increasingly pure subsets.
Isolation Forest: A machine learning algorithm for anomaly detection that isolates observations by randomly partitioning the feature space; outliers tend to be isolated in fewer partitions than normal points.
K-Means: A clustering algorithm that partitions data into K groups by assigning each observation to the nearest cluster center (centroid), minimizing the total within-cluster variance.
K-Nearest Neighbors (KNN): A non-parametric algorithm that classifies observations based on the majority class of their K closest neighbors in the feature space.
Market Basket Analysis: An application of association rule learning that identifies products frequently purchased together in transactional data.
Naive Bayes: A probabilistic classification algorithm based on Bayes’ theorem that assumes the features are conditionally independent of one another given the class.
Supervised Learning: A machine learning approach where the model is trained on labeled data, meaning the correct answer is known for each training example. Classification and regression are supervised techniques.
Support Vector Machine (SVM): A classification algorithm that finds the hyperplane separating the classes in the feature space with the largest margin.
Unsupervised Learning: A machine learning approach where the model discovers structure in unlabeled data without predefined correct answers. Clustering and association rule learning are unsupervised techniques.
Z-Score: A statistical measure of how many standard deviations an observation is from the mean, used in anomaly detection to flag extreme values.
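
As a small illustration of the Z-Score entry, the following is a minimal sketch of z-score-based flagging using only the Python standard library. The function names (z_scores, flag_outliers), the sample readings, and the threshold value are assumptions made for illustration; they do not come from the chapter.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return the z-score of each value: (value - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumed choice)
    if sigma == 0:
        # All values are identical, so nothing deviates from the mean.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def flag_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > threshold]


# Hypothetical sample: nine ordinary readings and one extreme value.
readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]

# A lower threshold is used here because the sample is very small.
outliers = flag_outliers(readings, threshold=2.0)
print(outliers)
```

A cutoff of 3 is a common convention, but it is a judgment call rather than a rule. With the population standard deviation, no z-score in a sample of n points can exceed the square root of n - 1, which is exactly 3 for ten points, so a threshold of 3 could never flag anything in a sample this small.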