7.9 Glossary of Terms

  1. Aesthetic: A visual property of a plot element (e.g., position, color, shape, size) that is mapped to a variable in the data using aes() in ggplot2.

  2. Box Plot: A chart that displays the distribution of a continuous variable through its quartiles, showing the median, interquartile range, and outliers. Useful for comparing distributions across categories.

  3. Chart Junk: Non-essential visual elements in a chart — decorative graphics, heavy grid lines, unnecessary labels — that do not convey data and distract from the message.

  4. Composition: A data relationship showing how parts contribute to a whole. Visualized with stacked bar charts or pie charts.

  5. Correlation: A data relationship showing the association between two variables. Visualized with scatter plots.

  6. Distribution: The spread of data points across a range of values. Visualized with histograms and box plots.

  7. Explanatory Visualization: A visualization designed to communicate a specific finding to an audience. Emphasizes clarity, simplicity, and narrative impact.

  8. Exploratory Visualization: A visualization created during analysis to discover patterns, test hypotheses, and identify anomalies. Emphasizes speed and flexibility over polish.

  9. Faceting: The technique of creating multiple small plots (small multiples) from subsets of the data, enabling visual comparison across categories.

  10. Geom: A geometric object in ggplot2 that represents data visually — points (geom_point()), lines (geom_line()), bars (geom_bar()), boxes (geom_boxplot()), etc.

  11. ggplot2: An R package implementing the Grammar of Graphics, enabling the construction of complex, layered visualizations through a consistent and composable syntax.

  12. Grammar of Graphics: A framework developed by Leland Wilkinson that decomposes statistical graphics into fundamental components, enabling systematic and flexible visualization construction. In ggplot2's implementation of the framework, these components include data, aesthetics, geoms, scales, facets, and themes.

  13. Heat Map: A visualization that uses color intensity to represent values across two dimensions, useful for identifying clusters and gradients in large datasets.

  14. Histogram: A chart that displays the distribution of a continuous variable by dividing the data into bins and showing the frequency of values in each bin.

  15. Data-Ink Ratio: The proportion of ink in a graphic that is devoted to displaying data. Edward Tufte’s principle holds that this ratio should be maximized: visual elements that do not convey data (chart junk) should be removed.

  16. Scale: A ggplot2 component that controls how data values are mapped to aesthetic properties — for example, mapping a variable to a color gradient or transforming an axis to a logarithmic scale.

  17. Plotly: An R package (and cross-language library) that adds interactivity to visualizations, such as hover tooltips, zooming, and panning. The ggplotly() function converts most ggplot2 plots into interactive versions, although not every geom or theme setting carries over.

  18. Scatter Plot: A chart that plots individual data points on two continuous axes, used to visualize the relationship (correlation) between two variables.

  19. Tableau: A widely used GUI-based visualization platform for creating interactive dashboards and explanatory visualizations in business settings.

  20. Theme: A ggplot2 component that controls non-data visual elements such as background color, grid lines, font sizes, and axis styling.
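
To show how several of the ggplot2 terms above (Aesthetic, Geom, Scale, Faceting, Theme) fit together in one plot, here is a minimal sketch. It assumes the ggplot2 package is installed and uses the mpg dataset bundled with it; the choice of variables (displ, hwy, class, drv) and the name p are illustrative only, not taken from the chapter.

```r
library(ggplot2)

# Illustrative only: mpg is a sample dataset that ships with ggplot2.
p <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +  # aesthetics: position and color mapped to variables
  geom_point() +            # geom: draw each observation as a point (a scatter plot)
  scale_x_log10() +         # scale: put the x axis on a logarithmic scale
  facet_wrap(~ drv) +       # faceting: one small plot per drive type
  theme_minimal()           # theme: lighter background and grid lines (non-data elements)
```

Printing p (for example, by typing p at the console) renders the plot. Each line after the first adds one layer of the grammar, so removing or swapping a single line changes one component without affecting the others.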