3.7 Vectors and Other Variables

Variables in R can hold different types of data. One of the most fundamental data structures is the vector — an ordered sequence of elements of the same data type. Unlike a mathematical vector (which implies direction and magnitude), an R vector is simply a sequence of values.

3.7.1 Storing Many Numbers as a Vector

Use the c() function (“combine”) to create a vector from individual values.

# Store a vector of numbers
numbers <- c(10, 20, 30, 40, 50)
print(numbers)
## [1] 10 20 30 40 50
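
Since c() combines its arguments, it also accepts existing vectors and joins them end to end. The short sketch below illustrates this; the names first_half and second_half are made up for the example.

```r
# Build two small vectors, then combine them into one
first_half <- c(10, 20, 30)
second_half <- c(40, 50)
numbers <- c(first_half, second_half)  # the result is one flat vector, not a nested one
print(numbers)
## [1] 10 20 30 40 50

# length() reports how many elements a vector holds
length(numbers)
## [1] 5
```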

3.7.2 Storing Text Data

Vectors can also hold text (called “character” data in R). Enclose text values in quotes.

# Store a vector of names
# (called first_names to avoid reusing the name of the built-in names() function)
first_names <- c("Alice", "Bob", "Charlie", "David")
print(first_names)
## [1] "Alice"   "Bob"     "Charlie" "David"
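
Because a vector holds elements of a single type, mixing numbers, text, and logical values in one c() call does not produce an error. R converts every element to the most flexible type present instead. A minimal sketch, with illustrative variable names (mixed, num_lgl) that do not appear elsewhere in this section:

```r
# Mixing types: everything is converted to character
mixed <- c(1, "a", TRUE)
class(mixed)
## [1] "character"

# Mixing numbers and logicals: TRUE becomes 1 and FALSE becomes 0
num_lgl <- c(1, TRUE, FALSE)
class(num_lgl)
## [1] "numeric"
print(num_lgl)
## [1] 1 1 0
```

If a column of numbers unexpectedly behaves like text, a stray quoted value in the input is a likely cause, and class() is a quick way to check.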

3.7.3 Storing “True or False” Data

Logical values (TRUE and FALSE) are used for conditions and filtering. R also accepts the abbreviations T and F, but avoid them: TRUE and FALSE are reserved words that cannot be reassigned, whereas T and F are ordinary variables that can be overwritten by accident (e.g., T <- 5), which leads to subtle bugs.

# Store a vector of logical values
logical_vector <- c(TRUE, FALSE, TRUE, TRUE)
print(logical_vector)
## [1]  TRUE FALSE  TRUE  TRUE
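
The risk of using T and F can be seen directly. The sketch below overwrites T, checks the result, and then removes the accidental definition; the variable names overwritten and restored exist only for this illustration.

```r
# T is an ordinary variable, so this assignment is legal
T <- 5
overwritten <- isTRUE(T)
print(overwritten)
## [1] FALSE

# TRUE is a reserved word; uncommenting the next line raises an error
# TRUE <- 5

# Remove the accidental definition so T refers to the built-in value again
rm(T)
restored <- isTRUE(T)
print(restored)
## [1] TRUE
```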

3.7.4 Indexing Vectors

Use square brackets ([]) to access or modify individual elements. Note that R indexing starts at 1, not 0.

# Accessing individual elements
numbers <- c(10, 20, 30, 40, 50)
print(numbers[1])  # Access the first element
## [1] 10
# Accessing subsets
print(numbers[2:4])  # Access elements 2 to 4
## [1] 20 30 40
# Modifying elements
numbers[3] <- 35  # Change the value of the third element
print(numbers)
## [1] 10 20 35 40 50
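
Square brackets accept more than a single position or a range. As a brief sketch of a few other common forms (the vector is redefined here so the example stands on its own):

```r
numbers <- c(10, 20, 30, 40, 50)

# A negative index drops that position instead of selecting it
print(numbers[-1])
## [1] 20 30 40 50

# A vector of positions selects several elements at once
print(numbers[c(1, 5)])
## [1] 10 50

# A logical vector keeps the elements where the condition is TRUE
print(numbers[numbers > 25])
## [1] 30 40 50

# Indexing past the end returns NA rather than an error
print(numbers[6])
## [1] NA
```

The logical form is the filtering use of TRUE and FALSE mentioned in Section 3.7.3: the comparison numbers > 25 produces a logical vector, and the brackets keep only the matching elements.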

3.7.5 Missing Values (NA)

Real-world data frequently contains missing values. In R, missing values are represented by NA (Not Available). Understanding how R handles NA is essential because a single NA propagates through most calculations and turns the result into NA as well.

# A vector with a missing value
temps <- c(72, 68, NA, 75, 71)

# mean() returns NA if any value is missing
mean(temps)
## [1] NA
# Use na.rm = TRUE to ignore missing values
mean(temps, na.rm = TRUE)
## [1] 71.5
# Check which values are missing
is.na(temps)
## [1] FALSE FALSE  TRUE FALSE FALSE

Many R functions return NA by default when the input contains missing values — this is a safety feature that forces you to make an explicit decision about how to handle them. The na.rm = TRUE argument (available in functions like mean(), sum(), sd()) removes NA values before computing the result. We will explore more sophisticated approaches to missing data in Chapter 5.
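
A few related checks are often useful before deciding how to handle missing values. The sketch below reuses the temps vector from above and relies only on base R functions:

```r
temps <- c(72, 68, NA, 75, 71)

# Count the missing values: each TRUE counts as 1 when summed
sum(is.na(temps))
## [1] 1

# sum() behaves like mean(): NA by default, a number with na.rm = TRUE
sum(temps)
## [1] NA
sum(temps, na.rm = TRUE)
## [1] 286

# Comparing with == does not detect NA; the result is itself NA
NA == NA
## [1] NA

# Drop the missing values explicitly with logical indexing
temps[!is.na(temps)]
## [1] 72 68 75 71
```

Because == returns NA when either side is missing, is.na() is the reliable way to test for missing values.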