3.9 Working with Data Frames, Lists, and Formulas
3.9.1 Loading and Saving Data
R provides functions to read data from and write data to external files. The most common format is CSV.
# Load data from a CSV file (the path is relative to the working directory)
data <- read.csv("data.csv")
# Save data to a CSV file; row.names = FALSE omits R's row numbers from the output
write.csv(data, "output.csv", row.names = FALSE)
In later chapters, we use read_delim() and related functions from the readr package (part of the tidyverse), which offer more control over how data is parsed. The base R functions shown here work well for simple cases.
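The snippet above needs a file named data.csv to exist. The sketch below is self-contained: it writes a small data frame to a temporary file and reads it back. The data frame `sales` and its values are made up for illustration.

It also shows why `row.names = FALSE` is usually worth passing. By default, write.csv() writes R's row numbers as an extra unnamed first column. read.csv() then loads that column back as a column called `X`.

```r
# Small illustrative data frame (hypothetical values)
sales <- data.frame(
  region = c("North", "South", "East"),
  revenue = c(120.5, 98.0, 143.25)
)

# Write to a temporary file so the example does not touch the working directory
path <- tempfile(fileext = ".csv")
write.csv(sales, path, row.names = FALSE)

# Read it back; read.csv() returns a data frame
sales_in <- read.csv(path)
str(sales_in)

# Without row.names = FALSE, the row numbers come back as an extra column
write.csv(sales, path)
names(read.csv(path))  # "X" appears before "region" and "revenue"
```

For anything beyond quick experiments, it is safer to check the result with str() after reading. This confirms that each column was parsed with the type you expect.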
3.9.2 Factors
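Factors represent categorical data in R: variables that take one of a fixed set of values, called levels. They are important for statistical modeling because R treats factors differently from plain text. In regression models, for example, a factor is automatically converted to indicator (dummy) variables. By default, one indicator is created for each level except a reference level.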
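# Create a factor from a character vector
gender <- factor(c("Male", "Female", "Male", "Male"))
print(gender)
## [1] Male   Female Male   Male
## Levels: Female Male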
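The conversion to indicator variables can be seen directly with model.matrix(). This is the base R function that lm() uses internally to build its design matrix from a formula.

The sketch below inspects a small factor, prints its indicator columns, and changes the reference level with relevel(). The variable names `gender`, `X`, and `gender2` are illustrative.

```r
# Illustrative categorical data stored as a factor
gender <- factor(c("Male", "Female", "Male", "Male"))

levels(gender)      # levels are sorted alphabetically by default
table(gender)       # counts per level

# Internally, a factor is an integer vector of codes pointing into the levels
as.integer(gender)  # 2 1 2 2

# model.matrix() expands the factor into an intercept plus one 0/1 column
# for every level except the first (reference) level, here "Female"
X <- model.matrix(~ gender)
print(X)

# Make "Male" the reference level instead
gender2 <- relevel(gender, ref = "Male")
colnames(model.matrix(~ gender2))  # "(Intercept)" "gender2Female"
```

Only one of the two levels gets its own column because the intercept already represents the reference level. Adding a column for every level would make the design matrix redundant.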
3.9.3 Data Frames
Data frames are R's primary structure for tabular data. A data frame is like a spreadsheet: each column is a vector, columns can hold different types, and all columns have the same length. Most data you work with in BI will be stored in data frames.
# Create a data frame
df <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 35),
  salary = c(50000, 60000, 70000)
)
print(df)
##      name age salary
## 1   Alice  25  50000
## 2     Bob  30  60000
## 3 Charlie  35  70000
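A few everyday operations on a data frame are shown below: extracting a column, filtering rows, selecting columns, and adding a computed column. The sketch recreates `df` so it runs on its own. The name `older` and the 10% raise in `new_salary` are arbitrary illustrations.

```r
df <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 35),
  salary = c(50000, 60000, 70000)
)

# Each column is a vector; $ extracts one by name
df$salary

# Dimensions and column types
nrow(df)  # 3
str(df)

# Filter rows with a logical condition: df[rows, columns]
older <- df[df$age > 28, ]
older$name  # "Bob" "Charlie"

# Select columns by name
df[, c("name", "salary")]

# Add a computed column (hypothetical 10% raise)
df$new_salary <- df$salary * 1.10

# Column-wise summaries work on the extracted vectors
mean(df$salary)  # 60000
```

The `df[rows, columns]` form is base R's general indexing syntax. Leaving the column position empty, as in `df[df$age > 28, ]`, keeps all columns.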
3.9.4 Lists
Unlike atomic vectors, whose elements must all share one type, lists can hold elements of different types: vectors, data frames, and even other lists. They are useful for organizing complex results. For example, the object returned by a model-fitting function such as lm() is a list.
# Create a list
my_list <- list(
  name = "John",
  age = 30,
  hobbies = c("reading", "playing guitar"),
  address = data.frame(street = "123 Main St", city = "Anytown")
)
print(my_list)
## $name
## [1] "John"
##
## $age
## [1] 30
##
## $hobbies
## [1] "reading"        "playing guitar"
##
## $address
##        street    city
## 1 123 Main St Anytown
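Most day-to-day work with lists comes down to getting elements out. `[[ ]]` and `$` return the element itself, while `[ ]` returns a smaller list.

The sketch below recreates `my_list`, shows both forms, and adds an element. The email address is a placeholder. It also checks that a fitted model is a list, using lm() on the built-in mtcars data set; the model `mpg ~ wt` is an arbitrary choice.

```r
my_list <- list(
  name = "John",
  age = 30,
  hobbies = c("reading", "playing guitar"),
  address = data.frame(street = "123 Main St", city = "Anytown")
)

# [[ ]] and $ extract the element itself
my_list[["age"]]      # 30
my_list$hobbies[2]    # "playing guitar"
my_list$address$city  # "Anytown"

# [ ] keeps the list wrapper and returns a sub-list
sub <- my_list["age"]
class(sub)            # "list"

# Add an element by assigning to a new name (placeholder value)
my_list$email <- "john@example.com"
names(my_list)

# A fitted model object is also a list with named components
fit <- lm(mpg ~ wt, data = mtcars)
is.list(fit)          # TRUE
fit$coefficients
```

The distinction between `[ ]` and `[[ ]]` matters when passing results to other functions. A function expecting a number will fail if given a one-element list.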