data <- read.csv("data.csv")Lecture 9: Working with Data Frames, Lists, and Formulas in R
Overview
- In this lecture, we’ll cover:
- How to load and save data in R using data frames.
- Working with lists to manage complex data.
- How to use formulas for modeling and statistical analysis.
1. Loading and Saving Data
- What is a data frame?
- A data frame is a table-like structure where each column can hold different types of data (numeric, character, etc.), making it ideal for storing datasets.
- Loading data into a data frame:
- You can load external data, such as a CSV file, into a data frame using the
read.csv()function.
- You can load external data, such as a CSV file, into a data frame using the
- Example: Loading a CSV file:
- Saving data from a data frame:
- You can save a data frame to a CSV file using the
write.csv()function.
- You can save a data frame to a CSV file using the
- Example: Saving a data frame to a CSV file:
write.csv(data, "output.csv")- Viewing the structure of a data frame:
- The
str()function shows the structure of the data frame, including the types of variables and the first few rows of data.
- The
- Example: Viewing the structure of a data frame:
str(data)function (..., list = character(), package = NULL, lib.loc = NULL, verbose = getOption("verbose"),
envir = .GlobalEnv, overwrite = TRUE)
2. Working with Lists
- What is a list in R?
- A list is a flexible data structure that can store objects of different types, including vectors, data frames, and other lists.
- How to create a list:
- Use the
list()function to create a list that holds various elements.
- Use the
- Example: Creating a list with different data types:
my_list <- list(
name = "Alice",
age = 25,
salary = 50000,
skills = c("R", "Python", "SQL")
)- Accessing elements in a list:
- Use double square brackets
[[ ]]or the$operator to access elements in a list.
- Use double square brackets
- Example: Accessing the
nameelement:
my_list$name[1] "Alice"
- Example: Accessing the
skillsvector:
my_list$skills[1] "R" "Python" "SQL"
- Modifying elements in a list:
- You can modify existing elements by assigning new values.
- Example: Changing the
salaryelement:
my_list$salary <- 600003. Using Formulas in R
- What is a formula?
- A formula in R is used to express relationships between variables, especially in statistical modeling.
- The general syntax for a formula is
y ~ x, whereyis the dependent variable andxis the independent variable.
- Example: Creating a simple formula:
# A formula for predicting mpg using cyl and disp
formula <- mpg ~ cyl + disp - Formulas in regression analysis:
- Formulas are widely used in regression analysis to specify models.
- Example: Using a formula in a linear regression:
model <- lm(mpg ~ cyl + disp, data = mtcars)
summary(model) # View the summary of the regression model
Call:
lm(formula = mpg ~ cyl + disp, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-4.4213 -2.1722 -0.6362 1.1899 7.0516
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 34.66099 2.54700 13.609 4.02e-14 ***
cyl -1.58728 0.71184 -2.230 0.0337 *
disp -0.02058 0.01026 -2.007 0.0542 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.055 on 29 degrees of freedom
Multiple R-squared: 0.7596, Adjusted R-squared: 0.743
F-statistic: 45.81 on 2 and 29 DF, p-value: 1.058e-09
- Explanation:
- The formula
mpg ~ cyl + disptells R to modelmpg(miles per gallon) as a function ofcyl(cylinders) anddisp(displacement). - The
lm()function fits a linear model, andsummary()provides the model’s details.
- The formula
4. Combining Data Frames, Lists, and Formulas
- Lists of data frames:
- You can store multiple data frames in a list and use formulas to perform statistical analysis on each one.
- Example: Creating a list of data frames and applying a formula:
df1 <- data.frame(name = c("Alice", "Bob"), age = c(25, 30))
df2 <- data.frame(name = c("Charlie", "David"), age = c(35, 40))
data_list <- list(df1, df2)
# Applying a simple operation to one of the data frames
mean_age_df1 <- mean(data_list[[1]]$age)- Using formulas with data in lists:
- You can apply formulas to perform calculations or modeling on data stored in lists.
- Example: Applying a formula to a data frame in a list:
model_df1 <- lm(age ~ name, data = data_list[[1]])Key Takeaways
- Data frames allow you to load, store, and manipulate tabular data in R.
- Lists are flexible and can store a variety of data types, making them useful for managing complex datasets.
- Formulas are essential for statistical analysis and are commonly used in models like regression.
Looking Forward
- In the next lecture, we’ll dive into programming in R, focusing on writing scripts, using loops and conditional statements, and writing custom functions to solve more advanced problems.