- R revolves around functions, denoted by [function name]()
- Every function takes an input, defined by arguments, often provided by the user
- Many functions have default settings for these arguments
- example([function name]) shows you how it is used
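As a minimal sketch of arguments and default settings, the lines below use `round()`, a base R function chosen here only for illustration (it is not part of the original material). Its `digits` argument defaults to 0, so it can be left out or overridden.

```r
# round() takes an input x and an optional argument digits (default 0)
r_default = round(3.14159)           # digits not supplied, so the default is used
r_two = round(3.14159, digits = 2)   # override the default by naming the argument
r_default
## [1] 3
r_two
## [1] 3.14
```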
June 15, 2015
- ?[function name] or help([function name]) will pop up the help menu
## ?str
## help("str")
For example, length is a function we briefly covered last module. You can try typing ?length in the console and reading the help file.
You can also see examples of running a function using example() [which is another function!]
data.frame: traditional 'Excel' spreadsheets
We have already covered character and numeric
class(c("Andrew", "Jaffe"))
## [1] "character"
class(c(1, 4, 7))
## [1] "numeric"
Recall that c() and class() are both functions!
Integer is a special subset of numeric that contains only whole numbers
A sequence of numbers is an example of the integer class
x = seq(from = 1, to = 5) # seq() is a function
x
## [1] 1 2 3 4 5
class(x)
## [1] "integer"
The colon : is a shortcut for making sequences of numbers
It makes a consecutive integer sequence from [num1] to [num2] by 1
1:5
## [1] 1 2 3 4 5
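A short sketch of how the integer and numeric classes differ in practice. The `by` argument of `seq()` and the `L` suffix are not covered above and are included here only as illustration.

```r
a = 3:7                               # colon: consecutive integers from 3 to 7
b = seq(from = 1, to = 5, by = 0.5)   # a step of 0.5 produces non-whole numbers
class(a)
## [1] "integer"
class(b)
## [1] "numeric"
class(2L)   # the L suffix marks a single integer value
## [1] "integer"
class(2)    # a plain number is stored as numeric
## [1] "numeric"
```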
Factors (class factor) are special character vectors where the elements have pre-defined groups or 'levels'. You can think of these as qualitative or categorical variables:
x = factor(c("boy", "girl", "girl", "boy", "girl"))
x
## [1] boy  girl girl boy  girl
## Levels: boy girl
class(x)
## [1] "factor"
Note that levels are, by default, in alphabetical or alphanumerical order.
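If the default ordering is not what you want, one possible approach is to pass the `levels` argument of `factor()` explicitly. The sketch below reuses the example vector from above; the variable name `x2` is only illustrative.

```r
x = factor(c("boy", "girl", "girl", "boy", "girl"))
levels(x)                                # default: alphabetical order
## [1] "boy"  "girl"
x2 = factor(c("boy", "girl", "girl", "boy", "girl"),
            levels = c("girl", "boy"))   # set the level order explicitly
levels(x2)
## [1] "girl" "boy"
```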
logical is a class that only has two possible elements: TRUE and FALSE
x = c(TRUE, FALSE, TRUE, TRUE, FALSE)
class(x)
## [1] "logical"
sum() and mean() work on logical vectors: they return the total and proportion of TRUE elements, respectively.
Note that logical elements are NOT in quotes.
z = c("TRUE", "FALSE", "TRUE", "FALSE")
class(z)
## [1] "character"
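The behaviour of `sum()` and `mean()` on logical vectors can be checked directly. The conversion with `as.logical()` at the end is an addition for illustration and is not part of the original material.

```r
x = c(TRUE, FALSE, TRUE, TRUE, FALSE)
sum(x)     # number of TRUE elements
## [1] 3
mean(x)    # proportion of TRUE elements
## [1] 0.6
z = c("TRUE", "FALSE", "TRUE", "FALSE")   # character, because of the quotes
z_logical = as.logical(z)                 # convert the strings to logical values
class(z_logical)
## [1] "logical"
sum(z_logical)
## [1] 2
```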
Useful functions for exploring vectors (and other data types):
- length()
- head() and tail()
- table()
- subset() and brackets ([ ])
- unique()
- sum(), mean(), median(), min(), max()
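A few of these functions applied to a small vector, as a rough sketch. The vector `v` is a made-up example, not data from the course.

```r
v = c(3, 1, 3, 7, 1, 9)   # hypothetical example vector
length(v)
## [1] 6
unique(v)                 # distinct values, in order of first appearance
## [1] 3 1 7 9
subset(v, v > 2)          # keep only elements greater than 2
## [1] 3 3 7 9
v[v > 2]                  # the same selection using brackets
## [1] 3 3 7 9
c(min(v), median(v), max(v))
## [1] 1 3 9
```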
- head() shows the first 6 (default) elements of an R object
- tail() shows the last 6 (default) elements of an R object
- str() shows the structure of an R object
z = 1:100 # recall a sequence from 1 to 100
head(z)
## [1] 1 2 3 4 5 6
tail(z)
## [1]  95  96  97  98  99 100
str(z)
## int [1:100] 1 2 3 4 5 6 7 8 9 10 ...
These functions show a brief snapshot of the data which is useful for exploratory data analysis.
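The default of 6 can be changed through the `n` argument of `head()` and `tail()`, for example:

```r
z = 1:100
head(z, n = 3)    # first 3 elements instead of the default 6
## [1] 1 2 3
tail(z, n = 2)    # last 2 elements
## [1]  99 100
```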
table() is the basic tabulation function, which is often more useful for character and factor vectors
From the manual: "table uses the cross-classifying factors to build a contingency table of the counts at each combination of factor level"
x = c("boy", "girl", "girl", "boy", "girl")
table(x)
## x
##  boy girl 
##    2    3 
y = c(1, 2, 1, 2, 1)
table(x, y)
##       y
## x      1 2
##   boy  1 1
##   girl 2 1
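The result of `table()` can be stored and queried like a named vector. This is a small sketch, with `tab` as an illustrative variable name.

```r
x = c("boy", "girl", "girl", "boy", "girl")
tab = table(x)
names(tab)                    # the categories that were counted
## [1] "boy"  "girl"
tab[["girl"]]                 # count for a single category
## [1] 3
tab[["girl"]] / sum(tab)      # proportion of that category
## [1] 0.6
```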
Brackets are used to select/subset/extract data in R
x1 = 10:20
x1
## [1] 10 11 12 13 14 15 16 17 18 19 20
length(x1)
## [1] 11
x1[1] # selecting first element
## [1] 10
x1[3:4] # selecting third and fourth elements
## [1] 12 13
x1[c(1, 5, 7)] # first, fifth, and seventh elements
## [1] 10 14 16
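Beyond positive positions, brackets also accept negative indices and logical conditions. These two forms are not shown above and are sketched here as an illustration.

```r
x1 = 10:20
x1[-1]            # negative index: everything except the first element
## [1] 11 12 13 14 15 16 17 18 19 20
x1[x1 > 17]       # logical condition: elements greater than 17
## [1] 18 19 20
x1[length(x1)]    # last element
## [1] 20
```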
n = 1:9 # sequence from first number to second number incrementing by 1
n
## [1] 1 2 3 4 5 6 7 8 9
mat = matrix(n, nrow = 3)
mat
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
These are in addition to the previous useful vector functions:
- nrow() displays the number of rows of a matrix or data frame
- ncol() displays the number of columns
- dim() displays a vector of length 2: # rows, # columns
- colnames() displays the column names (if any) and rownames() displays the row names (if any)
Matrices have two "slots" you can use to select data, which represent rows and columns and are separated by a comma, so the syntax is matrix[row,column].
mat[1, 1] # individual entry: row 1, column 1
## [1] 1
mat[1, ] # first row
## [1] 1 4 7
mat[, 1] # first column
## [1] 1 2 3
Note that the class of the returned object is no longer a matrix
class(mat[1, ])
## [1] "integer"
class(mat[, 1])
## [1] "integer"
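The dimension functions and the bracket syntax can be tried together on the same matrix. The `drop = FALSE` argument and the column names below are additions for illustration: `drop = FALSE` is one way to keep a single row or column as a matrix.

```r
mat = matrix(1:9, nrow = 3)
dim(mat)                          # number of rows, then number of columns
## [1] 3 3
row1 = mat[1, , drop = FALSE]     # drop = FALSE keeps the result as a matrix
is.matrix(row1)
## [1] TRUE
dim(row1)
## [1] 1 3
is.matrix(mat[1, ])               # without it, the result is a plain vector
## [1] FALSE
colnames(mat) = c("a", "b", "c")  # hypothetical column names
mat[, "b"]                        # select a column by name
## [1] 4 5 6
```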
The data.frame is the other two-dimensional variable class.
Again, data frames are like matrices, but each column is a vector that can have its own class. So some columns might be character and others might be numeric, while others may be a factor.
We can look at some of the example data frames that come with R.
data(iris) ## just use some data in R already
names(iris) ## get the column names
## [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width" 
## [5] "Species"
str(iris) # easy snapshot of data, like `describe` in Stata
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
head(iris, 3) # get top 3 rows
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
Data frames have special ways to select data, specifically by a $ and the column name.
head(iris$Petal.Length)
## [1] 1.4 1.4 1.3 1.5 1.4 1.7
class(iris$Petal.Width)
## [1] "numeric"
You can also subset data frames like matrices, using row and column indices, but using column names is generally safer and more reproducible.
head(iris[, 2])
## [1] 3.5 3.0 3.2 3.1 3.6 3.9
You can also use the bracket notation, but specify the name(s) in quotes if you want more than 1 column. This allows you to subset rows and columns at the same time
iris[1:3, c("Sepal.Width", "Species")]
##   Sepal.Width Species
## 1         3.5  setosa
## 2         3.0  setosa
## 3         3.2  setosa
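The row slot can also hold a logical condition instead of row numbers. The sketch below keeps only the setosa flowers; the variable name `setosa` is illustrative.

```r
data(iris)
# rows where Species is "setosa", and two columns selected by name
setosa = iris[iris$Species == "setosa", c("Sepal.Width", "Species")]
nrow(setosa)     # iris contains 50 flowers of each species
## [1] 50
ncol(setosa)
## [1] 2
```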
You can make your own data frames from "scratch" too, either from a matrix or using the data.frame function:
x = c("Andrew", "Leonardo", "Shaun")
y = 1:3
df = data.frame(name = x, id = y)
df
##       name id
## 1   Andrew  1
## 2 Leonardo  2
## 3    Shaun  3
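The matrix route is not shown above. One way to do it, sketched here under the assumption that `as.data.frame()` is the intended conversion, is:

```r
m = matrix(1:6, nrow = 3)
df_m = as.data.frame(m)          # columns are named V1, V2 by default
names(df_m)
## [1] "V1" "V2"
names(df_m) = c("id", "score")   # hypothetical column names
dim(df_m)
## [1] 3 2
```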
You can add variables to a data.frame using $ as well:
iris2 = iris # copy `iris` to a new df
iris2$Index = 1:nrow(iris2)
head(iris2)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Index
## 1          5.1         3.5          1.4         0.2  setosa     1
## 2          4.9         3.0          1.4         0.2  setosa     2
## 3          4.7         3.2          1.3         0.2  setosa     3
## 4          4.6         3.1          1.5         0.2  setosa     4
## 5          5.0         3.6          1.4         0.2  setosa     5
## 6          5.4         3.9          1.7         0.4  setosa     6
names(iris2)
## [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width" 
## [5] "Species"      "Index"
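New columns can also be computed from existing ones, and a column can be dropped by assigning NULL to it. The column name `Sepal.Ratio` below is hypothetical and serves only as an example.

```r
iris2 = iris
# hypothetical derived column: sepal length divided by sepal width
iris2$Sepal.Ratio = iris2$Sepal.Length / iris2$Sepal.Width
ncol(iris2)
## [1] 6
iris2$Sepal.Ratio = NULL   # assigning NULL removes the column again
ncol(iris2)
## [1] 5
```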
Extensions of "normal" data classes: