- Change the name of the "Location.1" column to "location"
June 16, 2015
Names are just an attribute of the data frame (recall str()) that you can change to any valid character name.
Valid names are case-sensitive and may contain letters, digits (0-9), underscores, and periods, but cannot start with a number.
For the data.frame class, colnames() and names() return the same attribute.
> names(mon)
[1] "name"            "zipCode"         "neighborhood"    "councilDistrict"
[5] "policeDistrict"  "Location.1"
> names(mon)[6] = "location"
> names(mon)
[1] "name"            "zipCode"         "neighborhood"    "councilDistrict"
[5] "policeDistrict"  "location"
These naming rules also apply when creating R objects.
There are several ways to return the number of rows of a data frame or matrix:
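As a small sketch of the renaming step, the column can also be matched by its old name rather than by position, which keeps working if the column order changes. The `toy` data frame below is a made-up stand-in for `mon`, not the real data; `make.names()` is base R and shows how invalid names would be repaired.

```r
# Hypothetical stand-in for `mon` (values invented for illustration)
toy <- data.frame(name = c("A", "B"), Location.1 = c("x", ""),
                  stringsAsFactors = FALSE)

# Rename by matching the old name instead of relying on position [6]
names(toy)[names(toy) == "Location.1"] <- "location"
names(toy)

# make.names() converts arbitrary strings into valid R names
fixed <- make.names(c("zip code", "1stColumn", "ok_name"))
fixed
```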
> nrow(mon)
[1] 84
> dim(mon)
[1] 84 6
> length(mon$name)
[1] 84
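One point worth illustrating, on a made-up data frame (`toy` is hypothetical): `length()` applied to a whole data frame counts columns, which is why the slide takes the length of a single column to get the row count.

```r
# Hypothetical three-row, two-column data frame
toy <- data.frame(name = c("A", "B", "C"),
                  zipCode = c(21201, 21202, 21202),
                  stringsAsFactors = FALSE)

nrow(toy)         # number of rows
ncol(toy)         # number of columns
dim(toy)          # rows first, then columns
length(toy)       # number of columns, not rows
length(toy$name)  # length of one column equals the number of rows
```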
What are the:
zip codes
neighborhoods
council districts, and
police districts
that contain monuments, and how many monuments are in each?
unique() returns the unique entries in a vector.
> unique(mon$zipCode)
 [1] 21201 21202 21211 21213 21217 21218 21224 21230 21231 21214 21223
[12] 21225 21251
> unique(mon$policeDistrict)
[1] "CENTRAL"      "NORTHERN"     "NORTHEASTERN" "WESTERN"
[5] "SOUTHEASTERN" "SOUTHERN"     "EASTERN"
> unique(mon$councilDistrict)
[1] 11 7 14 13 1 10 3 2 9 12
> unique(mon$neighborhood)
 [1] "Downtown"                        "Remington"
 [3] "Clifton Park"                    "Johns Hopkins Homewood"
 [5] "Mid-Town Belvedere"              "Madison Park"
 [7] "Upton"                           "Reservoir Hill"
 [9] "Harlem Park"                     "Coldstream Homestead Montebello"
[11] "Guilford"                        "McElderry Park"
[13] "Patterson Park"                  "Canton"
[15] "Middle Branch/Reedbird Parks"    "Locust Point Industrial Area"
[17] "Federal Hill"                    "Washington Hill"
[19] "Inner Harbor"                    "Herring Run Park"
[21] "Ednor Gardens-Lakeside"          "Fells Point"
[23] "Hopkins Bayview"                 "New Southwest/Mount Clare"
[25] "Brooklyn"                        "Stadium Area"
[27] "Mount Vernon"                    "Druid Hill Park"
[29] "Morgan State University"         "Dunbar-Broadway"
[31] "Carrollton Ridge"                "Union Square"
> length(unique(mon$zipCode))
[1] 13
> length(unique(mon$policeDistrict))
[1] 7
> length(unique(mon$councilDistrict))
[1] 10
> length(unique(mon$neighborhood))
[1] 32
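The four `length(unique(...))` calls can be collapsed into one step by applying the same function to several columns. This is only a sketch on a made-up data frame (`toy` and its values are hypothetical); `sapply()` is base R.

```r
# Hypothetical stand-in for `mon` with two of its columns
toy <- data.frame(zipCode = c(21201, 21202, 21202, 21211),
                  policeDistrict = c("CENTRAL", "CENTRAL", "NORTHERN", "NORTHERN"),
                  stringsAsFactors = FALSE)

# Count the distinct values in each selected column
cols <- c("zipCode", "policeDistrict")
n_unique <- sapply(toy[, cols], function(x) length(unique(x)))
n_unique
```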
table() also works here: it tabulates a single variable (or cross-tabulates two variables).
> table(mon$zipCode)
21201 21202 21211 21213 21214 21217 21218 21223 21224 21225 21230 21231
   11    16     8     4     1     9    14     4     8     1     3     4
21251
    1
> length(table(mon$zipCode))
[1] 13
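A short sketch of what a one-way table holds, using a made-up vector of zip codes (`zip` is hypothetical): the names are the distinct values, the entries are the counts, and `sort()` orders the counts.

```r
# Hypothetical zip codes (invented for illustration)
zip <- c(21201, 21202, 21202, 21211, 21202)

tab <- table(zip)
names(tab)                    # distinct values, stored as character strings
length(tab)                   # number of distinct values
sorted <- sort(tab, decreasing = TRUE)
sorted                        # counts from largest to smallest
```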
The "by hand" way is cross-tabulating the zip codes and neighborhoods:
> tab = table(mon$zipCode, mon$neighborhood)
> # tab
> tab[,"Downtown"]
21201 21202 21211 21213 21214 21217 21218 21223 21224 21225 21230 21231
    2     9     0     0     0     0     0     0     0     0     0     0
21251
    0
> length(unique(tab[,"Downtown"]))
[1] 3
> tt = tab[,"Downtown"]
> tt
21201 21202 21211 21213 21214 21217 21218 21223 21224 21225 21230 21231
    2     9     0     0     0     0     0     0     0     0     0     0
21251
    0
> tt == 0 # which entries are equal to 0
21201 21202 21211 21213 21214 21217 21218 21223 21224 21225 21230 21231
FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
21251
 TRUE
> tab[,"Downtown"] !=0
21201 21202 21211 21213 21214 21217 21218 21223 21224 21225 21230 21231
 TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
21251
FALSE
> sum(tab[,"Downtown"] !=0)
[1] 2
> sum(tab[,"Johns Hopkins Homewood"] !=0)
[1] 2
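The per-neighborhood count above can be extended to every neighborhood at once by summing the non-zero cells down each column of the cross-tabulation. A minimal sketch on made-up data (`toy` is hypothetical); `colSums()` is base R.

```r
# Hypothetical stand-in for `mon` (values invented for illustration)
toy <- data.frame(zipCode = c(21201, 21202, 21202, 21218, 21211),
                  neighborhood = c("Downtown", "Downtown", "Downtown",
                                   "Guilford", "Remington"),
                  stringsAsFactors = FALSE)

# Rows are zip codes, columns are neighborhoods
tab <- table(toy$zipCode, toy$neighborhood)

# A cell is TRUE when that zip code has at least one monument in the
# neighborhood; column sums count the zip codes per neighborhood
zips_per_nbhd <- colSums(tab != 0)
zips_per_nbhd
```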
We could also subset the data into neighborhoods:
> dt = mon[mon$neighborhood == "Downtown",]
> head(mon$neighborhood == "Downtown",10)
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE
> dim(dt)
[1] 11 6
> length(unique(dt$zipCode))
[1] 2
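One caveat with logical subsetting, sketched on made-up data (`toy` is hypothetical, and whether `mon` contains any missing neighborhoods is not stated here): a comparison against NA returns NA, and an NA in a logical row index produces a row of NAs. Wrapping the condition in `which()` drops those.

```r
# Hypothetical data with one missing neighborhood
toy <- data.frame(name = c("A", "B", "C", "D"),
                  neighborhood = c("Downtown", NA, "Downtown", "Canton"),
                  stringsAsFactors = FALSE)

is_dt <- toy$neighborhood == "Downtown"   # TRUE, NA, TRUE, FALSE

with_na <- toy[is_dt, ]         # includes an all-NA row for the NA entry
clean   <- toy[which(is_dt), ]  # which() keeps only the TRUE positions

nrow(with_na)
nrow(clean)
```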
How many monuments (a) do and (b) do not have an exact location/address?
> head(mon$location)
[1] "408 CHARLES ST\nBaltimore, MD\n"  ""
[3] ""                                 "100 HOLLIDAY ST\nBaltimore, MD\n"
[5] "50 MARKET PL\nBaltimore, MD\n"    "100 CALVERT ST\nBaltimore, MD\n"
> table(mon$location != "") # FALSE=DO NOT and TRUE=DO
FALSE  TRUE
   26    58
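The same counts can be read off a logical vector directly, since TRUE sums as 1 and FALSE as 0. A small sketch on a made-up `location` vector (hypothetical values); `nzchar()` is a base R function that tests for non-empty strings.

```r
# Hypothetical location strings, two of them empty
location <- c("408 CHARLES ST\nBaltimore, MD\n", "", "",
              "100 HOLLIDAY ST\nBaltimore, MD\n")

has_loc <- location != ""

sum(has_loc)    # how many have a location
sum(!has_loc)   # how many do not
mean(has_loc)   # proportion with a location

# nzchar() gives the same answer for non-missing strings
same <- identical(has_loc, nzchar(location))
same
```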
Which zip code, neighborhood, council district, and police district contains the most monuments?
> tabZ = table(mon$zipCode)
> head(tabZ)
21201 21202 21211 21213 21214 21217
   11    16     8     4     1     9
> max(tabZ)
[1] 16
> tabZ[tabZ == max(tabZ)]
21202
   16
which.max() returns the FIRST entry/element number that contains the maximum, and which.min() returns the FIRST entry that contains the minimum.
> which.max(tabZ) # this is the element number
21202
    2
> tabZ[which.max(tabZ)] # this is the actual maximum
21202
   16
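Because which.max() reports only the first maximum, ties are silently dropped. The sketch below, on a made-up vector (`x` is hypothetical), contrasts it with the `== max()` comparison used earlier, which keeps every tied entry.

```r
# Hypothetical categories where "A" and "B" tie for the largest count
x <- c("A", "A", "B", "B", "C")
tab <- table(x)

first_max <- names(which.max(tab))              # only the first maximum
all_max   <- names(tab)[as.vector(tab == max(tab))]  # every tied maximum

first_max
all_max
```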
> tabN = table(mon$neighborhood)
> tabN[which.max(tabN)]
Johns Hopkins Homewood
                    17
> tabC = table(mon$councilDistrict)
> tabC[which.max(tabC)]
11
29
> tabP = table(mon$policeDistrict)
> tabP[which.max(tabP)]
CENTRAL
     27
Monuments-tab.txt file from: http://www.aejaffe.com/summerR_2015/data/Monuments-tab.txt
> monTab = read.delim("http://www.aejaffe.com/summerR_2015/data/Monuments-tab.txt",
+     header=TRUE, as.is=TRUE)
> identical(mon$name,monTab$name)
[1] TRUE
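The same check can be tried offline by reading tab-delimited and comma-delimited text from strings instead of a URL. This is only a sketch: the two inline tables are made up, and the comma-delimited version is an assumed stand-in for however `mon` was originally read.

```r
# Hypothetical tab-delimited and comma-delimited versions of the same table
tab_txt <- "name\tzipCode\nMonument A\t21201\nMonument B\t21202\n"
csv_txt <- "name,zipCode\nMonument A,21201\nMonument B,21202\n"

# text= lets read.delim()/read.csv() parse a string instead of a file
tab_df <- read.delim(text = tab_txt, header = TRUE, as.is = TRUE)
csv_df <- read.csv(text = csv_txt, header = TRUE, as.is = TRUE)

# identical() is TRUE only when the two vectors match exactly
same_names <- identical(tab_df$name, csv_df$name)
same_names
```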