- Change the name of the "Location.1" column to "location"
June 16, 2015
Names are just an attribute of the data frame (recall str()) that you can change to any valid name.
Valid names are case-sensitive and may contain letters, digits, underscores, and periods, but they must start with a letter (or with a period that is not followed by a digit).
For the data.frame class, colnames() and names() return the same attribute.
> names(mon)
[1] "name"            "zipCode"         "neighborhood"    "councilDistrict"
[5] "policeDistrict"  "Location.1"
> names(mon)[6] = "location"
> names(mon)
[1] "name"            "zipCode"         "neighborhood"    "councilDistrict"
[5] "policeDistrict"  "location"
These naming rules also apply when creating R objects.
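Renaming by position (element 6) breaks if the column order ever changes. A minimal sketch of renaming by matching the old name instead, using a toy data frame whose columns and values are made up for illustration; make.names() is shown only to illustrate how base R repairs names that break the rules above.

```r
# Toy data frame; column names and values are hypothetical.
df <- data.frame(name = c("A", "B"),
                 Location.1 = c("1 MAIN ST", ""),
                 stringsAsFactors = FALSE)

# For a data frame, names() and colnames() return the same attribute.
same_attr <- identical(names(df), colnames(df))

# Rename by matching the old name rather than relying on its position.
names(df)[names(df) == "Location.1"] <- "location"

# make.names() converts invalid names into syntactically valid ones.
fixed <- make.names(c("2nd", "zip code", "ok_name"))

print(names(df))
print(fixed)
```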
There are several ways to return the number of rows of a data frame or matrix
> nrow(mon)
[1] 84
> dim(mon)
[1] 84 6
> length(mon$name)
[1] 84
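One point that is easy to trip over: length() applied to a whole data frame counts columns, not rows, which is why the example above takes the length of a single column. A small sketch on a toy data frame (values are made up):

```r
# Toy data; values are hypothetical.
df <- data.frame(name = c("A", "B", "C"),
                 zipCode = c(21201, 21202, 21201),
                 stringsAsFactors = FALSE)

n_rows <- nrow(df)        # number of rows
dims   <- dim(df)         # c(rows, columns)
n_cols <- length(df)      # a data frame is a list of columns, so this counts columns
n_vec  <- NROW(df$name)   # NROW() also works on a plain vector, where nrow() returns NULL

print(c(rows = n_rows, cols = n_cols, vec = n_vec))
```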
What are the:
zip codes
neighborhoods
council districts, and
police districts
that contain monuments, and how many monuments are in each?
unique() returns the unique entries in a vector
> unique(mon$zipCode)
 [1] 21201 21202 21211 21213 21217 21218 21224 21230 21231 21214 21223
[12] 21225 21251
> unique(mon$policeDistrict)
[1] "CENTRAL"      "NORTHERN"     "NORTHEASTERN" "WESTERN"
[5] "SOUTHEASTERN" "SOUTHERN"     "EASTERN"
> unique(mon$councilDistrict)
[1] 11 7 14 13 1 10 3 2 9 12
> unique(mon$neighborhood)
 [1] "Downtown"                        "Remington"
 [3] "Clifton Park"                    "Johns Hopkins Homewood"
 [5] "Mid-Town Belvedere"              "Madison Park"
 [7] "Upton"                           "Reservoir Hill"
 [9] "Harlem Park"                     "Coldstream Homestead Montebello"
[11] "Guilford"                        "McElderry Park"
[13] "Patterson Park"                  "Canton"
[15] "Middle Branch/Reedbird Parks"    "Locust Point Industrial Area"
[17] "Federal Hill"                    "Washington Hill"
[19] "Inner Harbor"                    "Herring Run Park"
[21] "Ednor Gardens-Lakeside"          "Fells Point"
[23] "Hopkins Bayview"                 "New Southwest/Mount Clare"
[25] "Brooklyn"                        "Stadium Area"
[27] "Mount Vernon"                    "Druid Hill Park"
[29] "Morgan State University"         "Dunbar-Broadway"
[31] "Carrollton Ridge"                "Union Square"
> length(unique(mon$zipCode))
[1] 13
> length(unique(mon$policeDistrict))
[1] 7
> length(unique(mon$councilDistrict))
[1] 10
> length(unique(mon$neighborhood))
[1] 32
table() also works here: it tabulates a single variable (or cross-tabulates two variables), giving the count for each unique value.
> table(mon$zipCode)
21201 21202 21211 21213 21214 21217 21218 21223 21224 21225 21230 21231
   11    16     8     4     1     9    14     4     8     1     3     4
21251
    1
> length(table(mon$zipCode))
[1] 13
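A table is easier to read when sorted by count. A brief sketch on a made-up vector of zip codes, which also confirms that length(table(x)) agrees with length(unique(x)):

```r
# Hypothetical zip codes, for illustration only.
zip <- c(21201, 21202, 21202, 21211, 21202, 21201)

tab <- table(zip)

# Sort the counts from largest to smallest; the names travel with the counts.
sorted <- sort(tab, decreasing = TRUE)

# The table has one entry per unique value.
same_count <- length(tab) == length(unique(zip))

print(sorted)
```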
The "by hand" way is to cross-tabulate the zip codes and neighborhoods, then count the non-zero entries in a neighborhood's column:
> tab = table(mon$zipCode, mon$neighborhood)
> # tab
> tab[,"Downtown"]
21201 21202 21211 21213 21214 21217 21218 21223 21224 21225 21230 21231
    2     9     0     0     0     0     0     0     0     0     0     0
21251
    0
> length(unique(tab[,"Downtown"]))
[1] 3
> tt = tab[,"Downtown"]
> tt
21201 21202 21211 21213 21214 21217 21218 21223 21224 21225 21230 21231
    2     9     0     0     0     0     0     0     0     0     0     0
21251
    0
> tt == 0 # which entries are equal to 0
21201 21202 21211 21213 21214 21217 21218 21223 21224 21225 21230 21231
FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
21251
 TRUE
> tab[,"Downtown"] !=0
21201 21202 21211 21213 21214 21217 21218 21223 21224 21225 21230 21231
 TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
21251
FALSE
> sum(tab[,"Downtown"] !=0)
[1] 2
> sum(tab[,"Johns Hopkins Homewood"] !=0)
[1] 2
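The per-column check above can be applied to every neighborhood at once with colSums(). This is a sketch on a toy data frame; the rows are made up and do not come from the monuments data.

```r
# Toy data; rows are hypothetical.
df <- data.frame(
  zipCode      = c(21201, 21202, 21202, 21211, 21211),
  neighborhood = c("Downtown", "Downtown", "Downtown", "Remington", "Remington"),
  stringsAsFactors = FALSE
)

# Rows are zip codes, columns are neighborhoods.
tab <- table(df$zipCode, df$neighborhood)

# tab != 0 is TRUE where a zip code has at least one record in that neighborhood;
# summing each column counts the distinct zip codes per neighborhood.
zips_per_nbhd <- colSums(tab != 0)

print(zips_per_nbhd)
```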
We could also subset the data into neighborhoods:
> dt = mon[mon$neighborhood == "Downtown",]
> head(mon$neighborhood == "Downtown",10)
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE
> dim(dt)
[1] 11 6
> length(unique(dt$zipCode))
[1] 2
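Two caveats about subsetting this way, sketched on made-up data. First, if the column being compared contains NA, bracket subsetting returns an all-NA row for each NA, whereas subset() drops those rows. Second, tapply() can repeat the "unique zip codes" count for every group without building one subset per neighborhood.

```r
# Toy data with one missing neighborhood; values are hypothetical.
df <- data.frame(
  zipCode      = c(21201, 21202, 21211, 21202),
  neighborhood = c("Downtown", NA, "Remington", "Downtown"),
  stringsAsFactors = FALSE
)

# Bracket subsetting keeps a row of NAs where the comparison is NA.
by_bracket <- df[df$neighborhood == "Downtown", ]

# subset() treats NA in the condition as FALSE and drops those rows.
by_subset <- subset(df, neighborhood == "Downtown")

# Count distinct zip codes within each neighborhood (NA groups are ignored).
zips_by_nbhd <- tapply(df$zipCode, df$neighborhood,
                       function(z) length(unique(z)))

print(nrow(by_bracket))
print(nrow(by_subset))
print(zips_by_nbhd)
```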
How many monuments (a) do and (b) do not have an exact location/address?
> head(mon$location)
[1] "408 CHARLES ST\nBaltimore, MD\n"  ""
[3] ""                                 "100 HOLLIDAY ST\nBaltimore, MD\n"
[5] "50 MARKET PL\nBaltimore, MD\n"    "100 CALVERT ST\nBaltimore, MD\n"
> table(mon$location != "") # FALSE=DO NOT and TRUE=DO
FALSE  TRUE
   26    58
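Because TRUE counts as 1 and FALSE as 0, sum() and mean() on a logical vector give the same counts directly, plus a proportion. The sketch below uses a made-up vector and also treats NA as "no location", which is an assumption about how missing values should be handled rather than something the monuments data requires.

```r
# Hypothetical addresses: two present, two empty strings, one NA.
location <- c("1 MAIN ST", "", "", "2 OAK AVE", NA)

# TRUE only when the value is neither NA nor an empty string.
has_loc <- !is.na(location) & location != ""

n_with    <- sum(has_loc)    # how many have a location
n_without <- sum(!has_loc)   # how many do not
prop_with <- mean(has_loc)   # proportion with a location

# table() silently drops NA unless asked to report it.
tab <- table(location != "", useNA = "ifany")

print(c(with = n_with, without = n_without))
print(tab)
```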
Which:
zip code
neighborhood
council district, and
police district
contains the greatest number of monuments?
> tabZ = table(mon$zipCode)
> head(tabZ)
21201 21202 21211 21213 21214 21217
   11    16     8     4     1     9
> max(tabZ)
[1] 16
> tabZ[tabZ == max(tabZ)]
21202
   16
which.max() returns the index of the FIRST element equal to the maximum, and which.min() returns the index of the FIRST element equal to the minimum. If several elements tie, only the first is reported.
> which.max(tabZ) # this is the element number
21202
2
> tabZ[which.max(tabZ)] # this is the actual maximum
21202
   16
> tabN = table(mon$neighborhood)
> tabN[which.max(tabN)]
Johns Hopkins Homewood
                    17
> tabC = table(mon$councilDistrict)
> tabC[which.max(tabC)]
11
29
> tabP = table(mon$policeDistrict)
> tabP[which.max(tabP)]
CENTRAL
27
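The "first only" behavior of which.max() matters when two categories tie. A small sketch with a made-up vector in which two districts share the maximum, comparing which.max() with the tab[tab == max(tab)] idiom used earlier:

```r
# Hypothetical districts; CENTRAL and NORTHERN both appear twice.
district <- c("CENTRAL", "NORTHERN", "CENTRAL", "NORTHERN", "EASTERN")

tab <- table(district)   # names are sorted alphabetically

# which.max() reports only the first position holding the maximum.
first_max <- which.max(tab)

# Comparing against max() keeps every tied entry.
all_max <- tab[tab == max(tab)]

print(first_max)
print(all_max)
```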
Monuments-tab.txt file from: http://www.aejaffe.com/summerR_2015/data/Monuments-tab.txt
> monTab = read.delim("http://www.aejaffe.com/summerR_2015/data/Monuments-tab.txt",
+                     header=TRUE, as.is=TRUE)
> identical(mon$name,monTab$name)
[1] TRUE
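read.delim() is read.table() with defaults suited to tab-delimited files. The sketch below shows this without a network connection by reading a small made-up string through the text= argument; the two rows are hypothetical and are not taken from Monuments-tab.txt.

```r
# A tiny tab-delimited "file" held in a string; contents are hypothetical.
txt <- "name\tzipCode\nStatue A\t21201\nStatue B\t21202\n"

# read.delim() assumes a header row and tab separators.
a <- read.delim(text = txt, header = TRUE, as.is = TRUE)

# The same read spelled out with read.table().
b <- read.table(text = txt, header = TRUE, sep = "\t", quote = "\"",
                fill = TRUE, comment.char = "", as.is = TRUE)

# as.is = TRUE keeps the name column as character rather than factor.
print(str(a))
print(identical(a$name, b$name))
```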