Matt Dray (@mattdray)
TL;DR
Animals get stuck in weird places, just ask the London Fire Brigade. I used the sf
package for handling some animal rescue spatial data prior to interactive mapping with leaflet
. Scroll to the bottom to see the map.
The problem
Sometimes I work with eastings and northings coordinate data. It often helps to convert these to latitude and longitude for interactive mapping using the leaflet
package in R.
I’m going to demo the sf
package, show how to reproject coordinates, work with points and polygon data, make an interactive map with leaflet
and do a bonus bit of webscraping.
Special features
The sf
(‘simple features’) package from Edzer Pebesma has a series of classes and methods for spatial data. The package is becoming popular because simple feature access is a widely-used multi-platform ISO standard and the package supports the popular tidy data paradigm and can be integrated with tidyverse operations. This and more was spelled out by Pebesma in a post from UseR! 2017.
What does this mean for sp
, the go-to package for spatial analysis in R? Well, its not going anywhere for now. You can get an opinion on whether you should learn sf
or sp
for spatial R programming (spoiler: sf
, but maybe both if you can).
Some data
Animal rescues
I’ve found an interesting example dataset that contains eastings and northings: animal rescue incidents attended by the London Fire Brigade from the excellent London Data Store. In their words:
The London Fire Brigade attends a range of non-fire incidents (which we call ‘special services’). These ‘special services’ include assistance to animals that may be trapped or in distress.
This is close to my heart because a pigeon fell down our chimney recently. After a brave rescue mission (putting on some rubber gloves and contorting my arms through a tiny vent hole) I think I’m now qualified to join an elite search and rescue team.
Clean the data
First download the data from the datastore as a CSV file. Then we can read it and clean the column names.
library(readr) # for reading data
library(dplyr) # data manipulation and pipes (%>%)
library(janitor) # cleaning and misc functions
library(lubridate) # dealing with dates
# read the csv from where you saved it
rescue <- read_csv("data/lfb-animal-rescue.csv")
# replace rogue '£' with blank
names(rescue) <- iconv(names(rescue), to = "ASCII", sub = "")
# more cleaning
rescue <- rescue %>%
rename(IncidentNotionalCost = `IncidentNominalCost()`) %>% # no brackets
mutate(DateTimeOfCall = ymd_hms(DateTimeOfCall)) # convert type
# preview
glimpse(rescue)
## Observations: 5,352
## Variables: 25
## $ IncidentNumber <dbl> 139091, 275091, 2075091, 2872091, 355…
## $ DateTimeOfCall <dttm> 2001-01-20 09:03:01, 2001-01-20 09:0…
## $ CalYear <dbl> 2009, 2009, 2009, 2009, 2009, 2009, 2…
## $ FinYear <chr> "2008/09", "2008/09", "2008/09", "200…
## $ TypeOfIncident <chr> "Special Service", "Special Service",…
## $ PumpCount <chr> "1", "1", "1", "1", "1", "1", "1", "1…
## $ PumpHoursTotal <chr> "2", "1", "1", "1", "1", "1", "1", "1…
## $ IncidentNotionalCost <chr> "652", "326", "326", "326", "326", "3…
## $ FinalDescription <chr> "DOG WITH JAW TRAPPED IN MAGAZINE RAC…
## $ AnimalGroupParent <chr> "Dog", "Fox", "Dog", "Horse", "Rabbit…
## $ OriginofCall <chr> "Person (land line)", "Person (land l…
## $ PropertyType <chr> "House - single occupancy", "Railings…
## $ PropertyCategory <chr> "Dwelling", "Outdoor Structure", "Out…
## $ SpecialServiceTypeCategory <chr> "Other animal assistance", "Other ani…
## $ SpecialServiceType <chr> "Animal assistance involving livestoc…
## $ WardCode <chr> "E05000166", "E05000169", "E05000558"…
## $ Ward <chr> "Upper Norwood", "Woodside", "Carshal…
## $ BoroughCode <chr> "E09000008", "E09000008", "E09000029"…
## $ Borough <chr> "Croydon", "Croydon", "Sutton", "Hill…
## $ StnGroundName <chr> "Norbury", "Woodside", "Wallington", …
## $ PostcodeDistrict <chr> "SE19", "SE25", "SM5", "UB9", "RM3", …
## $ Easting_m <chr> "NULL", "534785", "528041", "504689",…
## $ Northing_m <chr> "NULL", "167546", "164923", "190685",…
## $ Easting_rounded <dbl> 532350, 534750, 528050, 504650, 55465…
## $ Northing_rounded <dbl> 170050, 167550, 164950, 190650, 19235…
So each row is an ‘incident’ to which the brigade were called and there’s 5352 of them recorded in the dataset. There are 25 columns for variables relating to incident, including when it was, what it was and where it was.
Explore the data
This post isn’t about the dataset, but it’s worth having a closer look at it. I’ve added an interactive table below so you can search for incidents, creatures, locations and the notional cost of the callout.
Obviously there are plenty of cats up trees, but there’s so much more. Some of the more unique entries are:
DOG WITH JAW TRAPPED IN MAGAZINE RACK
SWAN IN DISTRESS
FERRET STUCK IN LIFT SHAFT
And of course:
TWO DOGS IN TOILET ELDERLY LADY INVOLVED
Get spatial with sf
Points data
Each incident in our data is a point in space with an X and Y coordinate. It’s currently eastings and northings, but we want to transform it to latitude and longitude.
‘sf’ class and reprojection
A Coordinate Reference System (CRS) is required to project geographic entities – points, polygons, etc – onto a surface. There are many systems for doing this, from local to global, each with its own CRS code. This isn’t a post about geography and projections, but you can read more in the CRS chapter of rspatial.org.
First we convert the our dataframe-class object an ‘sf’ class object using the st_as_sf
function, which readies it for spatial analysis. We simply provide arguments that point to the columns containing the coordinates and the CRS code for that projection system.
We can then use the st_transform()
function to convert our projection system from eastings and northings to to latitude and longitude.
library(sf)
rescue_sf <- rescue %>%
st_as_sf(
coords = c("Easting_rounded", "Northing_rounded"), # columns with coordinates
crs = 27700 # coordinate reference system code for eastings/northings
) %>%
st_transform(crs = 4326) # the coord ref system code for latlong
print(rescue_sf)
## Simple feature collection with 5352 features and 23 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: -0.5106219 ymin: 51.29753 xmax: 0.3017003 ymax: 51.68849
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 5,352 x 24
## IncidentNumber DateTimeOfCall CalYear FinYear TypeOfIncident
## <dbl> <dttm> <dbl> <chr> <chr>
## 1 139091 2001-01-20 09:03:01 2009 2008/09 Special Servi…
## 2 275091 2001-01-20 09:08:51 2009 2008/09 Special Servi…
## 3 2075091 2004-01-20 09:10:07 2009 2008/09 Special Servi…
## 4 2872091 2005-01-20 09:12:27 2009 2008/09 Special Servi…
## 5 3553091 2006-01-20 09:15:23 2009 2008/09 Special Servi…
## 6 3742091 2006-01-20 09:19:30 2009 2008/09 Special Servi…
## 7 4011091 2007-01-20 09:06:29 2009 2008/09 Special Servi…
## 8 4211091 2007-01-20 09:11:55 2009 2008/09 Special Servi…
## 9 4306091 2007-01-20 09:13:48 2009 2008/09 Special Servi…
## 10 4715091 2007-01-20 09:21:24 2009 2008/09 Special Servi…
## # … with 5,342 more rows, and 19 more variables: PumpCount <chr>,
## # PumpHoursTotal <chr>, IncidentNotionalCost <chr>,
## # FinalDescription <chr>, AnimalGroupParent <chr>, OriginofCall <chr>,
## # PropertyType <chr>, PropertyCategory <chr>,
## # SpecialServiceTypeCategory <chr>, SpecialServiceType <chr>,
## # WardCode <chr>, Ward <chr>, BoroughCode <chr>, Borough <chr>,
## # StnGroundName <chr>, PostcodeDistrict <chr>, Easting_m <chr>,
## # Northing_m <chr>, geometry <POINT [°]>
So what happened more specifically? Our eastings and northings were combined with st_as_sf()
into a two element list-column called geometry
with data type sfc_POINT
. Our new sf-class object also contains some metadata detailing the geometry type – POINTS
in our case – and the projection system of the coordinate data, which we converted to latlong with the st_transform()
function.
Tidyverse manipulation
As mentioned, one advantage of using sf
is that sf-class objects use the tidy data paradigm that allows for use of the tidyverse. Some users may prefer this relative to the ‘Spatial Points Data Frame’ (SPDF) class that is produced by the sp
package for points data. SPDFs are an S4 class, which means they have ‘slots’ of data, coordinates, etc.
In the code chunk above I used pipes to pass the rescue
dataframe to st_as_sf()
and then to st_transform()
. You can also use dplyr
functions like filter()
on your sf-class object.
filtered_rescue_sf <- rescue_sf %>%
filter(
AnimalGroupParent %in% c("Dog", "Cat", "Bird"), # just these animals
DateTimeOfCall > ymd("2017-01-01") &
DateTimeOfCall < ymd("2017-12-31") # 2017 only
)
We can also use the st_coordinates()
function to extract the XY (latitude and longitude) coordinates from the column containing our sf
point geometry. This means we can have separate columns for the latitude and longitude, as well as our list-column.
# extract coordinates into two-column dataframe
rescue_coords <- as.data.frame(st_coordinates(filtered_rescue_sf))
# bind these to rescue dataset
filtered_rescue_sf <- bind_cols(filtered_rescue_sf, rescue_coords)
# preview
head(filtered_rescue_sf)
## Simple feature collection with 6 features and 25 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: -0.3653023 ymin: 51.47668 xmax: 0.1769739 ymax: 51.58195
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 6 x 26
## IncidentNumber DateTimeOfCall CalYear FinYear TypeOfIncident
## <dbl> <dttm> <dbl> <chr> <chr>
## 1 10299091 2017-01-20 09:11:40 2009 2008/09 Special Servi…
## 2 10470091 2017-01-20 09:18:08 2009 2008/09 Special Servi…
## 3 27805091 2017-02-20 09:13:23 2009 2008/09 Special Servi…
## 4 27898091 2017-02-20 09:16:13 2009 2008/09 Special Servi…
## 5 63575091 2017-04-20 09:16:12 2009 2009/10 Special Servi…
## 6 63759091 2017-04-20 09:22:04 2009 2009/10 Special Servi…
## # … with 21 more variables: PumpCount <chr>, PumpHoursTotal <chr>,
## # IncidentNotionalCost <chr>, FinalDescription <chr>,
## # AnimalGroupParent <chr>, OriginofCall <chr>, PropertyType <chr>,
## # PropertyCategory <chr>, SpecialServiceTypeCategory <chr>,
## # SpecialServiceType <chr>, WardCode <chr>, Ward <chr>,
## # BoroughCode <chr>, Borough <chr>, StnGroundName <chr>,
## # PostcodeDistrict <chr>, Easting_m <chr>, Northing_m <chr>,
## # geometry <POINT [°]>, X <dbl>, Y <dbl>
Polygons data
Read and convert GeoJSON
While we’re messing around with the sf
package we can investigate polygons by adding in the borders for each of the London boroughs from a GeoJSON file.
First, read Local Authority District (LADs) data directly from the Open Geography Portal and then use the st_as_sf()
function to make the conversion from the sp class to the sf class.
This means our polygon dataset is a tidy dataframe with the polygon information stored as MULTIPOLYGON
type in a list-column with as many elements as required to draw each polygon (e.g. a square requires four sets of XY points that can be joined together). The outcome is very similar to what we had for our points data.
library(geojsonio) # for working with geojson files
# read geojson from the web (local authorities)
lad_json <- geojson_read(
x = "https://opendata.arcgis.com/datasets/fab4feab211c4899b602ecfbfbc420a3_4.geojson",
what = "sp" # spatial class
)
# convert to sf
lad_sf <- st_as_sf(lad_json)
# preview
head(lad_sf)
## Simple feature collection with 6 features and 10 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -2.83382 ymin: 53.30631 xmax: -0.7913148 ymax: 54.72728
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## objectid lad17cd lad17nm lad17nmw bng_e bng_n long
## 0 1 E06000001 Hartlepool 447157 531476 -1.27023
## 1 2 E06000002 Middlesbrough 451141 516887 -1.21099
## 2 3 E06000003 Redcar and Cleveland 464359 519597 -1.00611
## 3 4 E06000004 Stockton-on-Tees 444937 518183 -1.30669
## 4 5 E06000005 Darlington 428029 515649 -1.56835
## 5 6 E06000006 Halton 354246 382146 -2.68853
## lat st_areashape st_lengthshape geometry
## 0 54.67616 96586817 50245.93 MULTIPOLYGON (((-1.243848 5...
## 1 54.54467 54741668 35458.51 MULTIPOLYGON (((-1.200218 5...
## 2 54.56752 247140467 78666.80 MULTIPOLYGON (((-1.200218 5...
## 3 54.55691 206473805 86947.34 MULTIPOLYGON (((-1.193937 5...
## 4 54.53535 198298966 91341.12 MULTIPOLYGON (((-1.43994 54...
## 5 53.33424 80971982 59054.62 MULTIPOLYGON (((-2.695155 5...
Scrape London borough list
But which of these LADs are London boroughs? We can extract a vector of the boroughs from Wikipedia using the rvest
package and use it to filter our data. The CSS selector used in the html_nodes()
function below can be extracted using the very handy SelectorGadget tool.
library(rvest) # scraping webpages
library(xml2) # parsing XML
# read the webpage
wiki <- read_html("https://en.wikipedia.org/wiki/List_of_London_boroughs")
# get borough names
boroughs <- wiki %>%
html_nodes(css = "td:nth-child(1) > a") %>%
html_text() %>%
tolower()
# filter the sf for london boroughs
boroughs_sf <- lad_sf %>%
mutate(lad17nm = tolower(lad17nm)) %>%
filter(lad17nm %in% boroughs)
# preview
head(boroughs_sf)
## Simple feature collection with 6 features and 10 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -0.3371469 ymin: 51.29222 xmax: 0.2197077 ymax: 51.67066
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## objectid lad17cd lad17nm lad17nmw bng_e bng_n long
## 1 297 E09000004 bexley 549202 175433 0.146212
## 2 294 E09000001 city of london 532382 181358 -0.093510
## 3 295 E09000002 barking and dagenham 547759 185107 0.129506
## 4 296 E09000003 barnet 523472 191753 -0.218210
## 5 298 E09000005 brent 519615 186468 -0.275680
## 6 299 E09000006 bromley 542036 165708 0.039246
## lat st_areashape st_lengthshape geometry
## 1 51.45821 62394091 37331.781 MULTIPOLYGON (((0.2197077 5...
## 2 51.51564 3081566 7498.317 MULTIPOLYGON (((-0.08177556...
## 3 51.54552 36017227 30423.228 MULTIPOLYGON (((0.1577779 5...
## 4 51.61108 86530595 41634.761 MULTIPOLYGON (((-0.1836922 ...
## 5 51.56441 43881295 32897.293 MULTIPOLYGON (((-0.2150751 ...
## 6 51.37267 148885903 63088.815 MULTIPOLYGON (((0.07374317 ...
Map it
So now we can plot the data. The addPolygons()
function accepts each MULTIPOLYGON
in the list-column of our London boroughs dataset. The addMarkers()
function accepts the POINTS-type list-column to plot each point at the latitude and longitude specified.
This is a very simple map for now. You can:
- zoom with the
+
and-
buttons or scroll with your mouse wheel - click and drag to move the map
- click the points to get a pop-up showing more information
- hover over a borough to highlight it and show a label
library(leaflet) # for interactive mapping
# put the map together
leaflet(boroughs_sf) %>%
addProviderTiles(providers$Wikimedia) %>%
addPolygons( # add London borough boundaries
label = ~tools::toTitleCase(lad17nm), # label on hover
color = "black", # boundary colour
weight = 2, # boundary thickness
opacity = 1, # fully opaque lines
fillOpacity = 0.2, # mostly transparent
highlightOptions = highlightOptions(
color = "white", # turn boundary white on hover
weight = 2, # same as polygon boundary
bringToFront = TRUE # overlay the highglight
)
) %>%
addAwesomeMarkers( # add incident points
lng = filtered_rescue_sf$X, lat = filtered_rescue_sf$Y,
icon = awesomeIcons(
library = "ion", # from this icon library
icon = "ion-android-alert", # use this icon
iconColor = "white", # colour it white
# colour by animal
markerColor = case_when( # different colours for each
filtered_rescue_sf$AnimalGroupParent == "Dog" ~ "red",
filtered_rescue_sf$AnimalGroupParent == "Cat" ~ "blue",
filtered_rescue_sf$AnimalGroupParent == "Bird" ~ "black"
)
),
popup = ~paste0( # display this information on click
"<b>Animal</b>: ", filtered_rescue_sf$AnimalGroupParent, "<br>",
"<b>Incident</b>: ", filtered_rescue_sf$FinalDescription, "<br>",
"<b>Date</b>: ", filtered_rescue_sf$DateTimeOfCall, "<br>",
"<b>Borough</b>: ", filtered_rescue_sf$Borough, "<br>",
"<b>Notional cost</b>: £", filtered_rescue_sf$IncidentNotionalCost
)
)
Okay cool, we get a simple map of London with borough boundaries and markers showing incidents in 2017, with a different colour for each of three selected animal groups (red = dog, blue = cat, black = bird).
And remember
My main advice? Keep an eye on your pets. And consider covering your chimney.
Further reading on sf
- Geocomputation with R by Robin Lovelace, Jakub Nowosad and Jannes Muenchow
- Introduction to GIS with R: Spatial data with the sp and sf packages and Geocoding with R: Using ggmap to geocode and map historical data by Jesse Sadler
- Spatial analysis pipelines with simple features in R from Kyle Walker Data
R session information
Session info
## [1] "Last updated 2019-04-15"
## R version 3.5.3 (2019-03-11)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] leaflet_2.0.2 rvest_0.3.2 xml2_1.2.0 geojsonio_0.6.0
## [5] sf_0.7-2 DT_0.4 lubridate_1.7.4 janitor_1.0.0
## [9] dplyr_0.8.0.1 readr_1.3.1 emo_0.0.0.9000
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.0 lattice_0.20-38 class_7.3-15 assertthat_0.2.0
## [5] digest_0.6.18 utf8_1.1.4 V8_1.5 mime_0.6
## [9] R6_2.4.0 evaluate_0.13 e1071_1.7-0 httr_1.4.0
## [13] geojson_0.2.0 blogdown_0.7 pillar_1.3.1 rlang_0.3.1
## [17] lazyeval_0.2.1 curl_3.3 rstudioapi_0.10 rmarkdown_1.11
## [21] rgdal_1.3-6 jqr_1.1.0 stringr_1.4.0 selectr_0.4-1
## [25] foreign_0.8-71 htmlwidgets_1.3 shiny_1.2.0 compiler_3.5.3
## [29] httpuv_1.4.5.1 xfun_0.5 pkgconfig_2.0.2 rgeos_0.3-28
## [33] htmltools_0.3.6 tidyselect_0.2.5 tibble_2.0.1 bookdown_0.7
## [37] fansi_0.4.0 crayon_1.3.4 later_0.8.0 grid_3.5.3
## [41] spData_0.2.9.4 jsonlite_1.6 xtable_1.8-3 DBI_1.0.0
## [45] magrittr_1.5 units_0.6-2 cli_1.0.1 stringi_1.3.1
## [49] promises_1.0.1 sp_1.3-1 tools_3.5.3 glue_1.3.0
## [53] purrr_0.3.1 hms_0.4.2 crosstalk_1.0.0 yaml_2.2.0
## [57] maptools_0.9-4 classInt_0.2-3 knitr_1.22