Spatial data are pervasive in ecology, especially for wild systems. In the past, ecological analyses that hinged on space were usually performed in ArcGIS. A LOT of species distribution modeling and habitat modeling utilized Arc functionality, thanks in great part to Hawth’s tools and other toolkits expressly aimed at ecological questions.
As it turns out, though, Arc is both expensive (at least relative to my grad student income), and limited in statistical capacity. Each generation of ecological models incorporates more mechanism, more heterogeneity, and hence more complexity. Dealing with those complexities in Arc is, frankly, painful. Besides, we’re all using R anyway… why can’t we just deal with spatially referenced data there?
As far as I can tell, ecologists rely on Arc for three basic tasks:
- Specifying / aligning geographic metadata, and reprojecting layers
- “Drilling down” to integrate data from multiple layers
- Spatial data visualization.
In this post, I’ll go through ways to accomplish each of these tasks in R. I’ll start with a little GIS background, introduce GIS in R, and then delve into each of these topics specifically. I’m a relatively novice to GIS (although not to R), so please post feedback and extensions (and criticisms, if you must)!
Introduction: What’s a GIS?
A GIS is much like any other relational data structure, except that its index variable references a point in space (and sometimes also time). People talk about GISs as being comprised of stacks of “layers”. These are essentially the same as a set of two-dimensional spreadsheets, linked by a common field that contains spatial (or spatiotemporal) information.
Metadata is really, REALLY important for GIS, because “spatial locations” can be specified in many different ways. In order to properly align different layers, the user has to know the projection (remember those map projections we learned about in middle school geography?) and coordinate reference system (CRS) associated with each layer, and be sure that they match across all layers. If the projection and CRS do not match from layer to layer, some layers will need to be reprojected.
- Projections describe the way in which the (spherical) world gets projected onto a (planar) map. Projections from spheres to planes always contain distortions, but you can choose the projection to minimize distortion for attributes of the map that you care about (e.g., area, orientation, shape, etc.).
- Coordinate reference systems (CRSs) are different ways of writing down spatial locations. The best-known is probably Longitude/Latitude; in ecology Universal Transverse Mecrator (UTM) coordinates also get used a lot. Coordinate reference systems are cataloged in the EPSG registry, where every CRS is assigned a unique identifier.
GIS information is stored in two broad file classes: vectors (continuous in space; categorical in variable value) and rasters (categorical in space, continuous in variable value). The contents of vector files can be further compartmentalized into points, lines, and polygons. Since the preponderance of my own experience is with vectorized data, I’ll focus primarily on vectors here.
There several kinds of GIS storage objects. The two I know best are shapefiles and KML files.
- Shapefiles are the brainchild of ESRI, ArcGIS’s parent company. They are actually batches of files that include different pieces of information pertaining to a given layer. Each piece has its own extension. These include .shp, .dbf, .proj, and others. To get all the projection, coordinate references, and attached data, whatever programming you’re using needs to access ALL these files in tandem, not just the .shp file.
- KML files were developed for use with google Earth. They are a geographic extension of XML, which is a broadly used datastructure that organizes information on the web. I won’t go into KML files in depth, but you can access them using the getKMLcoordinates() function in the maptools package.
Spatial data in R
R currently has a bazillion spatial packages — for a nice summary of all the functionality that’s already out there, check out CRAN’s Spatial Task View. Looks to me like the work-horse packages that underlie most of the gadgets-and-gizmos spat-temp utility are:
- rgdal —— brings GDAL/OGR functionality to R
- spatial —- defines key spatial object classes
- sp ——— transformation and drilling functionality
- maptools – gets a lot of use, especially readShapePoly (but see my comments on that below)
- rgeos —- interface engine connecting R with GEOS, a C API with lots of nice topological functionality
Probably the most broadly-used class used spatial information in R is through the SpatialPointsDataFrame (“SPDF”) class implemented in sp. SPDFs are S4 objects (with a few S3 functionalities). They are very similar to the vanilla dataframes that we all know and love, except that they have a bunch of leading slots for projection and coordinate reference system information.
Specifying geographic metadata, and reprojecting layers
If you’re reading ESRI (e.g., ArcGIS) shapefiles into R, use the readOGR function in the rgdal package. Maptools can also read shapefiles with its readShapePoly (and similar) functions, but those functions don’t being projection information along, they just read the .shp file. Here’s an example of how to read in a shapefile with readOGR.
require(rgdal) MyShp <- readOGR(dsn = "<Path to files>", # Path to all derivates layer = "<Layer name>") # Name until ".shp"
We can check the projection of MyShp by using the proj4string() function in sp:
To convert a standard data frame (like you’d read in with read.csv) to a SPDF, all you have to do is specify what fields contain coordinates (using the coordinates function in sp). Note that ORDER MATTERS to coordinates(). You must specify the data as Longitude + Latitude (not Lat + Long).
MyData <- read.csv("<Path to files>", header = T) # if coordinates are in MyData$Longitude and MyData$Latitude: coordinates(MyData) <- ~Longitude + Latitude
By using the coordinates() function, the MyData object will be updated to an SPDF. It will retain all its original information, which will now be in its Data slot (accessed through calling MyData@data). To specify the projection of MyData, use the proj4string() and CRS() functions from sp. I’ve stuck strings specifying EPSGs for lat-long and google projections into named objects that I then pass along to CRS().
# EPSGs for some commonly-used projections googleproj <- "+init=epsg:3857" latlong <- "+init=epsg:4326" proj4string(MyData) <- CRS(latlong)
SPDFs can be easily reprojected to a different CRS using the spTransform function in sp. Here’s how we’d reproject MyData to be on the google coordinate system:
MyData.google <- spTransform(MyData, CRS(googleproj))
“Drilling down” to integrate data from multiple layers
The over() (or %over% if you’re into binary) function in sp aligns two layers and extracts the values from one layer that occur at the spatial locations referenced in the other. For example, if I had a set of points living in a SpatialPointsDataFrame and a set of polygons in a SpatialPolygonsDataFrame, I could use %over% to get the polygon-level values corresponding to each point.
MyShpPoints <- readOGR(dsn = "<Path to points>", layer = "<Pt layer name>) MyShpPolys <- readOGR(dsn = "<Path to polys>", layer = "<Poly layer name>) # get values form the "Name" field in MyShpPolys # corresponding to the location of each point in MyShpPoints PtPolyNames <- (MyShpPoints %over% MyShpPolys)$Name
We can get the area of polygon overlap for two polygons (like you might need for calculating homerange overlap for two animals, for example) by using the gIntersection() function in the rgeos package. For two polygons in the same SPDF, this might look like this:
MyShpPolys <- readOGR(dsn = "<Path to polys>", layer = "<Poly layer name >) # Extract geometries of the two focal polygons Poly1 <- MyShpPolys$Polygons[] # gets 1st polygon from MyShpPolys Poly2 <- MyShpPolys$Polygons[] # gets 2nd polygon from MyShpPolys require(rgeos) Poly1Poly2Overlap <- gIntersection(Poly1, Poly2, byid = F)
Spatial data visualization
require(RgoogleMaps) # these coordinates point to a field site near Hells Canyon, WA. centerIn <- c(46.264, -117.320) # specify magnification (I do this with guess-and-check) googleZoom <- 11 # specify map type (terrain, roadmap, satellite, etc.) googleMaptype <- "terrain" hc.map.image <- GetMap(center = centerIn, zoom = googleZoom, maptype = googleMaptype) # to view the map: PlotOnStaticMap(hc.map.image)
In order to overlap your own spatial data onto this map, first reproject the data so that it matches the tile with RgoogleMaps’ LatLon2XY.centered() function. Then, add the reprojected data to the maptile with standard plotting functions like points():
# Check that MyShpPts is in Lat/Lon and reproject if necessary projection(MyShpPts) MyShpPtsGoogle <- LatLon2XY.centered(hc.map.image, # tile name MyShpPts$Latitude, MyShpPts$Longitude, zoom = googleZoom) PlotOnStaticMap(hc.map.image) points(MyShpPtsGoogle$newX, MyShpPtsGoogle$newY, pch = 16, cex = 0.8)
Dynamic / Interactive maps
Some useful links and packages
There are lots of starter sheets for people working at the R/GIS interface. Here are a few I found particularly useful.
A few other packages that may be useful to ecologists:
adehabitatHR (for homerange estimation)
adehabitatLT (for animal movement analysis)
geosphere (for buffering and hard spherical geometry problems)