hotspotr: hotspot mapping in R
08 Jul 2014
Updated 6/17/2016: A new version of hotspotr is available for download here. The new version is a standalone program with a number of improvements and better performance than the R package discussed here, which still works but is no longer maintained.
A very alpha version of a package I put together to do hotspot mapping in R is now available on github. All you need to get started is a dataset with x,y coordinates and markers for case vs. control status (e.g. home locations of diseased vs. non-diseased individuals).
Although there are already great packages out there to do this kind of thing (e.g. sparr and spatstat), I put this together to make it easy to compare different algorithms and also to facilitate plotting the hotspot map on top of a geographic map using the ggmap package.
The following is a short demo of how the package can be used to create a very simple hotspot map, using the algorithm of your choice to calculate local case densities (e.g., distance-based mapping or kernel density estimation). In the next few weeks, I hope to post a tutorial or two talking about these methods and potential applications in more depth.
Demo
To install, make sure you have the devtools package installed and loaded and run:
Import hotspotr:
Generate a set of (x,y) points in the unit square:
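One way to do this in base R, as a minimal sketch, is to draw each coordinate independently from a uniform distribution on [0, 1]. The sample size n and the seed are arbitrary choices made here for illustration, not values from the original demo.

```r
# Draw n points uniformly at random in the unit square.
# n and the seed are illustrative assumptions.
set.seed(42)
n <- 1000
x <- runif(n)  # x coordinates in [0, 1]
y <- runif(n)  # y coordinates in [0, 1]
```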
Using the random_hotspot function in hotspotr, place an area of increased risk in the center of the square. In this case, we’ll select an area covering the middle 30% of the unit square where 80% of individuals within this area are cases and only 20% outside of it are cases, for a relative risk of 4:
Create a new data frame with case points labeled as z = 1 and controls as z = 0:
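For readers who want to see what the simulation and labeling steps amount to, here is a base-R sketch that does not call random_hotspot, since its arguments are not reproduced in this post. The helper simulate_hotspot is hypothetical, and reading "middle 30%" as a centered square with side length 0.3 is an assumption.

```r
# Hypothetical stand-in for the simulation step; NOT hotspotr's random_hotspot.
# Assumes the high-risk area is a centered square with side length `side`.
simulate_hotspot <- function(x, y, side = 0.3, p_in = 0.8, p_out = 0.2) {
  half <- side / 2
  # TRUE for points that fall inside the centered square
  inside <- abs(x - 0.5) <= half & abs(y - 0.5) <= half
  # Probability of being a case depends on location
  p_case <- ifelse(inside, p_in, p_out)
  z <- rbinom(length(x), size = 1, prob = p_case)
  # z = 1 marks a case, z = 0 a control
  data.frame(x = x, y = y, z = z)
}

set.seed(42)
n <- 2000
hs <- simulate_hotspot(runif(n), runif(n))

# Empirical relative risk: case fraction inside vs. outside the square
inside <- abs(hs$x - 0.5) <= 0.15 & abs(hs$y - 0.5) <= 0.15
rr <- mean(hs$z[inside]) / mean(hs$z[!inside])
```

With these settings the empirical relative risk rr should land near the nominal value of 0.8 / 0.2 = 4, though it varies from one random draw to the next.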
We can plot this and see by eye that there is a greater density of cases (represented by triangles) in the center:
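A base-R version of such a plot might look like the following sketch. The simulated data frame is a stand-in generated inline so the snippet runs on its own, with the high-risk area assumed to be a centered square of side 0.3; the original demo may well have used a different plotting package.

```r
# Simulate stand-in data: cases (z = 1) are more likely in the central square.
set.seed(42)
n <- 500
x <- runif(n)
y <- runif(n)
inside <- abs(x - 0.5) <= 0.15 & abs(y - 0.5) <= 0.15
hs <- data.frame(x = x, y = y, z = rbinom(n, 1, ifelse(inside, 0.8, 0.2)))

# Filled triangles (pch 17) for cases, open circles (pch 1) for controls.
point_shapes <- ifelse(hs$z == 1, 17, 1)
plot(hs$x, hs$y, pch = point_shapes, xlab = "x", ylab = "y", asp = 1)
legend("topright", legend = c("case", "control"), pch = c(17, 1))
```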
We can verify whether this area of increased density is statistically significant using the hotspot_map function in hotspotr.
The first argument to hotspot_map is the data frame with columns x, y, and z, holding the x and y coordinates and the case/control designations, respectively. The second argument is the density estimation method to use; currently only the distance-based mapping method of Jeffery et al. is supported, as the function dbm_score_rr.
User-defined density functions can easily be written. All that is required is a function of the form fn(hs) that returns a density measure at each (x,y) point in the hs data frame. The p argument specifies the width of the smoothing window, and color_samples specifies the number of random permutations of case/control designations to use when generating the color scale for the map:
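To make the fn(hs) contract and the permutation idea concrete, here is a base-R sketch. Nothing in it is hotspotr code: local_case_fraction and permutation_scores are hypothetical names, the fixed-radius window is a simple stand-in rather than the distance-based mapping score, and how hotspot_map passes p to the density function or builds its color scale is not shown in this post.

```r
# Hypothetical density function of the form fn(hs): for each point, the
# fraction of cases among all points within `radius` (the point itself included).
local_case_fraction <- function(hs, radius = 0.1) {
  d <- as.matrix(dist(hs[, c("x", "y")]))  # pairwise Euclidean distances
  near <- (d <= radius) * 1                 # 0/1 neighborhood matrix
  as.vector(near %*% hs$z) / rowSums(near)  # cases nearby / points nearby
}

# Hypothetical permutation step: shuffle the case/control labels, recompute
# the scores, and collect them as a null reference distribution.
permutation_scores <- function(hs, fn, n_perm = 20) {
  replicate(n_perm, {
    shuffled <- hs
    shuffled$z <- sample(hs$z)  # random relabeling keeps the case count fixed
    fn(shuffled)
  })
}

# Stand-in data: centered square of side 0.3 with elevated risk (an assumption).
set.seed(1)
n <- 500
x <- runif(n)
y <- runif(n)
inside <- abs(x - 0.5) <= 0.15 & abs(y - 0.5) <= 0.15
hs <- data.frame(x = x, y = y, z = rbinom(n, 1, ifelse(inside, 0.8, 0.2)))

observed <- local_case_fraction(hs)
null_scores <- permutation_scores(hs, local_case_fraction, n_perm = 20)

# Use the 95th percentile of the permuted scores as a rough cutoff for
# "unusually high" local case density.
cutoff <- quantile(null_scores, 0.95)
flagged <- observed > cutoff
```

Points with flagged equal to TRUE have a local case fraction higher than most values seen under random relabeling. A map would typically color points by where observed falls within the null scores rather than by a single cutoff.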
We can then plot the resulting map and see that in fact the area at the center of the map represents a likely hotspot: