As part of the transit signal priority (TSP) evaluation application for LADOT, I needed to be able to match two different data sets by time. Specifically, I wanted to match LAFD EMS runs with sequences of TSP activation along the target corridors.
In my initial build of the application, I was matching only one EMS
transponder to one specific TSP loop tag ID. Therefore, to make this
match, I only needed to look for overlapping time between the two
different ‘runs’; if the time windows between those two overlapped at
all, for any length of time, I considered it a match. I found the
perfect R package, fuzzyjoin,
with the function
interval_join
designed to do just that.
However, as the evaluation expanded to multiple vehicles, each with a
separate corresponding TSP tag ID, I faced a challenge: how can I match by
not only overlapping time, but also by an appropriate match ID? I didn’t
have to look far; fuzzyjoin
also contains a function called
genome_join
. Why genomes? Apparently “Genomic intervals include both a
chromosome ID and an interval: items are only considered matching if the
chromosome ID matches and the interval overlaps.” Apparently geneticists have this same problem.
This function ended up suiting my purposes perfectly. Instead of
matching by chromosome ID, I was using the matching tag ID for the TSP.
The only hitch was that I needed to first convert POSIXct
values to
numeric values for the join (the underlying findOverlaps
method within
the IRanges functionality requires numeric values) and then convert them
back to date/time values after the join.
Beautiful!