Data on travel times

Hi folks,

Since it seems like we're going to be having a large number of
interims, I thought it might be instructive to try to analyze a bunch
of different locations to figure out the best strategy. My first-cut
analysis is below.

Note that I'm not trying to make any claims about what the best set of
venues is. It's obviously easy to figure out any statistic we want
about each proposed venue, but how you map that data to "best" is up
to you. In particular, there's some tradeoff between minimal total
travel time and a "fair" distribution of travel times (not that I
claim to know what that means).


METHODOLOGY
The data below is derived by treating both people and venues as
airport locations and using travel time as our primary instrument.

1. For each responder to the current Doodle poll, assign a home
   airport based on their draft publication history.  We're missing a
   few people, but the list should be reasonably complete. Since
   these people responded before the venue was known, the sample is
   at least somewhat unbiased.

2. Compute the shortest travel time between each home airport and
   each venue by looking at the shortest advertised Kayak flights
   around one of the proposed interim dates (6/10 - 6/13), ignoring
   price, but excluding "Hacker fares".
   [Thanks to Martin Thomson for helping me gather these.]

This lets us compute statistics for any venue and/or combination
of venues, based on the candidate attendee list.
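
As a rough illustration of the two steps above, here is a minimal
sketch of how the per-venue statistics could be computed. This is not
the attached pairings.py. The function name venue_stats, the dict
layouts, and the sample numbers are all hypothetical, and I'm assuming
that a round trip is twice the one-way duration, that someone living
at the venue travels zero hours, and that SD means the population
standard deviation.

```python
import statistics

def venue_stats(home_airports, durations, venue):
    """Return (mean, median, sd) of round-trip hours to `venue`.

    home_airports: dict attendee -> home airport code (hypothetical layout)
    durations:     dict (origin, destination) -> one-way hours (hypothetical layout)
    """
    times = []
    for person, home in home_airports.items():
        # Assumption: an attendee based at the venue does not travel.
        one_way = 0.0 if home == venue else durations[(home, venue)]
        # Assumption: round trip is simply twice the one-way duration.
        times.append(2 * one_way)
    return (statistics.mean(times),
            statistics.median(times),
            statistics.pstdev(times))  # population SD; an assumption

# Made-up sample data, NOT taken from the attached files.
home_airports = {"alice": "SFO", "bob": "BOS", "carol": "LHR"}
durations = {("BOS", "SFO"): 6.0, ("LHR", "SFO"): 11.0}

mean, median, sd = venue_stats(home_airports, durations, "SFO")
print(f"SFO  mean={mean:.1f}  median={median:.1f}  sd={sd:.1f}")
```

Keeping the durations keyed by (origin, destination) pairs means a
new candidate venue only needs one extra row per home airport.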

The three proposed venues:

- San Francisco (SFO)
- Boston (BOS)
- Stockholm (ARN)

Three hubs not too distant from the proposed venues:

- London (LHR)
- Frankfurt (FRA)
- New York (NYC) [0]

Also, Calgary (YYC), since the other two chair locations (BOS and SFO)
were already proposed as venues, and I didn't want Cullen to feel
left out.


RESULTS
Here are the results for each of the above venues, measured in total
hours of travel (i.e., round trip).

Venue         Mean         Median           SD
----------------------------------------------
SFO           13.5             11         12.2
BOS           12.3             11          7.5
ARN           17.0             21         10.7
FRA           14.8             17          7.3
LHR           13.3             14          7.5
NYC           11.5             11          5.8
YYC           14.9             13         10.2
SFO/BOS/ARN   14.3             13          3.6
SFO/NYC/LHR   12.7             11.3        3.7

XXX/YYY/ZZZ denotes a three-way rotation of XXX, YYY, and ZZZ. Obviously, mean
and median are intended to be some sort of aggregate measure of travel
time. I don't have any way to measure "fairness", but SD is intended
as some metric of the variation in travel time between attendees.
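
One plausible way to get the rotation rows is to average each
attendee's travel time over the venues in the rotation and then take
the mean, median, and SD of those per-attendee averages. The sketch
below does that. It is an assumption about the approach rather than a
description of the attached software, and the function name
rotation_stats, the data layout, and the numbers are made up for
illustration.

```python
import statistics

def rotation_stats(per_venue_times, venues):
    """Return (mean, median, sd) over per-attendee averages for a rotation.

    per_venue_times: dict venue -> dict attendee -> round-trip hours
                     (hypothetical layout)
    venues:          list of venue codes in the rotation
    """
    attendees = list(per_venue_times[venues[0]].keys())
    averaged = []
    for person in attendees:
        # Average this attendee's travel time across the rotation.
        hours = [per_venue_times[v][person] for v in venues]
        averaged.append(statistics.mean(hours))
    return (statistics.mean(averaged),
            statistics.median(averaged),
            statistics.pstdev(averaged))  # population SD; an assumption

# Made-up round-trip hours, NOT taken from the attached files.
per_venue_times = {
    "SFO": {"alice": 0.0, "bob": 12.0, "carol": 22.0},
    "BOS": {"alice": 12.0, "bob": 0.0, "carol": 14.0},
    "LHR": {"alice": 22.0, "bob": 14.0, "carol": 0.0},
}

rot_mean, rot_median, rot_sd = rotation_stats(per_venue_times, ["SFO", "BOS", "LHR"])
single_sd = statistics.pstdev(per_venue_times["SFO"].values())
print(f"rotation mean={rot_mean:.1f} median={rot_median:.1f} sd={rot_sd:.1f}")
print(f"SFO-only sd={single_sd:.1f}")
```

Averaging per attendee first is what makes the SD drop for a
rotation: each person's long trips are offset by their short ones, so
the averages cluster more tightly than the times for any single venue.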

The raw data and software are attached. The files are:

  home-airports     -- the list of people's home airports
  durations.txt     -- the list of airport-airport durations
  doodle.txt        -- the attendees list
  pairings.py       -- the software to compute travel times
  doodle-out.txt    -- the computed travel times for each attendee

Obviously, there could be an error in the raw data or the software.
Please feel free to send corrections, especially if you find
something material.


OBSERVATIONS
Obviously, it's hard to know what the optimal solution is without
some model for optimality, but we can still make some observations
based on this data:

1. If we're just concerned with minimizing total travel time, then we
would always meet in New York, since it has the shortest mean travel
time and ties for the shortest median travel time, but as I said
above, this arguably isn't fair to people who live in either Europe
or California, since they always have to travel.

2. Combining West Coast, East Coast, and European venues gives
mean/median values comparable to (or at least not too much worse
than) NYC, with much lower SDs. So, arguably that kind of mix is more
fair.

3. There's a pretty substantial difference between hub and non-hub
venues. In particular, LHR has a median travel time 7 hours less than
ARN, and the SFO/NYC/LHR combination has a median/mean travel time
about 2 hours less than SFO/BOS/ARN (primarily accounted for by the
LHR/ARN difference). [Full disclosure, I've favored Star Alliance hubs
here, but you'd probably get similar results if, for instance, you
used AMS instead of LHR.]


Obviously, your mileage may vary based on your location and feelings
about what's fair, but based on this data, it looks to me like a
three-way rotation between West Coast, East Coast, and European hubs
offers a good compromise between minimum cost and a flat distribution
of travel times.

Personally, whatever we decide to do I'd ask that the WG settle now on
a pattern going forward so that we can predictably budget our travel
time and dollars.


[0] Treating all three NYC airports as a single location.

Received on Monday, 9 April 2012 13:58:51 UTC