Re: [rtcweb] Data on travel times

From: Marshall Eubanks <marshall.eubanks@gmail.com> · Date: Mon, 9 Apr 2012 11:35:59 -0400

I really like this analysis. Some questions.

2012/4/9 Eric Rescorla <ekr@rtfm.com>:
> Hi folks,
>
> Since it seems like we're going to be having a large number of
> interims, I thought it might be instructive to try to analyze a bunch
> of different locations to figure out the best strategy. My first cut
> analysis is below.
>
> Note that I'm not trying to make any claims about what the best set of
> venues is. It's obviously easy to figure out any statistic we want
> about each proposed venue, but how you map that data to "best" is up
> to you. In particular, there's some tradeoff between minimal total
> travel time and a "fair" distribution of travel times (not that I
> claim to know what that means).
>
>
> METHODOLOGY
> The data below is derived by treating both people and venues as
> airport locations and using travel time as our primary instrument.
>
> 1. For each responder for the current Doodle poll, assign a home
>   airport based on their draft publication history.  We're missing a
>   few people but basically it should be pretty complete. Since
>   these people responded before the venue was known, it's at
>   least somewhat unbiased.
>
> 2. Compute the shortest advertised flight between each home airport
>   and the locations for each venue by looking at the shortest
>   advertised Kayak flights around one of the proposed interim
>   dates (6/10 - 6/13), ignoring price, but excluding "Hacker fares".
>   [Thanks to Martin Thomson for helping me gather these.]
>

1.) Why are some fields doubled? E.g.,

ARN SFO 14 13

Are these counted twice? That would, of course, give more weight to
those records.

2.) At any rate, I couldn't quite match your numbers. For SFO, for
example, I got

# SFO

 Records            29  |
 Mean            12.52  |
 RMS             15.34  |
 Std Dev          8.55  |
 Minimum          1.00  |
 Maximum         34.00  |

This assumes that each doubled entry counts as 2 separate entries. If
the second entries are ignored, I get

# SFO

 Records            21  |
 Mean            14.05  |
 RMS             17.05  |
 Std Dev          9.14  |
 Minimum          1.00  |
 Maximum         34.00  |

If the two entries are averaged together (when both are present), I get

# SFO
 Records            21  |
 Mean            13.93  |
 RMS             16.97  |
 Std Dev          9.18  |
 Minimum          1.00  |
 Maximum         34.00  |
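
For anyone who wants to reproduce the three treatments, here is a
minimal sketch. It assumes each line of the durations file looks like
"ORIGIN DEST HOURS [HOURS]", which is only my reading of the ARN line
quoted above. The function names and every sample row except the ARN
one are made up for illustration; this is not the logic in pairings.py.

```python
import math

# Hypothetical sample in an assumed "ORIGIN DEST HOURS [HOURS]" layout.
# Only the ARN row comes from the thread; the other rows are placeholders.
SAMPLE = """\
ARN SFO 14 13
BOS SFO 6
LHR SFO 11 12
SJC SFO 1
"""

def parse(text):
    """Return one list of durations per record, e.g. [[14.0, 13.0], [6.0]]."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            records.append([float(x) for x in fields[2:]])
    return records

def flatten(records, mode):
    """Collapse doubled records into a flat list according to `mode`."""
    if mode == "both":      # each doubled entry counts as two records
        return [v for rec in records for v in rec]
    if mode == "first":     # the second entry is ignored
        return [rec[0] for rec in records]
    if mode == "average":   # the entries are averaged when both are present
        return [sum(rec) / len(rec) for rec in records]
    raise ValueError("unknown mode: " + mode)

def summarize(values):
    """Record count, mean, RMS, minimum and maximum of a list of hours."""
    n = len(values)
    return {
        "records": n,
        "mean": sum(values) / n,
        "rms": math.sqrt(sum(v * v for v in values) / n),
        "min": min(values),
        "max": max(values),
    }

if __name__ == "__main__":
    for mode in ("both", "first", "average"):
        print(mode, summarize(flatten(parse(SAMPLE), mode)))
```

Counting both entries raises the record count and gives doubled
origins more weight, which is the effect asked about in question 1.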

None of these 3 options matches your

Venue         Mean         Median           SD
----------------------------------------------
SFO           13.5             11         12.2

In particular, your SD value seems high.

(Note: I use the SD convention that divides the sum of squared
deviations by (n-1), not by n, but that won't explain the difference.)
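
To put a bound on the convention effect: for the same data, the (n-1)
SD is the n SD multiplied by sqrt(n/(n-1)), which is under 2% for 29
records and under 3% for 21. A quick sketch using Python's statistics
module (the sample hours and the helper name sd_ratio are made up for
illustration; they are not the real data):

```python
import math
import statistics

def sd_ratio(n):
    """Ratio of the (n-1) sample SD to the n population SD for n values."""
    return math.sqrt(n / (n - 1))

# Hypothetical travel times in hours, for illustration only.
hours = [1.0, 6.0, 11.0, 13.0, 14.0, 34.0]

sample_sd = statistics.stdev(hours)        # divides by n-1
population_sd = statistics.pstdev(hours)   # divides by n

if __name__ == "__main__":
    print("sample SD:    ", sample_sd)
    print("population SD:", population_sd)
    print("ratio for n=29:", sd_ratio(29))
    print("ratio for n=21:", sd_ratio(21))
```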

Regards
Marshall

> This lets us compute statistics for any venue and/or combination
> of venues, based on the candidate attendee list.
>
> The three proposed venues:
>
> - San Francisco (SFO)
> - Boston (BOS)
> - Stockholm (ARN)
>
> Three hubs not too distant from the proposed venues:
>
> - London (LHR)
> - Frankfurt (FRA)
> - New York (NYC) [0]
>
> Also, Calgary (YYC), since the other two chair locations (BOS and SFO)
> were already proposed as venues, and I didn't want Cullen to feel
> left out.
>
>
> RESULTS
> Here are the results for each of the above venues, measured in total
> hours of travel (i.e., round trip).
>
> Venue         Mean         Median           SD
> ----------------------------------------------
> SFO           13.5             11         12.2
> BOS           12.3             11          7.5
> ARN           17.0             21         10.7
> FRA           14.8             17          7.3
> LHR           13.3             14          7.5
> NYC           11.5             11          5.8
> YYC           14.9             13         10.2
> SFO/BOS/ARN   14.3             13          3.6
> SFO/NYC/LHR   12.7             11.3        3.7
>
> XXX/YYY/ZZZ denotes a three-way rotation of XXX, YYY, and ZZZ. Obviously, mean
> and median are intended to be some sort of aggregate measure of travel
> time. I don't have any way to measure "fairness", but SD is intended
> as some metric of the variation in travel time between attendees.
>
> The raw data and software are attached. The files are:
>
>  home-airports     -- the list of people's home airports
>  durations.txt     -- the list of airport-airport durations
>  doodle.txt        -- the attendees list
>  pairings.py       -- the software to compute travel times
>  doodle-out.txt    -- the computed travel times for each attendee
>
> Obviously, there could be an error in the raw data or the software.
> Please feel free to send corrections, especially if you find
> something material.
>
>
> OBSERVATIONS
> Obviously, it's hard to know what the optimal solution is without
> some model for optimality, but we can still make some observations
> based on this data:
>
> 1. If we're just concerned with minimizing total travel time, then we
> would always meet in New York, since it has both the shortest mean travel
> time and the shortest median travel time, but as I said above, this
> arguably isn't fair to people who live either in Europe or California,
> since they always have to travel.
>
> 2. Combining West Coast, East Coast, and European venues gives
> mean/median values comparable to NYC (or at least not too much worse),
> with much lower SDs. So, arguably that kind of mix is more fair.
>
> 3. There's a pretty substantial difference between hub and non-hub
> venues. In particular, LHR has a median travel time 7 hours less than
> ARN, and the SFO/NYC/LHR combination has a median/mean travel time
> about 2 hours less than SFO/BOS/ARN (primarily accounted for by the
> LHR/ARN difference). [Full disclosure, I've favored Star Alliance hubs
> here, but you'd probably get similar results if, for instance, you
> used AMS instead of LHR.]
>
>
> Obviously, your mileage may vary based on your location and feelings
> about what's fair, but based on this data, it looks to me like a
> three-way rotation between West Coast, East Coast, and European hubs
> offers a good compromise between minimum cost and a flat distribution
> of travel times.
>
> Personally, whatever we decide to do I'd ask that the WG settle now on
> a pattern going forward so that we can predictably budget our travel
> time and dollars.
>
>
> [0] Treating all three NYC airports as a single location.