From: Eric Rescorla <ekr@rtfm.com>

Date: Mon, 9 Apr 2012 09:02:41 -0700

Message-ID: <CABcZeBO85+MuNshMYfF2qxU3ws7EiuHSY9Gvh0mUE7i7ot8=FQ@mail.gmail.com>

To: Marshall Eubanks <marshall.eubanks@gmail.com>

Cc: rtcweb@ietf.org, public-webrtc@w3.org


On Mon, Apr 9, 2012 at 8:35 AM, Marshall Eubanks
<marshall.eubanks@gmail.com> wrote:
> I really like this analysis. Some questions.
>
> 2012/4/9 Eric Rescorla <ekr@rtfm.com>:
>> Hi folks,
>>
>> Since it seems like we're going to be having a large number of
>> interims, I thought it might be instructive to try to analyze a bunch
>> of different locations to figure out the best strategy. My first cut
>> analysis is below.
>>
>> Note that I'm not trying to make any claims about what the best set of
>> venues is. It's obviously easy to figure out any statistic we want
>> about each proposed venue, but how you map that data to "best" is up
>> to you. In particular, there's some tradeoff between minimal total
>> travel time and a "fair" distribution of travel times (not that I
>> claim to know what that means).
>>
>>
>> METHODOLOGY
>> The data below is derived by treating both people and venues as
>> airport locations and using travel time as our primary instrument.
>>
>> 1. For each responder for the current Doodle poll, assign a home
>>    airport based on their draft publication history. We're missing a
>>    few people but basically it should be pretty complete. Since
>>    these people responded before the venue was known, it's at
>>    least somewhat unbiased.
>>
>> 2. Compute the shortest advertised flight between each home airport
>>    and the locations for each venue by looking at the shortest
>>    advertised Kayak flights around one of the proposed interim
>>    dates (6/10 - 6/13), ignoring price, but excluding "Hacker fares".
>>    [Thanks to Martin Thomson for helping me gather these.]
>>
>
> 1.) Why are some fields doubled ? I.e.,
>
> ARN SFO 14 13
>
> Are these counted twice ? That would, of course, give more weight to
> those records.

Laziness. When I started recording flight times, I used the total time
and then later realized that what I wanted was to break them out by
out and back, but I was too lazy to go back and fix the earlier ones.

> 2.) At any rate, I couldn't quite match your numbers. For SFO, for
> example, I got
>
> # SFO
>
> Records    29    |
> Mean       12.52 |
> RMS        15.34 |
> Std Dev     8.55 |
> Minimum     1.00 |
> Maximum    34.00 |
>
> This assumes that each doubled entry counts as 2 separate entries. If
> the second entries are ignored, I get

I'm not sure what procedure you are following here, but if it's taking
the SD of the data in durations.txt, that's not what I did. That's
just the input data. The summary data that I am showing is produced by
weighting by participant from each home airport. The script to generate
that is pairings.py and the results are found in doodle-out.txt.

Of course, it could still all be wrong. FWIW, I'm using R's sd(),
which uses n-1.

-Ekr

> # SFO
>
> Records    21    |
> Mean       14.05 |
> RMS        17.05 |
> Std Dev     9.14 |
> Minimum     1.00 |
> Maximum    34.00 |
>
> If two entries are averaged together (when present)
>
> # SFO
> Records    21    |
> Mean       13.93 |
> RMS        16.97 |
> Std Dev     9.18 |
> Minimum     1.00 |
> Maximum    34.00 |
>
> None of these 3 options match your
>
> Venue          Mean    Median    SD
> ----------------------------------------------
> SFO            13.5    11        12.2
>
> In particular, your SD value seems high.
>
> (Note, I use the SD = root mean square /(n-1) not / n convention, but
> that won't explain the difference. )
>
> Regards
> Marshall
>
>
>> This lets us compute statistics for any venue and/or combination
>> of venues, based on the candidate attendee list.
>>
>> The three proposed venues:
>>
>> - San Francisco (SFO)
>> - Boston (BOS)
>> - Stockholm (ARN)
>>
>> Three hubs not too distant from the proposed venues:
>>
>> - London (LHR)
>> - Frankfurt (FRA)
>> - New York (NYC) [0]
>>
>> Also, Calgary (YYC), since the other two chair locations (BOS and SFO)
>> were already proposed as venues, and I didn't want Cullen to feel
>> left out.
>>
>>
>> RESULTS
>> Here are the results for each of the above venues, measured in total
>> hours of travel (i.e., round trip).
>>
>> Venue          Mean    Median    SD
>> ----------------------------------------------
>> SFO            13.5    11        12.2
>> BOS            12.3    11         7.5
>> ARN            17.0    21        10.7
>> FRA            14.8    17         7.3
>> LHR            13.3    14         7.5
>> NYC            11.5    11         5.8
>> YYC            14.9    13        10.2
>> SFO/BOS/ARN    14.3    13         3.6
>> SFO/NYC/LHR    12.7    11.3       3.7
>>
>> XXX/YYY/ZZZ is a three-way rotation of XXX, YYY, and ZZZ. Obviously,
>> mean and median are intended to be some sort of aggregate measure of
>> travel time. I don't have any way to measure "fairness", but SD is
>> intended as some metric of the variation in travel time between
>> attendees.
>>
>> The raw data and software are attached. The files are:
>>
>> home-airports   -- the list of people's home airports
>> durations.txt   -- the list of airport-airport durations
>> doodle.txt      -- the attendees list
>> pairings.py     -- the software to compute travel times
>> doodle-out.txt  -- the computed travel times for each attendee
>>
>> Obviously, there could be an error in the raw data or the software.
>> Please feel free to send corrections, especially if you find
>> something material.
>>
>>
>> OBSERVATIONS
>> Obviously, it's hard to know what the optimal solution is without
>> some model for optimality, but we can still make some observations
>> based on this data:
>>
>> 1. If we're just concerned with minimizing total travel time, then we
>> would always meet in New York, since it has both the shortest mean
>> travel time and the shortest median travel time, but as I said above,
>> this arguably isn't fair to people who live either in Europe or
>> California, since they always have to travel.
>>
>> 2. Combining West Coast, East Coast, and European venues gives
>> mean/median values comparable to (or at least not too much worse
>> than) NYC's, with much lower SDs. So, arguably that kind of mix is
>> more fair.
>>
>> 3. There's a pretty substantial difference between hub and non-hub
>> venues. In particular, LHR has a median travel time 7 hours less than
>> ARN, and the SFO/NYC/LHR combination has a median/mean travel time
>> about 2 hours less than SFO/BOS/ARN (primarily accounted for by the
>> LHR/ARN difference). [Full disclosure, I've favored Star Alliance hubs
>> here, but you'd probably get similar results if, for instance, you
>> used AMS instead of LHR.]
>>
>>
>> Obviously, your mileage may vary based on your location and feelings
>> about what's fair, but based on this data, it looks to me like a
>> three-way rotation between West Coast, East Coast, and European hubs
>> offers a good compromise between minimum cost and a flat distribution
>> of travel times.
>>
>> Personally, whatever we decide to do I'd ask that the WG settle now on
>> a pattern going forward so that we can predictably budget our travel
>> time and dollars.
>>
>>
>> [0] Treating all three NYC airports as a single location.
>>
>> _______________________________________________
>> rtcweb mailing list
>> rtcweb@ietf.org
>> https://www.ietf.org/mailman/listinfo/rtcweb
>>

Received on Monday, 9 April 2012 16:03:50 GMT
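
The reply above distinguishes two ways of summarizing the data: per route (one observation per row of durations.txt) and per attendee (each route weighted by the number of participants at that home airport). The following is a minimal Python sketch of that difference. It is not the attached pairings.py; the travel times and head counts are invented for illustration, and the function names are hypothetical. Python's statistics.stdev uses the n-1 convention, as R's sd() does.

```python
import statistics

# Hypothetical round-trip hours from each home airport to a single venue.
# These values are invented for illustration; they are NOT from durations.txt.
round_trip_hours = {"SFO": 1.0, "BOS": 12.0, "ARN": 27.0}

# Hypothetical number of poll responders per home airport (NOT from doodle.txt).
attendees_per_airport = {"SFO": 3, "BOS": 2, "ARN": 1}


def per_route_stats(hours):
    """Summarize with one observation per route, ignoring head counts."""
    values = list(hours.values())
    return (statistics.mean(values),
            statistics.median(values),
            statistics.stdev(values))  # sample SD, divides by n-1


def per_attendee_stats(hours, attendees):
    """Summarize with one observation per attendee.

    Each route's travel time is repeated once for every attendee whose
    home airport is the origin of that route.
    """
    values = []
    for airport, count in attendees.items():
        values.extend([hours[airport]] * count)
    return (statistics.mean(values),
            statistics.median(values),
            statistics.stdev(values))  # sample SD, divides by n-1


if __name__ == "__main__":
    print("per route:   ", per_route_stats(round_trip_hours))
    print("per attendee:", per_attendee_stats(round_trip_hours,
                                              attendees_per_airport))
```

With these made-up inputs the two summaries differ in mean, median, and SD, which is the kind of mismatch one would expect when comparing statistics taken directly over the route records with statistics weighted by attendee.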
