- From: Adam Barth <w3c@adambarth.com>
- Date: Wed, 28 Apr 2010 08:40:49 -0700
- To: Julian Reschke <julian.reschke@gmx.de>
- Cc: HTML WG <public-html@w3.org>, Larry Masinter <LMM@acm.org>
On Wed, Apr 28, 2010 at 7:59 AM, Julian Reschke <julian.reschke@gmx.de> wrote:
> On 23.04.2010 00:03, Adam Barth wrote:
>> I haven't been paying that close attention to all the machinations
>> around URL parsing in this working group, but I've been looking into
>> URL parsing a bit recently. In case it's useful to this working group
>> (or the IETF's URL working group), I've attached some raw data on how
>> various browsers parse URLs. These tests are from this test suite:
>>
>> http://trac.webkit.org/browser/trunk/LayoutTests/fast/url
>>
>> which is adapted from these unit tests:
>>
>> http://code.google.com/p/google-url/source/browse/trunk/src/url_canon_unittest.cc
>>
>> I might send a summary of my findings after I analyze the data.
>
> very interesting.
>
> Here's a question; picking a random test case; scheme name normalization:
>
> PASS canonicalize('HTTP://example.com/') is 'http://example.com/'
>
> Could you explain based on the HTML5 spec (in doubt an earlier version which
> doesn't yet rely on the IRI spec) why it's expected that the scheme name
> gets lowercased?

Oh, as I said above, this is "raw data." The "expected" results are
just what the author of url_canon_unittest.cc thought the results
should be. This data is purely an empirical measurement of what
browsers actually do.

In the case you mention, my recollection is that 3 out of 4 browsers
agree that you should lowercase the scheme. Based on that evidence,
I'd probably recommend that the wayward browser also lowercase the
scheme. However, I haven't looked into these issues in enough detail
to know if there are other considerations that might cause us to
prefer that browsers not lowercase the scheme.

Adam
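
For readers who want to see the behavior under discussion in concrete terms, here is a minimal sketch of scheme lowercasing in Python. The helper name lowercase_scheme and its regular expression are assumptions made for illustration; this is not the canonicalize() function from the WebKit test suite, nor the logic in url_canon_unittest.cc, and it deliberately leaves everything after the scheme untouched.

```python
import re

# Hypothetical helper for illustration only; not the canonicalize()
# used by the WebKit layout tests quoted above.
# A scheme is assumed to be a letter followed by letters, digits,
# "+", "." or "-", terminated by a colon.
_SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):")


def lowercase_scheme(url: str) -> str:
    """Return url with its scheme lowercased; leave the rest as-is."""
    match = _SCHEME_RE.match(url)
    if match is None:
        # No scheme found (for example, a relative reference).
        return url
    scheme = match.group(1)
    return scheme.lower() + url[len(scheme):]


if __name__ == "__main__":
    print(lowercase_scheme("HTTP://example.com/"))
```

Applied to the test case quoted above, the sketch turns 'HTTP://example.com/' into 'http://example.com/', which is what the majority of the measured browsers reportedly do. Whether browsers should behave this way is exactly the open question in this thread.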
Received on Wednesday, 28 April 2010 15:42:07 UTC