- From: Anne van Kesteren <annevk@annevk.nl>
- Date: Thu, 8 Nov 2012 09:44:54 +0100
- To: Martin J. Dürst <duerst@it.aoyama.ac.jp>
- Cc: David Sheets <kosmo.zb@gmail.com>, Ian Hickson <ian@hixie.ch>, "Manger, James H" <James.H.Manger@team.telstra.com>, Christophe Lauret <clauret@weborganic.com>, Jan Algermissen <jan.algermissen@nordsc.com>, Ted Hardie <ted.ietf@gmail.com>, URI <uri@w3.org>, "public-iri@w3.org" <public-iri@w3.org>
On Thu, Nov 8, 2012 at 5:16 AM, "Martin J. Dürst" <duerst@it.aoyama.ac.jp> wrote:
> Sorry to be late with my reply.

No worries!

> On 2012/11/06 0:20, Anne van Kesteren wrote:
>> On Mon, Nov 5, 2012 at 12:19 PM, "Martin J. Dürst"
>> <duerst@it.aoyama.ac.jp> wrote:
>> Given the way strings in browsers are really 16-bit code units
>> (Mozilla's Rust might change that, I hear) with no restrictions I
>> doubt that's a problem. And given that the input to the URL parser can
>> certainly contain one of those code points you have to handle them
>> somehow.
>
> Yes. But that also applies to a space, very obviously (Web pages without
> spaces would be really bad, except potentially in Chinese, Japanese,
> Thai,...:-), but still these are not part of valid URLs.

My current view is that it mostly makes sense to restrict certain code
points in the ASCII range as those are used as delimiters throughout
the ecosystem. HTML/Python use quotation marks, HTTP uses the colon
and whitespace, etc. So by putting the restrictions there, you make it
easy to copy and paste a URL around.

> While we are at it, could you go through the list in the LEIRI section
> (http://tools.ietf.org/html/draft-ietf-iri-3987bis-13#section-6.3) as an
> easy way to cross-check whether there are any other differences?

So LEIRIs are an even larger superset of IRIs. "\" seems problematic
as passing that to a URL parser results in it being handled as if it
were a "/". (I suppose we could make the parser handle that via a
flag, or before handing it to the parser you replace "\" with "%5C".)
U+0009, U+000A, and U+000D are pretty much always dropped on the floor
by a URL parser so those would be problematic too.

I am surprised [ and ] are not allowed. mailto:a@b?subject=[test]%20
is something I semi-frequently write and where I keep forgetting I
need to escape [ and ] to make it valid (I never had it fail anything
but the validator though).

>> I don't really have an opinion on this. I can certainly assist filing
>> bugs on implementors, but I doubt they are interested in taking this
>> potential compatibility hit (if I understand correctly what you're
>> proposing).
>
> Only scammers should have any reason to use these. It's way more a security
> issue (in which browsers often show a very strong interest) than a
> compatibility issue. I'll try to follow up on this in a separate mail, but
> that may not be this week, sorry.

What would be interesting is affected code points, and expected
results. There's a few cases currently where the URL parser has a hard
fail. E.g. if you resolve "/test" against "about:blank". We could
expand that to include these code points I suppose, but it seems like
a major risk.

-- 
http://annevankesteren.nl/
Received on Thursday, 8 November 2012 08:45:28 UTC
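
As an illustration of the pre-processing discussed in the message, here is a minimal Python sketch of three steps: dropping U+0009, U+000A, and U+000D, replacing "\" with "%5C" before a string reaches a URL parser, and escaping [ and ] in the mailto: example. The function names are hypothetical and belong to no specification or library. The sketch uses plain string operations and makes no claim about how any particular browser or parser implements these steps.

```python
# Hypothetical helper names, chosen for this sketch only.

# U+0009 (tab), U+000A (line feed), U+000D (carriage return)
STRIPPED_CODE_POINTS = {"\t", "\n", "\r"}


def strip_tab_newline(raw: str) -> str:
    """Drop the three code points the message says URL parsers discard."""
    return "".join(ch for ch in raw if ch not in STRIPPED_CODE_POINTS)


def protect_backslash(raw: str) -> str:
    """Replace each backslash with %5C so a later parser cannot treat it as a slash."""
    return raw.replace("\\", "%5C")


def escape_brackets(raw: str) -> str:
    """Percent-encode square brackets, as in the mailto: example.

    Assumption: the input has no IPv6 host literal such as http://[::1]/,
    where the brackets are syntax and must stay unescaped.
    """
    return raw.replace("[", "%5B").replace("]", "%5D")


if __name__ == "__main__":
    print(strip_tab_newline("http://example.org/a\tb\n"))
    print(protect_backslash("http://example.org/a\\b"))
    print(escape_brackets("mailto:a@b?subject=[test]%20"))
```

The steps are kept as separate functions because they are independent choices: a LEIRI consumer might want the backslash replacement without the bracket escaping, for example.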