- From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
- Date: Tue, 16 Oct 2012 14:36:59 +0900
- To: Robin Berjon <robin@w3.org>
- CC: Ted Hardie <ted.ietf@gmail.com>, Anne van Kesteren <annevk@annevk.nl>, Larry Masinter <masinter@adobe.com>, "plh@w3.org" <plh@w3.org>, "Peter Saint-Andre (stpeter@stpeter.im)" <stpeter@stpeter.im>, "Pete Resnick (presnick@qualcomm.com)" <presnick@qualcomm.com>, "www-archive@w3.org" <www-archive@w3.org>, "Michael(tm) Smith" <mike@w3.org>
On 2012/10/16 1:30, Robin Berjon wrote:
> On 15/10/2012 17:49 , Ted Hardie wrote:
>> On Mon, Oct 15, 2012 at 8:07 AM, Robin Berjon <robin@w3.org> wrote:
>>> URLs to non-Web things (e.g. mailto:, smsto:, tel:, etc.) happen in Web
>>> contexts. Libraries written to process those in Web contexts are likely to
>>> be reused elsewhere. There isn't really an option to have some of this in
>>> Web use cases and something else outside of it. If it's used for the Web, it
>>> *will* leak. Probably a lot, and probably fast.

One first question is how much we want it to leak. An example that Anne brought up is a URL with a space character in it. It is clear that these things exist on the Web, in not too small numbers. On the other hand, it's also clear that there are many places (some of them defined by specs, some of them just somewhere in scripts and the like) that will just 'blow up' when they get a space.

Do we want to make sure that all browsers treat such a space in the same way? Most probably yes, and in this case, maybe they already do. Does it make sense to write that down? I'd also very much say yes. Do we want to make sure that all other places that accept URIs or IRIs also accept a space and treat it the same? Maybe we would like to do so, but is it possible? Quite clearly no (just think HTTP request header).

This essentially means that the fork is already here. In some sense, that's really bad news. But if we look more closely, the news may not be that bad.

First, at least for the case with the space in it, we know how to convert it to an equivalent without a space: use %20 (except maybe in form parts). But we need to make sure that this is written down somewhere.

Second, and that will be more obvious for some more esoteric cases than just a space, I think that even among those who agree that such cases should be described, and should be handled uniformly by browsers, there will be quite some agreement that it's better not to produce such things.
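
To make the space case above concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not text from any spec: the helper name `escape_spaces` and the example.org URL are made up, and the only character rewritten is the literal space. The second call shows the "form parts" exception, where form-style encoding conventionally writes a space as `+` instead of `%20`.

```python
from urllib.parse import quote_plus


def escape_spaces(url: str) -> str:
    """Hypothetical helper: rewrite each literal space as %20.

    Every other character is passed through unchanged, so this is not
    a general URL-cleaning routine, only the single conversion discussed.
    """
    return url.replace(" ", "%20")


# A URL with a raw space, as found "in not too small numbers" on the Web.
print(escape_spaces("http://example.org/my file.html"))

# Form-style encoding (application/x-www-form-urlencoded) uses '+' for a space.
print(quote_plus("my file"))
```

The sketch deliberately does nothing for the more esoteric cases, since those are exactly where behaviour differs and no single conversion can be assumed.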
What we end up with is something I'd call a semi-fork, which is a subset of "recommended" URIs/IRIs within a larger set of (sometimes, but not always) tolerated ones. We already have this for the XML case, it's called LEIRIs (http://tools.ietf.org/html/draft-ietf-iri-3987bis-12#section-6).

At one point, we tried to do something similar to what Anne is now trying to address, but we did not get very far because once one goes beyond the simple cases (such as a space), it gets messy quite quickly (read: different browsers do different things). Even though there are representatives of all major browser vendors subscribed to the IRI WG mailing list, we also didn't get much in terms of contributions or feedback (Adam and Anne occasionally were exceptions).

>> I agree. But that argues that an xmpp URI seen in a jabber context
>> and an xmpp URI seen in a web context should be the same;

Syntactically correct xmpp URIs should be the same indeed, and I think they currently are.

>> or, to
>> re-iterate, that a fork would be harmful. Changing the URI parsing in
>> web contexts only is likely to be problematic because of leakage.
>> Avoiding that by retaining one way is my personal preference for the
>> way forward. But if those working on web-specific specs do not agree
>> and choose to fork, then we *must* mark the difference between the
>> contexts, or the results will be even worse.
>
> I think that we're in ruthlessly violent agreement here :)
>
> At this point we have to look at what status Anne's work could be
> published under. It doesn't have to be a fork, it could simply be
> published as The One True Way to parse URLs (after reviews, etc.
> obviously). Is that something that could be acceptable?

I think it can easily be the One True Way to parse URLs in Web Browsers. Given some of the current differences between browsers, even that may be tough, but I very much hope that Anne can be successful.
I think that in a way similar to how the HTML5 spec currently distinguishes between an authoring version and a parsing version, Anne's document can be the parsing version for Web browsers, and RFC 3986, and 3987bis, can be the authoring version(s).

Of course, that's not a strict parallel. As an example, Anne plans to clearly document/spec how URL equivalence works in JavaScript. For everybody who uses JavaScript, this will clearly be a good thing. However, as http://tools.ietf.org/html/rfc3986#section-6, http://tools.ietf.org/html/rfc3987#section-5, and http://tools.ietf.org/html/draft-ietf-iri-comparison-01 should make quite clear, how to compare URIs/IRIs/URLs depends very much on the application. On one end, a spider will take as many shortcuts as possible, whereas on the other end, XML namespaces and RDF will do codepoint-by-codepoint comparison, and there is clearly some value in documenting that. (Also, an extended JavaScript library may provide quite a few variants to deal with these application needs.)

Last but not least, I would like to mention that if there's anything that we can reasonably do to make the gap in the semi-fork narrower, then we should give it a try. Two examples: First, RFC 3987 was quite strict about character normalization in some circumstances. It has turned out that browsers did it differently, so we changed the spec. Second, we had to find out that query parts don't get converted using UTF-8 as often as we would like. So we also adapted the spec, even though that's still under discussion. If there are other cases that we *can* address, please tell us.

On the other hand, I'd hope that with the work that Anne does, he also tries to narrow the gap where possible, e.g. by choosing a solution closer to RFC 3986/3987bis where browsers disagree.

Regards, Martin.
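
As an illustration of the point above that comparison depends on the application, the following Python sketch contrasts two ends of the range. It is a simplified sketch, not the procedure from any of the cited documents: the function names are hypothetical, only scheme and authority case are normalized, and the authority is assumed to contain no userinfo (which would be case-sensitive).

```python
from urllib.parse import urlsplit, urlunsplit


def same_codepoints(a: str, b: str) -> bool:
    """Strict comparison, in the style used for XML namespace names."""
    return a == b


def same_after_case_normalization(a: str, b: str) -> bool:
    """Looser comparison: lowercase the scheme and authority only.

    Assumes no userinfo in the authority; path, query and fragment
    are compared exactly as given.
    """
    def norm(u: str) -> str:
        p = urlsplit(u)
        return urlunsplit(
            (p.scheme.lower(), p.netloc.lower(), p.path, p.query, p.fragment)
        )

    return norm(a) == norm(b)


a = "HTTP://Example.ORG/ns"
b = "http://example.org/ns"
print(same_codepoints(a, b))                # strict view: different
print(same_after_case_normalization(a, b))  # looser view: equivalent
```

An application such as a spider might go further still (percent-encoding normalization, dot-segment removal), which is why no single function can serve every use.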
Received on Tuesday, 16 October 2012 05:37:33 UTC