- From: Roy T. Fielding <fielding@gbiv.com>
- Date: Tue, 8 Jul 2008 15:12:45 -0700
- To: Julian Reschke <julian.reschke@gmx.de>
- Cc: Henrik Nordstrom <henrik@henriknordstrom.net>, Justin James <j_james@mindspring.com>, 'HTTP Working Group' <ietf-http-wg@w3.org>, public-html@w3.org
On Jul 8, 2008, at 12:27 AM, Julian Reschke wrote:

> Henrik Nordstrom wrote:
>> On mån, 2008-07-07 at 18:56 -0400, Justin James wrote:
>>> The problem with the concept of HTML specifying its own URLs, from my
>>> viewpoint, is that developers need one standard to follow, not 3 (URI,
>>> IRI, HTTP URL).
>> But I am still not aware of the problem which triggered this. I linger
>> on the HTTP WG, not the HTML one.. and is therefore unaware of what
>> problem HTTP URL/URI/IRI specifications cause for HTML.
>> ...
>
> See thread at <http://lists.w3.org/Archives/Public/uri/2008Jun/0088.html>.
>
> Key issues:
>
> 1) there are non-IRI identifiers in HTML in use (such as using
> space characters)

No, there aren't. The content of the attribute value is CDATA, not
an IRI. How the parser converts the CDATA to a URI string (not IRI
string) should be defined by HTML. The algorithm doesn't even need to
be the same for different element attributes (e.g., some attribute
values consist of space-separated references). The value doesn't become
identifier(s) until after the conversion of the CDATA string to a valid
URI is complete.

> 2) UAs do not use UTF-8 consistently when mapping non-ASCII
> characters in query parameters (they may use the document encoding
> instead)

That's because UTF-8 was not a desired mapping when HTML was defined.
That's why HTML maps query parameters to the document encoding.
I don't see why this is even being argued, since it certainly won't
be changing any time soon. It makes far more sense to encourage the
use of UTF-8 document encodings.

> 3) there is no defined error handling in URI/IRI (I do not agree
> that this is a problem with URI/IRI)

Of course not, just as there is no defined error handling for the
name on your birth certificate. Error handling is always defined
by context.

> 1) and 2) can be solved by defining a transformation from HTML URL
> to IRI. HTML5 currently modifies the parsing rules of IRI instead,
> which I think is the wrong approach.

The whole discussion is just brain dead. All of the supposed issues
are about translating raw data into standardized form. Instead of
simply defining the transform of raw attribute to standardized value,
which is entirely governed by HTML, the editor has chosen to treat
the raw value as some sort of magic final form, reuses the well-known
URL moniker in the most asinine way, and blames the other standards
(which he thankfully has no control over) for not supporting all of
the possible crappy raw data that could be input in an HTML attribute.

We know that just anything is not interoperable. That's why URI is
limited to a fairly small set of characters and a simple syntax: to
require WWW identifiers to be in a form that is usable worldwide.
That's why HTTP identifiers are limited to URIs. That's why this
whole discussion about creating new identifiers and new protocols
in HTML is a total waste of time -- the rest of the world does not
want it and will not allow it to be published as HTML5.

Pound the sand all you like; the network standards will not change
because they are designed to support everyone's needs, not just the
selfish desires of a very small set of browser developers.

....Roy
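
The transform argued for in the message above (raw attribute CDATA in, valid URI string out, with the query mapped through the document encoding) can be sketched roughly as follows. This is only an illustration under stated assumptions, not an algorithm defined by HTML or by any URI specification: the function names `attribute_to_uri` and `attribute_list_to_uris`, the choice to trim surrounding whitespace, and the choice to leave existing percent-escapes untouched are all hypothetical. Host names are not given IDNA treatment here.

```python
from urllib.parse import quote

# Characters left untouched: URI reserved characters plus "%" so that
# escapes already present in the raw value are preserved (an assumption).
# Letters, digits and "-._~" are always left alone by quote().
_SAFE = "!$&'()*+,;=:/?@[]%"


def attribute_to_uri(raw: str, document_encoding: str = "utf-8") -> str:
    """Hypothetical transform from a raw attribute value to a URI string."""
    value = raw.strip()  # assumption: surrounding whitespace is not significant

    # Split into (everything before the query), query, and fragment.
    rest, hash_sep, fragment = value.partition("#")
    rest, query_sep, query = rest.partition("?")

    # Non-query parts are escaped as UTF-8 octets; the query uses the
    # document encoding, as the message says HTML does. With the default
    # error handling, quote() raises UnicodeEncodeError for characters the
    # document encoding cannot represent.
    rest = quote(rest, safe=_SAFE, encoding="utf-8")
    query = quote(query, safe=_SAFE, encoding=document_encoding)
    fragment = quote(fragment, safe=_SAFE, encoding="utf-8")

    return rest + query_sep + query + hash_sep + fragment


def attribute_list_to_uris(raw: str, document_encoding: str = "utf-8") -> list[str]:
    """Same transform for attributes holding space-separated references."""
    return [attribute_to_uri(part, document_encoding) for part in raw.split()]
```

For example, `attribute_to_uri("http://example.com/a b?q=é", "iso-8859-1")` escapes the space in the path as `%20` and the `é` in the query as the single ISO-8859-1 octet `%E9`, whereas a UTF-8 document would yield `%C3%A9` for the same query. The point of the sketch is that the raw value only becomes an identifier after this step, and the step itself is something HTML can define per attribute.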
Received on Tuesday, 8 July 2008 22:13:16 UTC