- From: Roy T. Fielding <fielding@gbiv.com>
- Date: Mon, 2 May 2011 16:33:00 -0700
- To: Maciej Stachowiak <mjs@apple.com>
- Cc: Adam Barth <ietf@adambarth.com>, public-iri@w3.org
On Apr 28, 2011, at 12:29 AM, Maciej Stachowiak wrote:

> On Apr 27, 2011, at 10:12 PM, Roy T. Fielding wrote:
>
>> As you well know, what HTML5 needs is a definition for parsing
>> arbitrary attribute values in document encoding. Those attribute
>> values are not URLs. They aren't even URI references. They are
>> one or more space-separated or space-ignoring strings in an HTML
>> attribute encoding, and each reference needs to be extracted and
>> transcoded before the definitions in 3986 are even applicable.
>
> It is fine to call the types of resource identifiers that appear in HTML and other parts of the Web platform (CSS, XHR, SVG, etc) something other than "URL" or "URI". The name does not really matter for interoperability.

Technically, yes, but from a social perspective we have seen a lot of confusion caused by descriptions of URL, URI, or anyURI being the documented value range of attributes or data entry dialogs. That was actually true in some distant past, but browsers stopped rejecting invalid references a long time ago, for good reasons, and usually do some form of pct-encoding or truncation instead.

We should therefore stop referring to the input as a uniform identifier when it is, in fact, an arbitrary string. We can then unambiguously refer to the output of the parsing, transcoding, and recombination algorithm as a URI or URL as defined by RFC3986. That is why I use the term reference for the value as found in the attribute/dialog.

>> However, for the subset of possible references that do happen
>> to match what are called valid URI references by RFC3986, then
>> we have already tested consensus and deployed many implementations
>> that conform exactly to the results given in RFC3986.
>
> If these references are something other than URIs, and must be transcoded, why is it important that the subset that happens to look syntactically like a valid URI must be processed without that transcoding step? This implies that the transcoding must be the identity encoding in some cases. Where does that assumption come from?

Authors have been using plain old ASCII references to URIs for longer than the Web has been documented. We expect them to still work. Likewise for references that are in the document encoding but only use the subset of characters that are found in ASCII.

URIs are defined in terms of characters, not octets, so the transcoding I am referring to is the removal of whitespace, pct-encoding of non-unreserved characters, etc. A reference that is already in URI form does not need to be transcoded.

....Roy
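
For illustration only, the transcoding described above (whitespace removal plus pct-encoding of characters that cannot appear in a URI) might be sketched roughly as follows. This is not the algorithm from RFC3986 or HTML5. The function name transcode_reference, the use of UTF-8 for the encoded octets, and the exact whitespace handling are all assumptions made for the sketch. It also leaves every "%" alone, so a stray "%" that is not part of a pct-encoded triplet would pass through unrepaired.

```python
from urllib.parse import quote

# Reserved characters from RFC 3986 (gen-delims + sub-delims). The
# unreserved set (letters, digits, "-", ".", "_", "~") is never encoded
# by quote(), so it does not need to be listed here.
RESERVED = ":/?#[]@!$&'()*+,;="

# "%" is kept so that existing pct-encoded triplets are not double-encoded.
# Assumption: every "%" in the input starts a valid triplet.
SAFE = RESERVED + "%"


def transcode_reference(raw: str) -> str:
    """Hypothetical helper: turn an attribute-value reference into URI characters."""
    # Drop leading and trailing whitespace around the reference.
    trimmed = raw.strip(" \t\r\n\f")
    # Drop embedded tabs and line breaks (assumed "space-ignoring" handling).
    for ch in "\t\r\n":
        trimmed = trimmed.replace(ch, "")
    # Pct-encode everything outside the URI character set, as UTF-8 octets.
    return quote(trimmed, safe=SAFE, encoding="utf-8")


# A reference already in URI form comes back unchanged (identity case).
print(transcode_reference("http://example.com/a?b=c#d"))
# A reference with surrounding whitespace, a space, and a non-ASCII character.
print(transcode_reference("  http://example.com/caf\u00e9 menu\n"))
```

Under these assumptions the first call returns its input unchanged, which is the identity case discussed above, while the second has its whitespace trimmed and its space and non-ASCII character pct-encoded.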
Received on Monday, 2 May 2011 23:33:25 UTC