- From: Kristof Zelechovski <giecrilj@stegny.2a.pl>
- Date: Fri, 15 Jun 2007 12:50:14 +0200
The URI reference in HTML 4 <http://www.w3.org/TR/REC-html40/references.html> points to RFC 2396 <http://www.ietf.org/rfc/rfc2396.txt>, which is obsoleted by RFC 3986 <http://www.ietf.org/rfc/rfc3986.txt>. The latter document has a new section 2.5, "Identifying Data", containing the following new material:

    URI characters provide identifying data for each of the URI components, serving as an external interface for identification between systems. Although the presence and nature of the URI production interface is hidden from clients that use its URIs (and is thus beyond the scope of the interoperability requirements defined by this specification), it is a frequent source of confusion and errors in the interpretation of URI character issues. Implementers have to be aware that there are multiple character encodings involved in the production and transmission of URIs: local name and data encoding, public interface encoding, URI character encoding, data format encoding, and protocol encoding.

    Local names, such as file system names, are stored with a local character encoding. URI producing applications (e.g., origin servers) will typically use the local encoding as the basis for producing meaningful names. The URI producer will transform the local encoding to one that is suitable for a public interface and then transform the public interface encoding into the restricted set of URI characters (reserved, unreserved, and percent-encodings). Those characters are, in turn, encoded as octets to be used as a reference within a data format (e.g., a document charset), and such data formats are often subsequently encoded for transmission over Internet protocols.
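The chain of encodings described in the RFC text can be traced in a few lines of Python. This is only a sketch under stated assumptions: the file name, the choice of cp1250 as the local encoding, and the choice of UTF-8 as the public interface encoding are all made up for illustration; the RFC does not prescribe any of them.

```python
from urllib.parse import quote, unquote

# Hypothetical file name as stored on disk in a local encoding
# (cp1250 is an assumption, picked because it covers Polish letters).
local_bytes = "żółw.html".encode("cp1250")

# 1. Local encoding -> public interface encoding (UTF-8 assumed here).
name = local_bytes.decode("cp1250")
public_bytes = name.encode("utf-8")

# 2. Public interface encoding -> restricted set of URI characters
#    (unreserved characters stay as they are, other octets become %HH).
uri_path = quote(public_bytes)

# 3. URI characters -> octets inside a data format. The URI is pure
#    ASCII at this point, so any ASCII-compatible document charset works.
document_octets = uri_path.encode("ascii")

print(uri_path)  # %C5%BC%C3%B3%C5%82w.html
```

Each step is reversible only if the consumer knows which encoding was used at that step, which is exactly where the confusion mentioned in the RFC comes from.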
The new statements above are slightly incompatible with what the HTML 4 note on URI encoding <http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.2.1> says:

    URIs do not contain non-ASCII values

That statement is true for what the RFC calls "public interface encoding": it seems reasonable that the user agent should use a URL when it requests an external resource. However, requiring HTML documents to use a public URI for resources that the user agent is expected to serve without communicating with an external server, such as local files identified using the file scheme, seems an excessive complication to me.

Internet Explorer does not respect this prohibition <http://blogs.msdn.com/ie/atom.xml> because it uses IRIs, not URIs, internally, and converts them to URLs if needed when it communicates with an external server. If an external URL is specified in the source document as percent-encoded, it is passed unaltered because encoding is not needed and the server is responsible for decoding; however, there is no server to decode a local URL, so it remains unresolved. That is not compliant with the current standard, but I think in this case the implementation is right and the standard needs some freedom with respect to local URLs.

Of course, one could always dismiss this with the argument that an HTML document containing a reference to a local resource cannot be published and can therefore be authored as noncompliant. However, this is only partially true. The reason is that the prohibition of B.2.1 propagated to the XSLT specification, which refers to it explicitly where it specifies how URI attributes should be transformed in html mode <http://www.w3.org/TR/xslt#section-HTML-Output-Method>. In effect, a document produced by a conforming XSLT processor for local usage is perfectly valid and perfectly useless: hyperlinks are broken and images do not show up.

* My suggestion: The constraints for URLs denoting local resources should be relaxed.
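To make the B.2.1 escaping concrete, here is a rough Python sketch of what that note recommends and what an XSLT processor in html mode is expected to apply to URI attributes: each non-ASCII character is converted to UTF-8 and every resulting byte is written as %HH. The function name escape_non_ascii and the file path are hypothetical, chosen only for this illustration; this is not code from any XSLT processor.

```python
from urllib.parse import unquote

def escape_non_ascii(uri: str) -> str:
    """Escape only non-ASCII characters as %HH over their UTF-8 bytes."""
    out = []
    for ch in uri:
        if ord(ch) < 128:
            out.append(ch)  # ASCII characters pass through untouched
        else:
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
    return "".join(out)

# Hypothetical link to a local file, as an author might write it.
href = "file:///C:/strony/żółw.html"

escaped = escape_non_ascii(href)
print(escaped)  # file:///C:/strony/%C5%BC%C3%B3%C5%82w.html

# Whoever resolves the link has to undo the escaping to get the name back.
print(unquote(escaped) == href)  # True
```

The escaped form only leads back to the file if the consumer performs the final decoding step; for a local link no server is involved, so that step falls to the user agent itself.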
I understand that this is fixed by HTML 5 <http://www.whatwg.org/specs/web-apps/current-work/multipage/section-document.html>, so this is perhaps the good news:

    The href content attribute, if specified, must contain a URI (or IRI).

Best regards,
Christopher Yeleighton
Received on Friday, 15 June 2007 03:50:14 UTC