
RE: file URL is overspecified

From: Michael Kay <mhk@mhk.me.uk>
Date: Fri, 15 Jun 2007 13:11:31 +0100
To: "'Kristof Zelechovski'" <giecrilj@stegny.2a.pl>, <www-html@w3.org>
Cc: "'Tim Berners-Lee'" <timbl@w3.org>, <xsl-editors@w3.org>, <whatwg@whatwg.org>
Message-ID: <00cf01c7af46$500031b0$6401a8c0@turtle>
> The reason is that the prohibition of B.2.1 propagated to the XSLT
> specification that refers to it explicitly where it specifies how URI
> attributes should be transformed in html mode
> <http://www.w3.org/TR/xslt#section-HTML-Output-Method>.
> In effect, a document produced by a conforming XSLT processor for local
> usage is perfectly valid and perfectly useless: hyperlinks are broken and
> images do not show up.
 
To help you get round the difference between what the HTML spec says and
what current browsers do, XSLT 2.0 introduced the serialization parameter
escape-uri-attributes="no", giving the XSLT author control over whether, and
which, URIs in generated HTML pages are percent-encoded. Of course, this is
only a small amelioration to this messy problem; but it helps.
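A minimal Python sketch of what the parameter switches on and off, assuming the escaping rule of HTML 4 section B.2.1 (each non-ASCII character is encoded as UTF-8 and each byte is written as %HH; ASCII passes through untouched). The functions escape_uri_attribute and serialize_href are hypothetical stand-ins for a serializer, not any XSLT processor's API; the normative rules are in the XSLT and Serialization specifications and may differ in detail.

```python
def escape_uri_attribute(value: str) -> str:
    """Escape non-ASCII characters as HTML 4 B.2.1 recommends:
    UTF-8-encode each such character and write every byte as %HH."""
    out = []
    for ch in value:
        if ord(ch) < 0x80:
            out.append(ch)  # ASCII (including an existing "%20") is left alone
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


def serialize_href(value: str, escape_uri_attributes: bool = True) -> str:
    # Hypothetical serializer hook: "yes" escapes, "no" leaves the value as written,
    # so the stylesheet author decides which URIs (if any) to escape.
    return escape_uri_attribute(value) if escape_uri_attributes else value


href = "file:///C:/Dokumenty/żółw.png"
print(serialize_href(href))                               # escaped form
print(serialize_href(href, escape_uri_attributes=False))  # unchanged
```

With escaping on, the href becomes file:///C:/Dokumenty/%C5%BC%C3%B3%C5%82w.png, which is exactly the form a browser that does not decode local URLs fails to resolve.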
 
Michael Kay
http://www.saxonica.com/


  _____  

From: xsl-editors-request@w3.org [mailto:xsl-editors-request@w3.org] On
Behalf Of Kristof Zelechovski
Sent: 15 June 2007 11:50
To: www-html@w3.org
Cc: 'Tim Berners-Lee'; xsl-editors@w3.org; whatwg@whatwg.org
Subject: file URL is overspecified



The URI reference <http://www.w3.org/TR/REC-html40/references.html> in
HTML 4 refers to RFC 2396 <http://www.ietf.org/rfc/rfc2396.txt>, which is
obsoleted by RFC 3986 <http://www.ietf.org/rfc/rfc3986.txt>. The latter
document has a new section 2.5, "Identifying Data", containing the following
new material:

URI characters provide identifying data for each of the URI components,
serving as an external interface for identification between systems.
Although the presence and nature of the URI production interface is hidden
from clients that use its URIs (and is thus beyond the scope of the
interoperability requirements defined by this specification), it is a
frequent source of confusion and errors in the interpretation of URI
character issues.  Implementers have to be aware that there are multiple
character encodings involved in the production and transmission of URIs:
local name and data encoding, public interface encoding, URI character
encoding, data format encoding, and protocol encoding.

Local names, such as file system names, are stored with a local character
encoding.  URI producing applications (e.g., origin servers) will typically
use the local encoding as the basis for producing meaningful names.  The URI
producer will transform the local encoding to one that is suitable for a
public interface and then transform the public interface encoding into the
restricted set of URI characters (reserved, unreserved, and
percent-encodings). Those characters are, in turn, encoded as octets to be
used as a reference within a data format (e.g., a document charset), and
such data formats are often subsequently encoded for transmission over
Internet protocols.
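The chain of encodings described in the quoted section can be sketched in Python as follows. The choices here are assumptions for illustration only: a local file name stored in windows-1250 (cp1250), UTF-8 as the public interface encoding, and ASCII as the document charset. publish_local_name is a hypothetical helper; urllib.parse.quote is the standard-library percent-encoder. The final protocol encoding step is omitted.

```python
from urllib.parse import quote


def publish_local_name(local_bytes: bytes, local_encoding: str = "cp1250") -> dict:
    """Hypothetical URI producer following the stages of RFC 3986 section 2.5."""
    # 1. Local name and data encoding: decode the bytes the file system stores.
    name = local_bytes.decode(local_encoding)
    # 2. Public interface encoding (assumed UTF-8 here).
    public = name.encode("utf-8")
    # 3. URI character encoding: restrict to unreserved/reserved chars plus %HH.
    uri = quote(public, safe="/")
    # 4. Data format encoding: embed the reference in a document charset.
    document = f'<img src="{uri}">'.encode("ascii")
    return {"name": name, "public": public, "uri": uri, "document": document}


stages = publish_local_name("żółw.png".encode("cp1250"))
for stage, value in stages.items():
    print(stage, value)
```

The sketch makes the point of the quoted text concrete: the same name exists in four different representations before it ever reaches a protocol.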

The new statements above are slightly incompatible with what the HTML URI
encoding specification <http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.2.1>
says:

URIs do not contain non-ASCII values 

That statement is true for what the RFC calls "public interface encoding":
it seems reasonable that the user agent should use a URL when it requests
an external resource. However, requiring that HTML documents use a public
URI for resources that the user agent is expected to serve without
communicating with an external server, such as local files identified using
the file scheme, seems an excessive complication to me. Internet Explorer
does not respect this prohibition <http://blogs.msdn.com/ie/atom.xml>
because it uses IRIs, not URIs, internally, and converts them to URLs if
needed when it communicates with an external server. If an external URL is
specified in the source document as percent-encoded, it is passed on
unaltered, because no further encoding is needed and the server is
responsible for decoding; however, there is no server to decode a local URL,
so it remains unresolved. That is not compliant with the current standard,
but I think in this case the implementation is right and the standard needs
some freedom with respect to local URLs.
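The missing decoding step can be sketched in Python: for a file URL the user agent itself has to play the part of the server and percent-decode the path before touching the file system. file_url_to_local_path is a hypothetical helper; it assumes the %HH bytes are UTF-8 (as HTML 4 B.2.1 produces them) and handles only POSIX-style paths, not Windows drive letters.

```python
from urllib.parse import unquote, urlsplit


def file_url_to_local_path(url: str) -> str:
    """Hypothetical resolver: map a file URL to a local path name."""
    parts = urlsplit(url)
    if parts.scheme != "file":
        raise ValueError(f"not a file URL: {url!r}")
    # No origin server will decode %HH for us, so do it here.
    return unquote(parts.path, encoding="utf-8", errors="strict")


url = "file:///home/kz/%C5%BC%C3%B3%C5%82w.png"
print(urlsplit(url).path)           # naive lookup: literal "%C5%BC..." name, not found
print(file_url_to_local_path(url))  # decoded: /home/kz/żółw.png
```

A user agent that skips this step looks for a file literally named "%C5%BC%C3%B3%C5%82w.png", which is the broken-link symptom described above.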

Of course, one could always dismiss this with the argument that an HTML
document containing a reference to a local resource cannot be published
anyway and may therefore be authored as noncompliant. However, this is only
partially true, because the prohibition of B.2.1 propagated to the XSLT
specification, which refers to it explicitly where it specifies how URI
attributes should be transformed in html mode
<http://www.w3.org/TR/xslt#section-HTML-Output-Method>. In effect, a document
produced by a conforming XSLT processor for local usage is perfectly valid
and perfectly useless: hyperlinks are broken and images do not show up.

* My suggestion: the constraints for URLs denoting local resources should
  be relaxed.

I understand that this is fixed by HTML 5
<http://www.whatwg.org/specs/web-apps/current-work/multipage/section-document.html>,
so this is perhaps the good news:

The href content attribute, if specified, must contain a URI (or IRI).

Best regards,

Christopher Yeleighton
Received on Friday, 15 June 2007 12:16:43 GMT
