
Re: URI/IRI vs HTML-URL, was: Why Microsoft's authoritative=true won't work and is a bad idea

From: Julian Reschke <julian.reschke@gmx.de>
Date: Tue, 08 Jul 2008 10:03:12 +0200
Message-ID: <48731F40.3000409@gmx.de>
To: Martin Duerst <duerst@it.aoyama.ac.jp>
CC: Justin James <j_james@mindspring.com>, 'Ian Hickson' <ian@hixie.ch>, 'Sam Ruby' <rubys@us.ibm.com>, 'HTTP Working Group' <ietf-http-wg@w3.org>, public-html@w3.org

Martin Duerst wrote:
> ...
> It may or may not need such a special case. The truth is that some years
> ago (less than 10), virtually all existing non-ASCII path information
> in (U/I)RIs had to be interpreted in the encoding of the containing page.
> This has changed, because people started to pick up on the idea of IRIs,
> more and more systems used UTF-8 on the server side, and at least some
> people understood that using the encoding of the containing page
> made it impossible to treat such identifiers free-standing. Also, a
> fallback for paths in legacy encodings is still available (and was always
> available): %-encoding.
> 
> As long as query URIs are interpreted based on the encoding of the
> containing page, they will stay useless without that context. I.e.
> they cannot (without further pain) be put into bookmark lists, they
> cannot be sent in email, and so on. The only sensible way to make
> this possible is to do the same as for the path part, namely use
> UTF-8 for the IRI->URI conversion. Freestanding (U/I)RIs with
> query parts may be less important than freestanding (U/I)RIs
> without query parts, but still, they are often convenient.
> However, they won't work if implemented the way HTML5 is currently
> describing them. Also, same as for path parts, a fallback for query
> parts in legacy encodings is still available (and was always
> available): %-encoding.
> 
> In summary, there are cases where things changed for the better
> in the last few years, and there are cases where some solutions
> make the Web work better than others.
> ...

Note that HTML5 documents that aren't encoded in UTF-8 (or UTF-16)
and which carry non-ASCII query parameters are currently non-conformant.
(I personally don't think it makes a big difference in practice, as HTML5
normatively defines their handling, so people will rely on that
anyway.)
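
To make the difference concrete, here is a minimal Python sketch of the
two conversions under discussion: percent-encoding a query value via
UTF-8 (the IRI->URI rule) versus via the encoding of the containing page.
The helper name encode_query_value, the sample value "café" and the
choice of ISO-8859-1 as the page encoding are assumptions made for this
illustration only; they come from neither spec.

```python
from urllib.parse import quote, unquote


def encode_query_value(value: str, encoding: str) -> str:
    """Percent-encode a query value after converting it to bytes in `encoding`.

    Hypothetical helper for illustration; not taken from HTML5 or the IRI spec.
    """
    return quote(value, safe="", encoding=encoding)


value = "café"  # sample non-ASCII query value (assumption)

# IRI->URI conversion: always go through UTF-8.
as_utf8 = encode_query_value(value, "utf-8")

# Page-encoding-dependent conversion, assuming an ISO-8859-1 page.
as_latin1 = encode_query_value(value, "iso-8859-1")

print(as_utf8)    # caf%C3%A9
print(as_latin1)  # caf%E9

# A receiver that does not know the page encoding and assumes UTF-8
# cannot recover the original character from the second form.
print(unquote(as_latin1, encoding="iso-8859-1"))  # café
print(unquote(as_latin1, encoding="utf-8"))       # caf\ufffd (replacement character)
```

The second form can only be decoded correctly by someone who knows which
encoding the containing page used, which is why such a URI stops working
once it is copied into a bookmark or an email.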

>> That can be done in a separate spec, defining a mapping from "HTTP URL" to IRI reference, and then letting the default URI/IRI rules apply.
> 
> I'm very much confused by "HTTP URL". In case that's the term that HTML5
> currently uses, it should use a different one, to avoid confusion.

Actually, I wanted to say "HTML URL" (URL as used in HTML5). HTML5
really uses just the term "URL".

BR, Julian
Received on Tuesday, 8 July 2008 08:04:02 UTC
