Re: URI/IRI vs HTML-URL, was: Why Microsoft's authoritative=true won't work and is a bad idea from Martin Duerst on 2008-07-08 (ietf-http-wg@w3.org from July to September 2008)

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Tue, 08 Jul 2008 15:13:01 +0900
To: Julian Reschke <julian.reschke@gmx.de>, Justin James <j_james@mindspring.com>
Cc: "'Ian Hickson'" <ian@hixie.ch>, "'Sam Ruby'" <rubys@us.ibm.com>, "'HTTP Working Group'" <ietf-http-wg@w3.org>, public-html@w3.org
Message-Id: <6.0.0.20.2.20080708145811.057d5d30@localhost>

At 23:05 08/07/07, Julian Reschke wrote:
>
>Justin James wrote:
>>> There is no "URI group" -- there's a list of people subscribed to the URI mailing list. That being said, I haven't seen *any* kind of consensus that RFC3986 should be changed. I've seen some discussion about whether RFC3987bis should expand on the "LEIRI" topic, and it seems Martin Dürst was considering that input.
>> It seems to me that the following facts are true:
>> * The URI group/mailing list is not actively working to update or change the
>> URI specs.
>
>There is no URI working group. URI is a stable specification (full IETF standard), and there's no consensus that anything needs to be done with it with respect to "HTML URL".
>
>There are individuals (?) working on a revision of the IRI spec, including Martin Dürst. That revision may contain more information about what's currently called LEIRI (Legacy Extended IRI), but I don't think there's consensus about whether this is really a good idea.

I think that there is a consensus that LEIRIs are a bad idea.
The current(ly expired) draft actually says so. What there is
no consensus on is whether nevertheless, LEIRIs should be
described in the (future) IRI spec.

>Head over to the URI mailing list and discuss it, if you're interested.
>
>> * Over the last few weeks, it has become clear that the URI specs need to
>> change for certain aspects of browser behavior and HTML to make sense and/or
>> work right.
>
>Nope.

Agreed. "for browser behavior to make sense" is an attempt to justify
such browser behavior from an a-priori (good vs. bad) standpoint.
The current browser behavior, overall, makes sense, but there are
some details where it doesn't make sense. It would be better to write
"for some details of current browser behavior to fit some spec".

>What has become clear is that HTML needs to handle a superset of what IRI allows, and also needs to special case IRI->URI conversion for query components.

It may or may not need such a special case. The truth is that some years
ago (less than 10), virtually all existing non-ASCII path information
in (U/I)RIs had to be interpreted in the encoding of the containing page.
This has changed, because people started to pick up on the idea of IRIs,
more and more systems used UTF-8 on the server side, and at least some
people understood that using the encoding of the containing page
made it impossible to treat such identifiers free-standing. Also, a
fallback for paths in legacy encodings is still available (and was always
available): %-encoding.
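
To illustrate the path case, here is a minimal Python sketch. It is not
taken from any spec text; the path segment "dürst" and the choice of
ISO-8859-1 as the legacy encoding are assumptions made up for the example.
It shows the UTF-8 based conversion next to the %-encoding fallback for a
server that still expects a legacy encoding.

```python
from urllib.parse import quote, unquote

# Hypothetical non-ASCII path segment, used only for illustration.
path_segment = "dürst"

# IRI->URI conversion of the path: encode as UTF-8, then %-encode.
utf8_form = quote(path_segment, encoding="utf-8")

# Fallback for a server that expects a legacy encoding (ISO-8859-1 is an
# assumption here): the author writes the %-encoded bytes directly, so the
# identifier is plain ASCII and no longer depends on the containing page.
legacy_form = quote(path_segment, encoding="iso-8859-1")

print(utf8_form)    # d%C3%BCrst
print(legacy_form)  # d%FCrst

# Each form decodes correctly only with the encoding it was produced in.
print(unquote(utf8_form, encoding="utf-8"))         # dürst
print(unquote(legacy_form, encoding="iso-8859-1"))  # dürst
```

Both forms are pure ASCII, so either can be pasted into email or a
bookmark list unchanged; only the server-side interpretation differs.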

As long as query URIs are interpreted based on the encoding of the
containing page, they will stay useless without that context. I.e.
they cannot (without further pain) be put into bookmark lists, they
cannot be sent in email, and so on. The only sensible way to make
this possible is to do the same as for the path part, namely use
UTF-8 for the IRI->URI conversion. Freestanding (U/I)RIs with
query parts may be less important than freestanding (U/I)RIs
without query parts, but still, they are often convenient.
However, they won't work if implemented the way HTML5 is currently
describing them. Also, same as for path parts, a fallback for query
parts in legacy encodings is still available (and was always
available): %-encoding.
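
The problem with page-dependent query conversion can be sketched as
follows. This is only an illustration under stated assumptions: the helper
iri_query_to_uri, the query "q=東京", and Shift_JIS as the page encoding
are all made up for the example, and the helper is a simplification, not
the algorithm from any spec.

```python
from urllib.parse import quote, unquote

def iri_query_to_uri(query: str, encoding: str) -> str:
    """Hypothetical helper: %-encode the non-ASCII characters of a query
    string using the given character encoding, keeping '=' and '&'."""
    return quote(query, safe="=&", encoding=encoding)

# Hypothetical query as it might appear in a link on a web page.
query = "q=東京"

# Conversion based on the encoding of the containing page (assumed here to
# be Shift_JIS) versus conversion that always uses UTF-8.
page_dependent = iri_query_to_uri(query, "shift_jis")
page_independent = iri_query_to_uri(query, "utf-8")

print(page_independent)                    # q=%E6%9D%B1%E4%BA%AC
print(page_dependent == page_independent)  # False

# A recipient who knows the page encoding can recover the query ...
print(unquote(page_dependent, encoding="shift_jis") == query)  # True

# ... but a free-standing copy (bookmark, email) carries no such context.
# Decoding it as UTF-8 does not give back the original query.
print(unquote(page_dependent, encoding="utf-8") == query)      # False
```

The same link text thus maps to different URIs depending on where it
was found, which is why the page-dependent form cannot safely travel
without its page.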

In summary, there are cases where things changed for the better
in the last few years, and there are cases where some solutions
make the Web work better than others.

>That can be done in a separate spec, defining a mapping from "HTTP URL" to IRI reference, and then letting the default URI/IRI rules apply.

I'm very much confused by "HTTP URL". In case that's the term that HTML5
currently uses, it should use a different one, to avoid confusion.

Regards,    Martin.

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp

Received on Tuesday, 8 July 2008 07:56:33 UTC