Re: revised "generic syntax" and "data:" internet drafts from Martin J. Duerst on 1997-04-02 (uri@w3.org from April 1997)

From: Martin J. Duerst <mduerst@ifi.unizh.ch>
Date: Wed, 2 Apr 1997 16:23:59 +0200 (MET DST)
To: Larry Masinter <masinter@parc.xerox.com>
Cc: uri@bunyip.com
Message-Id: <Pine.SUN.3.96.970402152823.245G-100000@enoshima>

On Tue, 1 Apr 1997, Larry Masinter wrote:

> I haven't seen announcements on these drafts, but there's
> a revised "generic syntax" draft:
> 
>   ftp://ftp.ietf.org/internet-drafts/draft-fielding-url-syntax-04.txt
> 
> with those changes that seemed uncontroversial, and a revision

[I was away on a trip for two weeks, and I am still not through
all of my email, so I might have missed something.]

I downloaded draft-fielding-url-syntax-04.txt and had a look at it.
A lot of things have settled down, and the draft as a whole is now
in extremely good shape.

However, I am very astonished to find one important thing missing,
on which I had assumed that rough consensus was achieved and for
which an excellent proposal for text (by Roy Fielding) exists.

The issue is (you might guess it :-) the internationalization of
URLs. The place where one would expect the text by Roy Fielding
to go is the end of Section 2.1., but the only text it contains is:

   In current practice, many different character encoding schemes are
   used in the first mapping (between sequences of represented
   characters and sequences of octets) and there is generally no
   representation in the URL itself of which mapping was used.  For
   this reason, a client without knowledge of the origination
   mechanism cannot reliably unescape characters for display.

This is a clear confession of the hopeless deficiencies of the current
solution. Many people have pointed out that a better solution exists,
and that there is an easy upgrade path to this solution. It was also
explained in great detail that adopting such a solution would bring
substantial benefits to those affected by the current non-solution,
while nobody preferring the current chaos would be required to
change anything.
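
The ambiguity described in the quoted Section 2.1 text can be
illustrated with a minimal sketch. The escaped string below is a
hypothetical example, not taken from the draft; it shows only that the
same escaped octets yield different characters depending on which
character->octet mapping the reader assumes.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical escaped URL component (illustration only).
escaped = "%C3%A9"

# Unescaping yields octets, not characters.
octets = unquote_to_bytes(escaped)

# The same two octets read under two different assumed encodings.
as_utf8 = octets.decode("utf-8")         # one character
as_latin1 = octets.decode("iso-8859-1")  # two characters

print(repr(as_utf8), repr(as_latin1))
```

Nothing in the URL itself says which of the two readings was intended,
which is the situation the draft text describes.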

Therefore, I herewith propose that the following two paragraphs
(text by Roy Fielding) be added to the end of Section 2.1:

   URL creation mechanisms that generate the URL from a source which
   is not restricted to a single character->octet encoding are
   encouraged, but not required, to transition resource names toward
   using UTF-8 exclusively.

   URL creation mechanisms that generate the URL from a source which 
   is restricted to a single character->octet encoding should use UTF-8
   exclusively.  If the source encoding is not UTF-8, then a mapping
   between the source encoding and UTF-8 should be used.

For the detailed arguments for this proposal, I refer you to
http://www.ifi.unizh.ch/mml/mduerst/urli18n.html.
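
As a rough sketch of what the second proposed paragraph asks for,
assuming a source that is restricted to one known encoding: the octets
are first mapped to characters using the source encoding, then to
UTF-8, and only then escaped. The function name and the sample name
"Zürich" are assumptions made for illustration; they appear in neither
the draft nor the proposed text.

```python
from urllib.parse import quote

def escape_via_utf8(raw: bytes, source_encoding: str) -> str:
    """Hypothetical helper: map source octets to UTF-8, then %-escape."""
    # Step 1: source octets -> characters, using the known source encoding.
    text = raw.decode(source_encoding)
    # Step 2: characters -> UTF-8 octets -> escaped form.
    # safe="" escapes every octet outside the unreserved set.
    return quote(text.encode("utf-8"), safe="")

# The same name held in two different source encodings.
from_latin1 = escape_via_utf8("Zürich".encode("iso-8859-1"), "iso-8859-1")
from_utf8 = escape_via_utf8("Zürich".encode("utf-8"), "utf-8")

print(from_latin1, from_utf8)
```

Both calls produce the same escaped string, so a client that assumes
UTF-8 can unescape either one for display without knowing how the URL
was originally created.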


On another subject: The draft proposes
	*hex *["." *hex] ".ipv6"
for ipv6 addresses. This means that ".ipv6" becomes a syntactic
top-level domain that never actually exists in DNS, which is probably
a first, and could interfere with the IAHC process and other things.
An alternative would be to use the well-defined inverted address
of ipv6 (I think this is *hex *["." *hex] ".ipv6.int" or something
similar) which is used for looking up domain names from internet
addresses with PTR records. The advantage of this solution is that
no new top-level domain name has to be constructed. The disadvantage
is that the *hex components will have to be listed from right to
left instead of from left to right. However, the use of internet
addresses consisting of 32 hex digits will probably be much rarer
than the use of internet addresses using up to 12 digits, so that
this might not be such a big problem.
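
To make the right-to-left ordering concrete, here is a minimal sketch
of the inverted form: the address is expanded to its 32 hex digits,
the digits are reversed, and each becomes one dot-separated label. The
function name and the sample address are assumptions for illustration,
and the suffix is left as a parameter because the exact name of the
reverse-lookup domain is not settled in the text above.

```python
import ipaddress

def inverted_nibbles(address: str, suffix: str) -> str:
    """Hypothetical helper: list the hex digits right to left."""
    # Expand to the full 32-hex-digit form and drop the colons.
    digits = ipaddress.IPv6Address(address).exploded.replace(":", "")
    # One label per hex digit, least significant digit first.
    return ".".join(reversed(digits)) + "." + suffix

# Sample address and the suffix guessed in the message (both assumptions).
name = inverted_nibbles("2001:db8::1", "ipv6.int")
print(name)
```

The result always carries 32 single-digit labels before the suffix,
which is the length cost weighed in the paragraph above.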


>   ftp://ftp.ietf.org/internet-drafts/draft-masinter-url-data-03.txt
> 
> both of which I believe are ready for "last call".

Well, if the above text regarding UTF-8 is added to the syntax draft,
it should indeed be ready for last call. 


Regards,	Martin.
Received on Wednesday, 2 April 1997 09:25:31 UTC