Re: revised "generic syntax" internet draft

Francois Yergeau (yergeau@alis.com)
Sun, 13 Apr 1997 23:54:47 -0400


Message-Id: <3.0.1.32.19970413235447.006e2e48@genstar.alis.com>
Date: Sun, 13 Apr 1997 23:54:47 -0400
To: "Roy T. Fielding" <fielding@kiwi.ICS.UCI.EDU>
From: Francois Yergeau <yergeau@alis.com>
Subject: Re: revised "generic syntax" internet draft 
Cc: uri@bunyip.com
In-Reply-To: <9704111452.aa29903@paris.ics.uci.edu>

On 11 April 1997 at 14:52 -0700, Roy T. Fielding wrote:
>The only question that matters is whether or not the draft as it
>currently exists is a valid representation of what the existing
>practice is

The current spec doesn't do that.  Non-ASCII characters are routinely
encoded into URLs, yet the spec doesn't define the mapping from characters
to octets.  IMHO, the spec is not worthy of becoming a Draft Standard; in
fact, it doesn't even meet one of the requirements for Proposed Standard
(from RFC 2026):

   A Proposed Standard should have no known technical omissions
   with respect to the requirements placed upon it.
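To make the omission concrete, here is a minimal Python sketch. The filename is a hypothetical example. The same non-ASCII name produces two different URLs depending on which charset converts the characters to octets before %-escaping. Without a defined mapping, a receiver has no way to know which charset was meant.

```python
from urllib.parse import quote, unquote

# Hypothetical filename containing a non-ASCII character (U+00E9, e-acute).
name = "résumé.html"

# Same characters, %-escaped from two different charsets.
latin1_url = "/" + quote(name, encoding="iso-8859-1")
utf8_url = "/" + quote(name, encoding="utf-8")

print(latin1_url)  # /r%E9sum%E9.html
print(utf8_url)    # /r%C3%A9sum%C3%A9.html

# A receiver that assumes UTF-8 cannot recover the ISO-8859-1 form:
# each lone 0xE9 octet is invalid UTF-8 and is replaced by U+FFFD.
print(unquote(latin1_url) == "/" + name)  # False
```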

> and what the vendor community agrees is needed in the
>future to support interoperability.

I'm not aware that the Internet standards process excludes non-vendors.

>Since it is my opinion that it is NEVER desirable
>to show a URL in the unencoded form given in Francois' examples,
>you cannot claim to hold anything even remotely like consensus. 

A bit preposterous, isn't it?  *Your* opinion alone is enough to break any
consensus?

I also happen to disagree with this particular opinion.  ASCII characters
are not the only ones worth displaying.  User-friendliness should not be
the exclusive preserve of ASCII users.

>IF you can persuade the creators of URLs to always use UTF-8, which
>is definitely not the case today (Apache, NCSA, and CERN servers all
>use whatever charset is used by the underlying filesystem, which on
>most Unix-based systems is iso-8859-1 or iso-2022-*), ...

It is interesting that you should use this argument.  Yes, Apache, NCSA and
CERN all use the platform's charset for mapping filenames to URLs (which
can be remedied by a simple script, BTW).
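The "simple script" might look roughly like the following Python sketch. It assumes that every filename under the document root is stored as ISO-8859-1 bytes. Running it on names that are already UTF-8 would double-encode them. The function names are hypothetical.

```python
import os

def latin1_name_to_utf8(raw: bytes) -> bytes:
    """Re-encode one filename from ISO-8859-1 octets to UTF-8 octets."""
    # Every byte sequence is valid ISO-8859-1, so this decode never fails.
    return raw.decode("iso-8859-1").encode("utf-8")

def convert_tree(root: bytes) -> None:
    """Rename files and directories under root from ISO-8859-1 to UTF-8 names.

    Assumes all names under root are ISO-8859-1; already-UTF-8 names
    would be double-encoded.
    """
    # Bottom-up walk: a directory's contents are renamed before the
    # directory itself, so each dirpath is still valid when it is used.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            new_name = latin1_name_to_utf8(name)
            if new_name != name:  # pure-ASCII names are left untouched
                os.rename(os.path.join(dirpath, name),
                          os.path.join(dirpath, new_name))
```

A call such as `convert_tree(b"htdocs")` (the path is a hypothetical document root) renames the files. The server's transparent filename-to-URL mapping then emits UTF-8 octets in its URLs.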

But these three also transmit documents in the charset that is found in the
document (transparency, no transcoding), yet you claimed loudly in the HTTP
WG that they somehow defaulted to ISO 8859-1, and insisted strongly that
this fictitious default charset remain in the HTTP/1.1 spec.

In both cases the major servers behave transparently with respect to
character encoding: in one case for filenames, in the other for document
contents.  Yet we get two different conclusions: the servers do not support
UTF-8 URLs, but they somehow manage to uphold the official ISO 8859-1
default document charset.  Go figure!


-- 
François Yergeau <yergeau@alis.com>
Alis Technologies Inc., Montréal
Tel: +1 (514) 747-2547
Fax : +1 (514) 747-2561