Re: revised "generic syntax" internet draft

Martin J. Duerst (mduerst@ifi.unizh.ch)
Sat, 19 Apr 1997 18:01:56 +0200 (MET DST)


Date: Sat, 19 Apr 1997 18:01:56 +0200 (MET DST)
From: "Martin J. Duerst" <mduerst@ifi.unizh.ch>
To: Chris Newman <Chris.Newman@innosoft.com>
Cc: "Roy T. Fielding" <fielding@kiwi.ICS.UCI.EDU>,
Subject: Re: revised "generic syntax" internet draft
In-Reply-To: <Pine.SOL.3.95.970418135341.9117E-100000@eleanor.innosoft.com>
Message-Id: <Pine.SUN.3.96.970419175423.708Y-100000@enoshima>

On Fri, 18 Apr 1997, Chris Newman wrote:

> That problem statement is a bit verbose, but accurate.

Sorry. Because I am a fast typist (Dvorak keyboard, you know),
I tend to be verbose.


> On Fri, 18 Apr 1997, Roy T. Fielding wrote:

> > I think there is a way to define UTF-8 preference for URL encoding
> > such that it won't break existing services, by forbidding transcoding
> > of already-encoded octets.  However, I won't bother to explain that
> > until there is broad agreement on what needs to be solved.
> 
> Yes, if you forbid transcoding of %80-%FF, and that representation were
> actually used in the filesystem, then the charset (or lack thereof) in the
> filesystem isn't a problem.

Transcoding %80-%FF, i.e. suddenly changing %80 into %83 (or whatever)
for whatever reason, is definitely not part of the plan. Whenever
we see something like %HH, we know that we have to take it as an
encoded octet. What some application might do, for the user's
convenience, is to convert such octets into actual characters. But in
order for this to work, we have to agree on a single (or at least a
preferred) character->octet encoding.
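
To illustrate the point about %HH, here is a minimal sketch in Python, assuming UTF-8 is the agreed encoding. The function name display_form is hypothetical, made up for this example; unquote_to_bytes and quote are from the standard urllib.parse module. The %HH escapes are always read as octets and are never rewritten; turning them into characters is only a display convenience, and it falls back to the raw escaped form when the octets do not decode.

```python
from urllib.parse import quote, unquote_to_bytes


def display_form(url_path: str, encoding: str = "utf-8") -> str:
    """Return a readable form of a %HH-escaped path for display only.

    The escapes are first taken as plain octets. If those octets are not
    valid in the assumed encoding, the escaped string is returned as it
    was, so no octet is ever changed into a different octet.
    """
    octets = unquote_to_bytes(url_path)  # %HH -> raw octets, no charset involved
    try:
        return octets.decode(encoding)
    except UnicodeDecodeError:
        return url_path  # unknown encoding: keep the %HH form untouched


# quote() uses UTF-8 by default, so u-umlaut becomes the two octets C3 BC.
escaped = quote("Dürst")
readable = display_form(escaped)

# A Latin-1 escaped name is not valid UTF-8, so it is shown unchanged.
legacy = display_form("D%FCrst")
```

Without an agreed encoding, the second case is all an application can safely do: it cannot tell which characters the octets were meant to represent.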

Real characters, on the other hand, carried in a document,
will always be transcoded with the document as a whole (e.g.
from EUC to JIS for mail in Japan), but they keep their character
identity. The same applies to "%", "8", "0",... once we take
EBCDIC into account.
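
A small sketch of this distinction, again in Python and under stated assumptions: the codec names euc_jp, iso2022_jp, and cp037 (one common EBCDIC code page) are Python's standard codecs, and the helper transcode is hypothetical. Transcoding the whole document changes the octets on the wire but not the characters, and that includes the three characters "%", "8", "0" of an escape.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode a whole document from one charset to another."""
    return data.decode(source).encode(target)


# A document containing real Japanese characters and a %HH escape.
document = "日本語/%80"

as_euc = document.encode("euc_jp")
as_jis = transcode(as_euc, "euc_jp", "iso2022_jp")

# The octets differ between the two encodings ...
octets_differ = as_euc != as_jis
# ... but the characters, including "%80", are the same after decoding.
same_characters = as_jis.decode("iso2022_jp") == document

# The escape itself is made of characters: in EBCDIC its octets differ
# from ASCII, yet it still reads as "%", "8", "0" and still denotes the
# single octet 0x80.
escape = "%80"
ascii_octets = escape.encode("ascii")
ebcdic_octets = escape.encode("cp037")
escape_survives = ebcdic_octets.decode("cp037") == escape
```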

Regards,	Martin.