Date: Sat, 19 Apr 1997 18:01:56 +0200 (MET DST)
From: "Martin J. Duerst" <firstname.lastname@example.org>
To: Chris Newman <Chris.Newman@innosoft.com>
Cc: "Roy T. Fielding" <fielding@kiwi.ICS.UCI.EDU>,
Subject: Re: revised "generic syntax" internet draft
In-Reply-To: <Pine.SOL.3.95.970418135341.9117Eemail@example.com>
Message-Id: <Pine.SUN.3.96.970419175423.708Y-100000@enoshima>

On Fri, 18 Apr 1997, Chris Newman wrote:

> That problem statement is a bit verbose, but accurate.

Sorry. Because I am a fast typer (DVORAK keyboard, you know),
I tend to be verbose.

> On Fri, 18 Apr 1997, Roy T. Fielding wrote:
> > I think there is a way to define UTF-8 preference for URL encoding
> > such that it won't break existing services, by forbidding transcoding
> > of already-encoded octets. However, I won't bother to explain that
> > until there is broad agreement on what needs to be solved.
>
> Yes, if you forbid transcoding of %80-%FF, and that representation were
> actually used in the filesystem, then the charset (or lack thereof) in the
> filesystem isn't a problem.

Transcoding %80-%FF, i.e. suddenly changing %80 into %83 (or whatever)
for whatever reasons, is definitely not part of the plan. Whenever we
see something like %HH, we know that we have to take it as an encoded
octet. What some application might do, for the user's convenience, is
to convert it into actual characters. But in order for this to work,
we have to agree on a single (or at least a preferential)
character->octet encoding.

Real characters, on the other hand, transported in some documents,
will always be transcoded with the document as a whole (e.g. from
EUC to JIS for mail in Japan) but they keep their character identity.
The same applies to "%", "8", "0",... if we take into account EBCDIC.

Regards, Martin.
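
The distinction drawn above between escaped octets and real characters can be sketched in Python. This is only an illustration under assumptions that are not in the message: the function name display_form is hypothetical, UTF-8 is assumed as the agreed character->octet encoding, and the codecs euc_jp, iso2022_jp and cp037 merely stand in for EUC, JIS and EBCDIC.

```python
from urllib.parse import unquote_to_bytes


def display_form(url_path: str, charset: str = "utf-8") -> str:
    """Hypothetical helper: show %HH escapes as characters when possible.

    The escapes are first taken as plain octets. They are only turned into
    characters for display, and only if they are valid in the assumed charset.
    """
    octets = unquote_to_bytes(url_path)
    try:
        return octets.decode(charset)
    except UnicodeDecodeError:
        # Not valid in the assumed encoding: leave the escapes untouched.
        return url_path


# 1. %HH is always an encoded octet; nothing rewrites %80 into another value.
assert unquote_to_bytes("%80") == b"\x80"
assert display_form("/%80") == "/%80"

# 2. With an agreed encoding (UTF-8 assumed here), an application may show
#    the octets as characters for the user's convenience.
assert display_form("/%E6%97%A5%E6%9C%AC") == "/日本"

# 3. Real characters are transcoded with the document as a whole: the octets
#    change, the character identity does not.
text = "日本"
euc = text.encode("euc_jp")
jis = euc.decode("euc_jp").encode("iso2022_jp")
assert euc != jis
assert jis.decode("iso2022_jp") == text

# 4. The same holds for the characters "%", "8", "0" themselves, for example
#    in an EBCDIC code page (cp037 assumed here).
escape = "%80"
ebcdic = escape.encode("cp037")
assert ebcdic != escape.encode("ascii")
assert ebcdic.decode("cp037") == escape
```

In this sketch the octet value behind %80 is never altered; only the representation of the characters that spell the escape changes when the surrounding document is transcoded.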