Re: Fwd: I-D ACTION:draft-goldsmith-utf7-01.txt from Martin J. Duerst on 1997-02-07 (ietf-charsets@w3.org from January to March 1997)

From: Martin J. Duerst <mduerst@ifi.unizh.ch>
Date: Fri, 07 Feb 1997 14:29:40 +0100 (MET)
To: Dan Oscarsson <Dan.Oscarsson@trab.se>
Cc: unicore@Unicode.ORG, ietf-charsets@INNOSOFT.COM, David Goldsmith <goldsmith@apple.com>
Message-id: <Pine.SUN.3.95q.970207141159.245B-100000@enoshima>

On Fri, 7 Feb 1997 Dan Oscarsson wrote:

> > > But even if it is restricted to UCS it would work fine to use:
> > > 
> > > Content-Type: text/plain; charset=UTF-7
> > > Content-Transfer-Encoding: 8bit
> > > 
> > > 
> > > and only encode characters that can not be represented by 8 bits.
> > 
> > Would work fine, eh? Who's going to figure out what the 8-bit
> > characters are, and how? 
> If it is UCS, the 8-bit characters are in UCS, of course! (i.e. iso 8859-1)

Dan - You came up with the idea to call ISO-8859-1 UCS-1.
Please note that this is purely your idea, and that there
is no standard that uses such terminology or foresees such
usage. And people with a wide global perspective don't think
of iso-8859-1 as something that could be called "of course".

In the same vein, we could call ASCII UCS-1 (well, to be
exact it would have to be UCS-0.875 :-), but that is
definitely not intended. Also note that UCS stands for
Universal Character Set. Apart from the question of whether
we are the only writing species in the universe, UCS-2
(the BMP) is pretty much universal. ISO-8859-1 definitely
does not deserve to be called universal.

> > And then also something like
> > 	Content-Type: text/plain; charset=iso-2022-jp
> > 	Content-Transfer-Encoding: 8bit
> > would have to mean something (because iso-2022-jp is a pure
> > 7-bit encoding). Very strange indeed!
> > 
> It is perfectly ok to use the above, even though there are no
> 8-bit characters. You are allowed to specify
> Content-Transfer-Encoding: 8bit
> even if no 8-bit codes are used.

If there are really no 8-bit characters, then that's not a
major problem. It's not what MIME suggests to do, but it is
acceptable. It is also acceptable for "charset=UTF-7".
What is not acceptable is to suddenly try to fill in stuff
into character encodings ("charset"s) that are purely
7-bit, as you have proposed above. If somebody thinks that
we need a UCS-7 form that is compatible (more or less) with
iso-8859-1, then with equal legitimacy, there are many other
legacy encodings that could use such a combination. But as
UTF-7 already encodes all of UCS-2, and the relevant portions
of UCS-4, there is no need for such strange combinations.
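
The 7-bit point can be checked with a small sketch. It assumes
Python and its built-in "utf-7", "iso2022_jp" and "latin-1" codecs;
the helper name is_seven_bit and the sample strings are made up for
illustration and appear in no standard.

```python
def is_seven_bit(data: bytes) -> bool:
    """Return True if no octet in data has its high bit set."""
    return all(octet < 0x80 for octet in data)


# Hypothetical sample text: one Latin-1 letter plus two kanji.
sample = "\u00e9\u65e5\u672c"

# UTF-7 represents all of these using only 7-bit octets.
utf7_bytes = sample.encode("utf-7")
print(is_seven_bit(utf7_bytes))

# The encoding is lossless: decoding gives back the original text.
print(utf7_bytes.decode("utf-7") == sample)

# iso-2022-jp is likewise a pure 7-bit encoding.
print(is_seven_bit("\u65e5\u672c\u8a9e".encode("iso2022_jp")))

# iso-8859-1 needs an 8-bit octet for the same Latin-1 letter.
print(is_seven_bit("\u00e9".encode("latin-1")))
```

Under these assumptions the first three checks should hold and only
the last should fail, which is why a label of "charset=UTF-7" never
needs raw 8-bit octets mixed into the data.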

Regards,	Martin.

Received on Friday, 7 February 1997 05:32:17 UTC