RE: Registration of a new charset

From: <ned.freed@mrochek.com>
Date: Wed, 26 Sep 2001 13:40:34 -0700 (PDT)
To: Harald Alvestrand <harald@alvestrand.no>
Cc: Simon Tardell <simon.tardell@smarttrust.com>, "'ietf-charsets@iana.org'" <ietf-charsets@iana.org>
Message-id: <01K8SMUE6KXQ000NBA@mauve.mrochek.com>

> I'm the reviewer...

> 2 pieces of information I like to have in a registration:

> - Suitability for MIME text encoding: (Yes/No)
>   (I think yes - it has CR and LF in the obvious places)

Yes, it does have these in the right places, but that's not sufficient for
MIME. The other requirement is that NUL (octet value 0x00) not be used.
Unfortunately, in GSM 0x00 is where the at-sign character lives.
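
To make the two checks concrete, here is a minimal sketch in Python. The
table is deliberately partial: it lists only the three code positions
mentioned in this thread, and the names GSM_PARTIAL and mime_text_problems
are hypothetical, made up for illustration.

```python
# Partial, illustrative view of the GSM default alphabet. Only the three
# positions discussed in this thread are listed; everything else is omitted.
GSM_PARTIAL = {
    0x00: "@",   # the at-sign sits where ASCII has NUL
    0x0A: "\n",  # LF
    0x0D: "\r",  # CR
}


def mime_text_problems(table):
    """Return the reasons a code table fails the two checks named above."""
    problems = []
    if table.get(0x0D) != "\r":
        problems.append("CR is not at 0x0D")
    if table.get(0x0A) != "\n":
        problems.append("LF is not at 0x0A")
    # An unassigned 0x00 is treated as NUL; any other character there means
    # ordinary text would contain zero octets.
    if table.get(0x00, "\x00") != "\x00":
        problems.append("0x00 carries %r instead of NUL" % table[0x00])
    return problems


print(mime_text_problems(GSM_PARTIAL))
```

Under these assumptions the CR and LF checks pass and only the 0x00
position is reported.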

> - Whether a mapping to Unicode exists, and if so, where.
>   (is that character at 0x09 a C-cedilla or a C with-hook?

Another question with this particular character is whether it is an upper or
lower case C. It looks like an upper case C to me in the chart, but I question
the wisdom of having an upper case C-cedilla but no lower case C-cedilla.

>    Upper or lower case? details are out to get you...but "no" is a fine
>    answer...)

> A small detail is that the GSM default charset (yes, I read the reference -
> ETSI has a sensible distribution policy) is a 7-bit character set; it is
> "obvious but worth stating" that when this character set is used, the
> character is carried in an 8-bit byte, with the character in the lower 7
> bits, and the 8th bit is zero (unlike SMS, which jams 8 characters in 7
> bytes).

Agreed, although there is apparently similar 7-bit squeezing going on in
other contexts. Go figure.
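
The difference between the two carriages can be sketched in Python as
follows. This is an illustration only: the function names are hypothetical,
and the packed form assumes the least-significant-bit-first layout usually
described for SMS, which should be checked against the ETSI reference
before relying on it.

```python
def one_per_octet(septets):
    """Carry each 7-bit character in its own octet with the 8th bit zero."""
    for s in septets:
        if not 0 <= s <= 0x7F:
            raise ValueError("not a 7-bit value: %r" % (s,))
    return bytes(septets)


def pack_septets(septets):
    """Pack 7-bit values back to back, so 8 characters fit in 7 octets.

    Assumes septet i occupies bits 7*i .. 7*i+6 of a little-endian bit
    stream (an assumption about the SMS layout, see the note above).
    """
    acc = 0
    for i, s in enumerate(septets):
        if not 0 <= s <= 0x7F:
            raise ValueError("not a 7-bit value: %r" % (s,))
        acc |= s << (7 * i)
    nbytes = (7 * len(septets) + 7) // 8  # round up to whole octets
    return acc.to_bytes(nbytes, "little")


def unpack_septets(data, count):
    """Inverse of pack_septets; count is the number of characters packed."""
    acc = int.from_bytes(data, "little")
    return [(acc >> (7 * i)) & 0x7F for i in range(count)]


sample = [0x68, 0x65, 0x6C, 0x6C, 0x6F, 0x21, 0x21, 0x21]  # 8 characters
print(len(one_per_octet(sample)), len(pack_septets(sample)))
```

The caller has to pass the character count to unpack_septets because the
packed octets alone do not always determine it: 7 octets can hold either
7 or 8 characters.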


				Ned
Received on Wednesday, 26 September 2001 16:57:21 GMT