W3C home > Mailing lists > Public > ietf-charsets@w3.org > July to September 2000

Fwd: Registration of new charset SCSU

From: Harald Tveit Alvestrand <Harald@Alvestrand.no>
Date: Wed, 19 Jul 2000 11:36:14 +0200
To: iana@iana.org
Cc: ietf-charsets@iana.org
Message-id: <>
I approve of this registration, and apologize for being 2 weeks late with
announcing that decision. (I've been on holiday!)

                       Harald Tveit Alvestrand
                       IETF charset reviewer

>Date: Thu, 15 Jun 2000 08:41:26 -0700
>From: Markus Scherer <markus.scherer@jtcsv.com>
>Subject: Registration of new charset SCSU
>To: charsets <ietf-charsets@iana.org>
>Organization: IBM
>X-Mailer: Mozilla 4.51 [en] (WinNT; U)
>X-Accept-Language: en,de,eo
>(This is an updated proposal that incorporates feedback from
>Kenneth Whistler, Harald Tveit Alvestrand, and Martin J. Dürst.)
>Charset name: SCSU
>Charset aliases: (none, except for the implicit csSCSU)
>Suitability for use in MIME text: No
>Published Specification:
>     Unicode Technical Report #6
>     "A Standard Compression Scheme for Unicode"
>     http://www.unicode.org/unicode/reports/tr6/
>     CCS & CES: The SCSU charset is a combination of the
>     Unicode and ISO 10646 Coded Character Set (CCS) with
>     the Character Encoding Scheme (CES) specified in
>     the above document. It covers exactly the
>     UTF-16-reachable subset of ISO 10646.
>     SCSU can also be classified as a Transfer Encoding
>     Syntax (TES), but one specifically designed for
>     Unicode/ISO 10646.
>ISO 10646 equivalency table: Same as specification.
>Additional Information:
>     SCSU is an encoding (CES/TES) of Unicode/ISO 10646
>     that allows significant size reduction of text compared
>     to UCS Transformation formats. It approximates the size of
>     text that is otherwise achieved with language-specific
>     charsets while encoding all of Unicode (up to U-0010ffff).
>     SCSU is byte-based, which helps further, traditional
>     compression (LZW etc.).
>     It is stateful and uses all byte values including NUL.
>     CRLF may or may not be represented by 0x0d 0x0a depending
>     on the encoder and the text.
>     Encoders can be trivial by emitting one command byte (0x0f)
>     followed by the text in UTF-16BE. Fairly simple encoders
>     yield good results with average text of any length.
>     Decoding is simple and requires no mapping tables.
>     If no control characters other than NUL, TAB, CR, and LF
>     are used, then text in US-ASCII or ISO-8859-1 is already
>     valid SCSU text.
>     There is a Unicode signature byte sequence defined
>     (0e fe ff, see specification).
>     SCSU is of no use for applications that require a canonical
>     representation of text. It is not suitable for
>     process-internal use.
>     This is an intentional part of its design.
>Personal & email address to contact for further information:
>     Markus W. Scherer
>     IBM Java Technology Center
>     10275 N. DeAnza Blvd
>     Cupertino, CA 95014-2237
>     markus.scherer@jtcsv.com
>     schererm@us.ibm.com
>Intended usage: LIMITED USE
>Thank you for your consideration,

Harald Tveit Alvestrand
Until August 1: EDB Maxware, Trondheim, Norway
After August 1: Cisco Systems, still living in Trondheim
Always: Harald@Alvestrand.no
Received on Wednesday, 19 July 2000 08:14:09 UTC

This archive was generated by hypermail 2.3.1 : Tuesday, 6 January 2015 21:52:17 UTC