
Registration of new charset SCSU

From: Markus Scherer <markus.scherer@jtcsv.com>
Date: Tue, 13 Jun 2000 11:19:43 -0700
To: ietf-charsets@iana.org
Message-id: <39467B3F.6471C81A@jtcsv.com>
(This is a proposal for a registration; I am using the template from draft-freed-charset-regist-02.txt)

Charset name: SCSU

Charset aliases: (none, except for the implicit csSCSU)

Suitability for use in MIME text: No

Published Specification: Unicode Technical Report #6
    "A Standard Compression Scheme for Unicode"
    Note: SCSU is a Character Encoding Scheme (CES) for
    Unicode and for the UTF-16-reachable subset of ISO 10646.

ISO 10646 equivalency table: Same as specification.

Additional Information:
    SCSU is a Character Encoding Scheme for Unicode/ISO 10646
    that significantly reduces the size of text compared to
    the UCS Transformation Formats (UTFs). Encoded text is about
    as compact as in language-specific charsets, yet SCSU
    encodes all of Unicode (up to U-0010ffff).
    SCSU is byte-based, which helps when traditional compression
    (LZW etc.) is applied on top of it.
    It is stateful and uses all byte values including NUL.
    CRLF may or may not be represented by 0x0d 0x0a depending
    on the encoder and the text.
    A trivial encoder can emit one command byte (0x0f)
    followed by the text in UTF-16BE. Fairly simple encoders
    yield good results with average text of any length.
    Decoding is simple and requires no mapping tables.
    If no control characters other than NUL, TAB, CR, and LF
    are used, then text in US-ASCII or ISO-8859-1 is already
    valid SCSU text.
    There is a Unicode signature byte sequence defined
    (0e fe ff, see specification).
    SCSU is a good charset for application/xml, for example.
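
To make the notes above on trivial encoders, the ASCII/ISO-8859-1 property,
and the signature concrete, here is a minimal Python sketch. It is not taken
from the specification; the function and constant names are hypothetical.
One detail goes beyond the text of this registration and rests on my reading
of UTR #6: in Unicode mode, lead bytes 0xE0 to 0xF2 act as tags, so the sketch
quotes code units in that range with the byte 0xF0. The Latin-1 check assumes
that "control characters" means byte values below 0x20, which are the tag
bytes of SCSU's initial single-byte mode.

```python
SCU = 0x0F  # command byte from the registration: switch to Unicode mode
UQU = 0xF0  # assumed from UTR #6: quotes one UTF-16 code unit in Unicode mode
SIGNATURE = bytes([0x0E, 0xFE, 0xFF])  # signature bytes from the registration

# Byte values below 0x20 that the registration lists as harmless.
PASS_THROUGH_CONTROLS = frozenset({0x00, 0x09, 0x0A, 0x0D})  # NUL, TAB, LF, CR


def trivial_scsu_encode(text: str, signature: bool = False) -> bytes:
    """Emit 0x0f, then the text as UTF-16BE (no compression at all)."""
    out = bytearray(SIGNATURE if signature else b"")
    out.append(SCU)
    utf16 = text.encode("utf-16-be")
    for i in range(0, len(utf16), 2):
        hi, lo = utf16[i], utf16[i + 1]
        # Assumption (see lead-in): a code unit whose high byte collides
        # with a Unicode-mode tag must be preceded by UQU.
        if 0xE0 <= hi <= 0xF2:
            out.append(UQU)
        out += bytes((hi, lo))
    return bytes(out)


def latin1_is_already_scsu(data: bytes) -> bool:
    """True if US-ASCII/ISO-8859-1 bytes contain no SCSU tag bytes."""
    return all(b >= 0x20 or b in PASS_THROUGH_CONTROLS for b in data)
```

For example, trivial_scsu_encode("A") yields the three bytes 0f 00 41, and
latin1_is_already_scsu(b"\x1b[0m") is False because ESC (0x1b) is not one of
the four permitted control characters.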

Person & email address to contact for further information:
    Markus W. Scherer
    IBM Java Technology Center
    10275 N. DeAnza Blvd
    Cupertino, CA 95014-2237


Intended usage: LIMITED USE

Thank you for your consideration,

Received on Tuesday, 13 June 2000 14:26:22 UTC
