[Bug 6858] More details needed for "ASCII-compatible encoding"

From: <bugzilla@wiggum.w3.org>
Date: Mon, 29 Jun 2009 07:19:59 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1MLB9j-0004hm-Tg@wiggum.w3.org>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=6858

--- Comment #4 from Martin Dürst <duerst@it.aoyama.ac.jp>  2009-06-29 07:19:59 ---
Looking at
http://dev.w3.org/html5/spec/Overview.html#ascii-compatible-character-encoding:

This solves the problem, but is needlessly complex. Instead of

An ASCII-compatible character encoding is a single-byte or variable-length
encoding in which the bytes 0x09, 0x0A, 0x0C, 0x0D, 0x20 - 0x22, 0x26, 0x27,
0x2C - 0x3F, 0x41 - 0x5A, and 0x61 - 0x7A, ignoring bytes that are the second
and later bytes of multibyte sequences, all correspond to single-byte sequences
that map to the same Unicode characters as those bytes in ANSI_X3.4-1968
(US-ASCII).

the following would say the same but would be simpler:

An ASCII-compatible character encoding is a character encoding in which the
Unicode characters that have byte values 0x09, 0x0A, 0x0C, 0x0D, 0x20 - 0x22,
0x26, 0x27, 0x2C - 0x3F, 0x41 - 0x5A, and 0x61 - 0x7A in ANSI_X3.4-1968
(US-ASCII, [RFC1345]) are represented by exactly and only the same byte values.
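
As a rough illustration of the simplified wording, here is a minimal Python
sketch that tests the "exactly" half of the definition against Python's bundled
codecs. The names ASCII_BYTES and looks_ascii_compatible are made up for this
example, and the codec names are Python's, not the spec's. It does not test the
"only" half (that no other byte sequence represents the same characters), so
passing the check does not prove that an encoding is ASCII-compatible.

```python
# Byte values listed in the proposed definition (hypothetical helper name).
ASCII_BYTES = [
    0x09, 0x0A, 0x0C, 0x0D,
    *range(0x20, 0x23),   # 0x20 - 0x22
    0x26, 0x27,
    *range(0x2C, 0x40),   # 0x2C - 0x3F
    *range(0x41, 0x5B),   # 0x41 - 0x5A
    *range(0x61, 0x7B),   # 0x61 - 0x7A
]


def looks_ascii_compatible(codec: str) -> bool:
    """Return True if every listed byte round-trips as its US-ASCII character.

    This is a necessary condition only: it does not search for other byte
    sequences that might represent the same characters.
    """
    for value in ASCII_BYTES:
        char = chr(value)
        single = bytes([value])
        try:
            # The character must encode to exactly the same single byte ...
            if char.encode(codec) != single:
                return False
            # ... and that byte alone must decode back to the same character.
            if single.decode(codec) != char:
                return False
        except UnicodeError:
            # Not encodable, or the lone byte is not a valid sequence.
            return False
    return True


if __name__ == "__main__":
    for name in ("utf-8", "shift_jis", "utf-16", "cp037"):
        print(name, looks_ascii_compatible(name))
```

With CPython's codecs I would expect utf-8 and shift_jis to pass, and utf-16
and cp037 (an EBCDIC variant) to fail.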

The note after that is a good start, but needs some more work.
Shift_JIS is used on every Japanese PC and Mac, so I wouldn't call it an
exotic encoding. On the other hand, I didn't find a *submitted* draft for
UTF-8+names, so whatever you think about it, it's clearly a dead end at this
point in time. So I would reword:

Note: This includes such exotic encodings as Shift_JIS and variants of
ISO-2022, even though it is possible for bytes like 0x70 to be part of longer
sequences that are unrelated to their interpretation as ASCII. It excludes such
encodings as UTF-7, UTF-8+names, UTF-16, HZ-GB-2312, GSM03.38, and EBCDIC
variants.

to something like:

Note: This includes encodings such as Shift_JIS and variants of ISO-2022, where
it is possible for bytes like 0x70 to appear as part of multibyte sequences
that are unrelated to their interpretation as ASCII. It excludes encodings such
as UTF-7, UTF-16, HZ-GB-2312, GSM03.38, and EBCDIC variants.
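
To make the point about 0x70 concrete, the following small Python sketch
searches for two-byte Shift_JIS sequences whose second byte is 0x70 (the ASCII
code for "p"). The function name trail_byte_examples and its parameters are
invented for this illustration, and it relies on Python's own shift_jis codec,
which may differ in details from other Shift_JIS implementations.

```python
def trail_byte_examples(byte=0x70, codec="shift_jis", limit=3):
    """Find two-byte sequences ending in `byte` that decode to one character.

    Returns a list of (byte sequence, decoded character) pairs in which the
    trailing byte is not interpreted as its ASCII character.
    """
    hits = []
    for lead in range(0x81, 0x100):
        seq = bytes([lead, byte])
        try:
            text = seq.decode(codec)
        except UnicodeDecodeError:
            continue  # not a valid sequence in this codec
        # Keep only cases where both bytes together form a single character;
        # a single-byte lead followed by ASCII decodes to two characters.
        if len(text) == 1 and text != chr(byte):
            hits.append((seq, text))
            if len(hits) == limit:
                break
    return hits


if __name__ == "__main__":
    for seq, text in trail_byte_examples():
        print(seq.hex(), repr(text))
```

Each reported sequence contains the byte 0x70, yet the decoded text contains no
"p", which is the situation the note describes.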


Received on Monday, 29 June 2009 07:20:09 GMT
