- From: Frank Ellermann <nobody@xyzzy.claranet.de>
- Date: Mon, 21 May 2007 22:47:56 +0200
- To: www-validator@w3.org
Dana C. Chandler III wrote:

> Is there a definitive list of Character sets that have ASCII
> as a subset?

If you find one, please post its URL. You could construct it by installing ICU and then checking which charsets map all ASCII characters to the same ASCII characters, and map nothing else to ASCII characters.

It also depends on your definition: UTF-16 and UTF-32 don't have ASCII as a subset if you're talking about octets (8 bits). UTF-7 and UTF-1 also don't qualify. You'd have to check all the charsets with code-switching (SCSU etc.) for states where an ASCII octet doesn't stand for the ASCII character. Some IBM codepages rotate SUB, DEL, and FS; arguably that's no longer ASCII. IIRC an Adobe charset also had an oddity in the range 0x00 to 0x7F.

The simple cases are UTF-8, Latin-1 (plus some other Latin-*), windows-1252 (plus some other windows-*), codepages 437, 850, and 858 (plus a few others, ignoring the IBM rotation), and likely some Mac charsets (those aren't registered; better to ignore unregistered charsets, they're hopeless moving targets). After that it starts to get interesting...

Frank
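Frank suggests building the list with ICU. As a rough stand-in, the check he describes could be sketched with Python's standard `codecs` module instead of ICU (a swapped-in tool that covers fewer charsets than ICU does). The helper name `is_ascii_superset` is hypothetical. It tests that every ASCII character encodes to its own single octet and decodes back to itself, and that an ASCII octet following any high octet still means itself while the high octet does not decode to ASCII. It does not explore states entered through ASCII escape sequences (ISO-2022-JP, for instance), and Python has no codecs for the IBM pages with the SUB/DEL/FS rotation, so those cases still need a manual look.

```python
import codecs


def is_ascii_superset(charset: str) -> bool:
    """Heuristic check (hypothetical helper, not an ICU API): does every
    ASCII octet stand for the same ASCII character in `charset`?"""
    codecs.lookup(charset)  # raises LookupError for unknown charset names
    for a in range(0x80):
        ch = chr(a)
        try:
            # The ASCII character must encode to its own single octet
            # (rules out UTF-16/UTF-32 BOMs and UTF-7's "+" escaping).
            if ch.encode(charset) != bytes([a]):
                return False
            # The lone octet must decode back to the same character.
            if bytes([a]).decode(charset) != ch:
                return False
        except UnicodeError:
            return False
        # After any high octet, the ASCII octet must still mean itself,
        # and the high octet must not decode to ASCII (catches double-byte
        # charsets whose trail bytes overlap ASCII, and EBCDIC pages).
        for hi in range(0x80, 0x100):
            text = bytes([hi, a]).decode(charset, errors="replace")
            if not text.endswith(ch) or any(ord(c) < 0x80 for c in text[:-1]):
                return False
    # NOTE: states entered via ASCII escape sequences (e.g. ISO-2022-JP's
    # ESC $ B) are not explored, so such charsets can slip through.
    return True


if __name__ == "__main__":
    for name in ["utf-8", "latin-1", "cp1252", "cp437", "cp850", "cp858",
                 "utf-16", "utf-32", "utf-7", "cp037", "shift_jis"]:
        print(f"{name:10} {is_ascii_superset(name)}")
```

Running it over a list of candidate names gives a first cut; anything that passes and is known to be stateful still deserves a closer look.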
Received on Monday, 21 May 2007 20:50:50 UTC