- From: John Cowan <jcowan@reutershealth.com>
- Date: Thu, 21 Oct 2004 00:29:42 -0400
- To: Richard Tobin <richard@inf.ed.ac.uk>
- Cc: François Yergeau <francois@yergeau.com>, public-xml-core-wg@w3.org
Richard Tobin scripsit:

> What is the significance of "most" here?  If you know the encoding is
> an ASCII superset, you can recognize all ASCII characters.  Are there
> encodings in use that are not strict ASCII supersets which nonetheless
> use the same encoding as ASCII for '<' and '?'?

Probably not.  But recognizing a text as EBCDIC family is not sufficient
to nail down the entire ASCII repertoire, only the 83 characters of the
EBCDIC invariant repertoire (listed here in EBCDIC codepoint order):

    SP, ., <, (, +, &, *, ), ;, -, /, comma, %, _, >, ?, :, ', =, ",
    a-z, A-Z, 0-9.

The 13 ASCII characters

    !, #, $, @, [, \, ], ^, `, {, |, }, ~

can only be recognized once the exact EBCDIC code page is known.
(" and a-z are not technically invariant, but almost all EBCDIC code
pages have them in the standard places.)

Fortunately, none of them are allowed in the XML declaration.

(Apologies to Richard, who accidentally got an earlier version of this.)

-- 
We call nothing profound                jcowan@reutershealth.com
that is not wittily expressed.          John Cowan
        --Northrop Frye (improved)      http://www.reutershealth.com
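
The invariant/variant split described above can be checked with a small Python sketch. This is only an illustration under stated assumptions: the names INVARIANT, VARIANT, CODE_PAGES, byte_positions and is_stable are hypothetical, the three code pages are simply EBCDIC codecs that ship with CPython (not a survey of the whole family), and the sample declaration is made up for the example.

```python
import string

# Character lists copied from the message above (hypothetical names).
INVARIANT = set(" .<(+&*);-/,%_>?:'=\"") | set(string.ascii_letters + string.digits)
VARIANT = set("!#$@[\\]^`{|}~")

# Assumption: three EBCDIC codecs bundled with CPython stand in for "the family".
CODE_PAGES = ("cp037", "cp500", "cp1140")


def byte_positions(ch, code_pages=CODE_PAGES):
    """Return the set of byte values that encode `ch` across the code pages."""
    return {ch.encode(cp)[0] for cp in code_pages}


def is_stable(ch, code_pages=CODE_PAGES):
    """True if `ch` sits at the same byte position in every listed code page."""
    return len(byte_positions(ch, code_pages)) == 1


# An illustrative XML declaration; the encoding name is just an example.
decl = '<?xml version="1.0" encoding="IBM037"?>'

if __name__ == "__main__":
    moved = sorted(c for c in VARIANT if not is_stable(c))
    print("variant characters that move among", CODE_PAGES, ":", moved)
    print("declaration uses only invariant characters:", set(decl) <= INVARIANT)
```

With only three code pages, some of the 13 variant characters will still happen to agree; the sketch shows that at least some of them move, not that all of them do in every pairing.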
Received on Thursday, 21 October 2004 04:30:30 UTC