From: John Cowan <jcowan@reutershealth.com>
Date: Thu, 21 Oct 2004 00:29:42 -0400
To: Richard Tobin <ric
Cc: François Yergeau <francois@yergeau.com>, public-
Message-ID: <20041021042942.GO12809@skunk.reutershealth.com>

Richard Tobin scripsit:

> What is the significance of "most" here?  If you know the encoding is
> an ASCII superset, you can recognize all ASCII characters.  Are there
> encodings in use that are not strict ASCII supersets which nonetheless
> use the same encoding as ASCII for '<' and '?'?

Probably not.  But recognizing a text as EBCDIC family is not sufficient
to nail down the entire ASCII repertoire, only the 82 characters of the
EBCDIC invariant repertoire (listed here in EBCDIC codepoint order):
SP, ., <, (, +, &, *, ), ;, -, /, comma, %, _, >, ?, :, ', =, ", a-z, A-Z, 0-9.
The 13 ASCII characters !, #, $, @, [, \, ], ^, `, {, |, }, ~ can only
be recognized once the exact EBCDIC code page is known.  (" and a-z are
not technically invariant, but almost all EBCDIC code pages have them
in the standard places.)  Fortunately, none of those 13 characters is
allowed in the XML declaration.
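
As a rough illustration of the point above, here is a minimal Python sketch
that compares how printable ASCII characters are encoded by two EBCDIC codecs
shipped with Python (cp037 and cp500).  The choice of code pages and the
helper name invariant_chars are assumptions made for this example, not
anything defined in the thread.  With only two code pages the computed
invariant set will be larger than the repertoire listed above, which takes
all EBCDIC code pages into account.

```python
# Printable ASCII, space (0x20) through tilde (0x7E): 95 characters.
ASCII_PRINTABLE = [chr(c) for c in range(0x20, 0x7F)]


def invariant_chars(code_pages):
    """Return the printable ASCII characters that encode to the same
    byte sequence in every one of the given code pages.

    `invariant_chars` is a hypothetical helper written for this example.
    """
    result = []
    for ch in ASCII_PRINTABLE:
        try:
            encodings = {ch.encode(cp) for cp in code_pages}
        except UnicodeEncodeError:
            # The character is missing from at least one code page,
            # so it cannot be treated as invariant.
            continue
        if len(encodings) == 1:
            result.append(ch)
    return result


# Assumption: cp037 and cp500 stand in for "the EBCDIC family" here.
pages = ["cp037", "cp500"]
inv = invariant_chars(pages)
variant = [ch for ch in ASCII_PRINTABLE if ch not in inv]

print("invariant across", pages, ":", "".join(inv))
print("variant across", pages, ":", "".join(variant))

# Every character of a typical XML declaration should be invariant.
decl = '<?xml version="1.0" encoding="UTF-8"?>'
print("declaration safe:", all(ch in inv for ch in decl))
```

Adding further EBCDIC codecs to pages should only shrink the invariant set,
which is the sense in which the exact code page must be known before the
remaining characters can be recognized.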

(Apologies to Richard, who accidentally got an earlier version of this.)

-- 
We call nothing profound                        jcowan@reutershealth.com
that is not wittily expressed.                  John Cowan
        --Northrop Frye (improved)              http://www.reutershealth.com
Received on Thursday, 21 October 2004 04:30:30 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 8 January 2008 14:21:31 GMT