Re: PE134 from John Cowan on 2004-10-21 (public-xml-core-wg@w3.org from October 2004)

From: John Cowan <jcowan@reutershealth.com>
Date: Thu, 21 Oct 2004 00:29:42 -0400
To: Richard Tobin <richard@inf.ed.ac.uk>
Cc: François Yergeau <francois@yergeau.com>, public-xml-core-wg@w3.org
Message-ID: <20041021042942.GO12809@skunk.reutershealth.com>

Richard Tobin scripsit:

> What is the significance of "most" here?  If you know the encoding is
> an ASCII superset, you can recognize all ASCII characters.  Are there
> encodings in use that are not strict ASCII supersets which nonetheless
> use the same encoding as ASCII for '<' and '?'?

Probably not.  But recognizing a text as EBCDIC family is not sufficient
to nail down the entire ASCII repertoire, only the 82 characters of the
EBCDIC invariant repertoire (listed here in EBCDIC codepoint order):
SP, ., <, (, +, &, *, ), ;, -, /, comma, %, _, >, ?, :, ', =, ", a-z, A-Z, 0-9.
The 13 ASCII characters !, #, $, @, [, \, ], ^, `, {, |, }, ~ can only
be recognized once the exact EBCDIC code page is known.  (" and a-z are
not technically invariant, but almost all EBCDIC code pages have them
in the standard places.)  Fortunately, none of those 13 characters is
allowed in the XML declaration.
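
The point can be checked with a rough sketch, not a definitive test. It
assumes Python's bundled cp037 and cp500 codecs as two sample EBCDIC code
pages. The helper names and the encoding label "ebcdic-cp-us" in the
sample declaration are illustrative only. It asks whether each character
encodes to the same byte on both code pages.

```python
import string

# Characters listed above as invariant across EBCDIC code pages.
INVARIANT = " .<(+&*);-/,%_>?:'=\"" + string.ascii_letters + string.digits

# ASCII characters whose position depends on the exact code page.
VARIANT = "!#$@[\\]^`{|}~"

# Assumption: two sample EBCDIC code pages that ship with Python.
CODE_PAGES = ("cp037", "cp500")


def encodings_of(ch):
    """Return the set of byte strings ch maps to across CODE_PAGES."""
    return {ch.encode(cp) for cp in CODE_PAGES}


def is_stable(text):
    """True if every character of text encodes identically on all CODE_PAGES."""
    return all(len(encodings_of(ch)) == 1 for ch in text)


# Illustrative XML declaration; the encoding label is only an example.
xml_decl = '<?xml version="1.0" encoding="ebcdic-cp-us"?>'

print(is_stable(INVARIANT))   # invariant repertoire agrees on both pages
print(is_stable(xml_decl))    # so does the sample XML declaration
print(sorted(ch for ch in VARIANT if not is_stable(ch)))  # characters that move
```

Two code pages cannot show every variant character moving.  Adding more
EBCDIC codecs to CODE_PAGES would widen the check, although a page that
relocates " or a-z would then make the invariant test fail, as the
parenthetical above warns.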

(Apologies to Richard, who accidentally got an earlier version of this.)

-- 
We call nothing profound                        jcowan@reutershealth.com
that is not wittily expressed.                  John Cowan
        --Northrop Frye (improved)              http://www.reutershealth.com

Received on Thursday, 21 October 2004 04:30:30 UTC