Re: PE134

From: François Yergeau <francois@yergeau.com>
Date: Wed, 20 Oct 2004 18:30:38 -0400
To: Richard Tobin <richard@inf.ed.ac.uk>
Cc: public-xml-core-wg@w3.org
Message-id: <4176E70E.5020703@yergeau.com>

Richard Tobin wrote:
>>It turns out that knowing the encoding family is 
>>sufficient to reliably recognize U+0020 SPACE as well as most ASCII 
>>characters
> 
> What is the significance of "most" here?  If you know the encoding is
> an ASCII superset, you can recognize all ASCII characters.

But if you detect an EBCDIC-family encoding, you know the positions of 
only "most" ASCII characters, since the subset common to all EBCDIC code 
pages does not cover the whole of ASCII.  That subset, however, is 
sufficient to parse the XML declaration, find the encoding declaration 
within it, and learn the precise EBCDIC code page you have.
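
To make the point concrete, here is a rough sketch in Python of that 
two-step detection. It is only an illustration under stated assumptions, 
not a reference implementation: the function name is made up, only the 
two 8-bit families are handled (no BOM, no UTF-16/UTF-32), and cp037 is 
used merely as a stand-in EBCDIC page for the provisional decoding, on 
the assumption that every character allowed in the XML declaration sits 
at the same position in all EBCDIC pages. Characters outside that common 
subset, such as square brackets, do move between pages like cp037 and 
cp500, which is why it is only "most".

```python
import re

# First four bytes of "<?xm" in each family (see Appendix F of the XML spec).
ASCII_FAMILY = bytes([0x3C, 0x3F, 0x78, 0x6D])
EBCDIC_FAMILY = bytes([0x4C, 0x6F, 0xA7, 0x94])

# encoding="..." or encoding='...'; \x22 and \x27 are the two quote characters.
ENCODING_DECL = re.compile(
    r"encoding\s*=\s*[\x22\x27]([A-Za-z][A-Za-z0-9._-]*)[\x22\x27]"
)


def sniff_declared_encoding(data: bytes):
    """Return (family, declared_encoding); either may be None.

    Hypothetical helper: the family is guessed from the first four bytes,
    then the declaration is read with a provisional decoder for that family.
    """
    head = data[:4]
    if head == ASCII_FAMILY:
        # Any ASCII superset decodes the declaration identically.
        family, provisional = "ascii-superset", "latin-1"
    elif head == EBCDIC_FAMILY:
        # Assumption: cp037 agrees with other EBCDIC pages on the
        # characters that may appear in the XML declaration.
        family, provisional = "ebcdic", "cp037"
    else:
        return None, None

    # Only the start of the document matters; 200 bytes is an arbitrary cap.
    text = data[:200].decode(provisional, errors="replace")
    end = text.find("?>")
    if end == -1:
        return family, None

    match = ENCODING_DECL.search(text[:end])
    return family, (match.group(1) if match else None)


if __name__ == "__main__":
    doc = '<?xml version="1.0" encoding="ebcdic-cp-us"?><doc/>'
    # Encoded with cp500, yet read provisionally as cp037.
    print(sniff_declared_encoding(doc.encode("cp500")))
```

Once the declared name is known, the document would be re-read from the 
start with the real code page, since anything past the declaration may 
use characters outside the common subset.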

-- 
François
Received on Wednesday, 20 October 2004 22:31:55 GMT