- From: Rick Jelliffe <ricko@allette.com.au>
- Date: Fri, 28 Mar 2003 17:01:53 +1100
- To: <xml-editor@w3.org>
From: "C. M. Sperberg-McQueen" <cmsmcq@acm.org> > There is necessarily guesswork involved -- not on the part of the > processor following the algorithm described, but certainly on the > part of the XML community, in taking as a premise the proposition > that in practice, the only character encodings with which an XML processor > will ever be confronted are those which the algorithm successfully > identifies. > > There is no logical necessity for coded character sets, or character > encodings, to fall into the class of character sets for which the > algorithm works. I think Michael is putting the cart before the horse. Character encodings on which the algorithm fails *must* be excluded. Excluding a category which has no known members should be a no-brainer: we don't have IANA encodings which keep "<?xml version="1.0" encoding=" in their ASCII positions but swap around other ASCII character positions. The only encoding issues I am aware of that are remotely close to causing any funnies are UTF-5 (which might fail but wouldn't be incorrectly diagnosed) and Japanese variant character sets (i.e. that use / for Yen) which just need to be labelled correctly. But there are none known in which the correct code sequence labelling one encoding shares the same bytes as the byte code sequence of the header for another encoding. So there should be no guesswork because logically possible shadowing encodings should be excluded. And even then, there should be no guesswork, because excluding mythical things from consideration does not mean a thing guesswork. It is possible that there could be a race of giants somewhere, but none are known. Cheers Rick Jelliffe
Received on Friday, 28 March 2003 00:57:52 UTC