- From: Rick Jelliffe <ricko@allette.com.au>
- Date: Fri, 20 Sep 1996 02:22:52 +1000 (EST)
- To: Gavin Nicol <gtn@ebt.com>
- Cc: w3c-sgml-wg@w3.org
(repost due to finger trouble)

On Wed, 18 Sep 1996, Gavin Nicol wrote:
> This is my point. You *cannot* read the entity in unless you know the
> coded character set and encoding.

It seems to me there are three basic data formats which character
encodings use: 8-bit (fixed and variable), 16-bit big-endian (fixed and
variable) and 16-bit little-endian (fixed and variable). If you include
fixed 32-bit character formats you only add another three (Intel byte
order, Motorola byte order, PDP-11 byte order).

Can you give me any examples of character set encodings in use (not
compression, UUENCODE, etc.) in which you can't reliably establish the
data format used (for coded character sets which have ASCII characters
in the ASCII code positions) if the first string in the file is "<?XML"?

Once one can establish the data format, one can read the PI and get the
charset/encoding in use.

(I.e. this is not autodetecting the character set, nor the encoding, but
merely the basic data format {of the initially appearing ASCII-valued
characters}. If that is such a 'hack', why does Unicode specifically
have the byte-order mark to allow it?)

Rick Jelliffe            http://www.allette.com.au/allette/ricko
                         email: ricko@allette.com.au
================================================================
Allette Systems          http://www.allette.com.au
                         email: info@allette.com.au
10/91 York St, 2000,     phone: +61 2 9262 4777
Sydney, Australia        fax:   +61 2 9262 4774
================================================================
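A minimal sketch of the data-format check argued for in this message, assuming the entity begins with the ASCII-valued string "<?XML" and that the six formats listed are the only candidates. The function name, table, and labels are hypothetical, chosen for illustration only; the PDP-11 pattern assumes two little-endian 16-bit words with the high word first.

```python
# Illustrative only: names and labels are assumptions, not from any spec.
# Each entry is the byte pattern the leading characters "<?XM" (or just "<"
# for 32-bit formats) would produce in the first four bytes of the entity.
SIGNATURES = [
    (b"\x00\x00\x00\x3c", "32-bit big-endian (Motorola order)"),
    (b"\x3c\x00\x00\x00", "32-bit little-endian (Intel order)"),
    (b"\x00\x00\x3c\x00", "32-bit PDP-11 order"),
    (b"\x00\x3c\x00\x3f", "16-bit big-endian"),
    (b"\x3c\x00\x3f\x00", "16-bit little-endian"),
    (b"\x3c\x3f\x58\x4d", "8-bit"),
]


def detect_data_format(head: bytes):
    """Return a label for the basic data format, or None if unrecognised.

    This does not identify the character set or encoding; it only says how
    to read the ASCII-valued characters of the initial PI.
    """
    for signature, label in SIGNATURES:
        if head.startswith(signature):
            return label
    return None
```

For example, `detect_data_format("<?XML".encode("utf-16-le"))` yields the 16-bit little-endian label, after which the PI can be read two bytes at a time to find the declared charset/encoding.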
Received on Thursday, 19 September 1996 13:40:31 UTC