- From: Gavin Nicol <gtn@ebt.com>
- Date: Mon, 16 Sep 1996 15:02:03 GMT
- To: ricko@allette.com.au
- CC: tbray@textuality.com, w3c-sgml-wg@w3.org
>1) the first thing in the document before any non-ISO 646 characters is a
>PI with only ISO 646 characters that can say the encoding (if it is exotic
>or warranted). E.g.:
>  <?XML EUC-JP>
>2) the encoding used for the input stream must have ISO 646 characters
>in the same code numbers as ISO 646.

This is a hack, and doesn't help with *initial* parsing of the document. Autodetection also fails very quickly when faced with a number of multibyte encodings.

The only *correct* way to indicate the encoding (or BCTF) of a document is to do so external to the document. To me, this means FSIs *or* MIME labelling (the *.mim file format).

So far, all of the proposals I have seen could be easily handled by the *.mim file format, in which case no parser trickery is needed: the storage manager would always know unambiguously what the encoding is.
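To make the contrast concrete, here is a minimal Python sketch, assuming the `<?XML EUC-JP>` PI syntax from the quoted proposal. The helper names `sniff_pi_encoding` and `encoding_from_mime` are hypothetical, and a MIME entity header stands in for external labelling in general; the exact layout of the *.mim format is not described here. The in-band sniff only finds the PI when the encoding keeps ISO 646 characters at their usual code points (condition 2), while the external label is read before the document body is touched.

```python
import email
import re
from typing import Optional

# Hypothetical pattern for the proposed PI, e.g. <?XML EUC-JP>
# (SGML-style PI, closed by a plain ">").
_PI_PATTERN = re.compile(rb"<\?XML\s+([A-Za-z0-9._:-]+)\s*>")


def sniff_pi_encoding(data: bytes, limit: int = 256) -> Optional[str]:
    """Look for the encoding PI as the first thing in the byte stream.

    This only works if the encoding keeps ISO 646 (ASCII) characters at
    their ASCII code points, i.e. condition (2) of the proposal.
    """
    match = _PI_PATTERN.match(data[:limit])
    return match.group(1).decode("ascii") if match else None


def encoding_from_mime(entity: bytes) -> Optional[str]:
    """Read the encoding from an external MIME label, not the content."""
    msg = email.message_from_bytes(entity)
    return msg.get_content_charset()  # lower-cased charset parameter, or None


doc = "<?XML EUC-JP>\n<doc>日本語</doc>\n"

# ASCII-compatible encoding: the PI bytes are plain ASCII, so sniffing works.
print(sniff_pi_encoding(doc.encode("euc_jp")))  # EUC-JP

# UTF-16 adds a byte-order mark and a NUL beside every ASCII character,
# so the same PI is invisible to a byte-level ASCII match.
print(sniff_pi_encoding(doc.encode("utf-16")))  # None

# External labelling: the charset comes from the header, before the body.
entity = b"Content-Type: text/sgml; charset=EUC-JP\n\n" + doc.encode("euc_jp")
print(encoding_from_mime(entity))  # euc-jp
```

The sniffing path has to guess how to read bytes before it knows the encoding, which is why it breaks down for encodings like UTF-16; the labelled path never has to guess.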
Received on Monday, 16 September 1996 11:03:42 UTC