Re: Reads like ASCII (was Re: character sets ...)
>1) the first thing in the document before any non-ISO 646 characters is a
>PI with only ISO 646 characters that can say the encoding (if it is exotic
>or warranted). E.g.:
> <?XML EUC-JP>
>2) the encoding used for the input stream must have ISO 646 characters
>in the same code numbers as ISO 646.
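
For concreteness, here is a rough sketch of what the quoted proposal asks of a parser. It assumes the PI syntax shown above, and the names sniff_declared_encoding and decode_document are made up for illustration. Note that it can only work when rule 2 holds, because the PI itself has to be readable as ISO 646 bytes before the encoding is known:

```python
import re

# Matches a leading PI such as <?XML EUC-JP> using only ISO 646 (ASCII) bytes.
# The optional "?" before ">" is an assumption, to tolerate both PI closing styles.
PI_PATTERN = re.compile(rb"\A\s*<\?XML\s+([A-Za-z0-9._-]+)\s*\??>")

def sniff_declared_encoding(raw: bytes, default: str = "ascii") -> str:
    """Return the encoding named in a leading PI, or the default if none is found."""
    match = PI_PATTERN.match(raw[:128])
    if match is None:
        return default
    return match.group(1).decode("ascii").lower()

def decode_document(raw: bytes) -> str:
    """Decode the whole byte stream using the in-band declaration."""
    return raw.decode(sniff_declared_encoding(raw))

if __name__ == "__main__":
    doc = b"<?XML EUC-JP>\n" + "日本語".encode("euc_jp")
    print(sniff_declared_encoding(doc))
    print(decode_document(doc))
```

A stream that does not keep ISO 646 characters at their ISO 646 code numbers (UTF-16, for instance) never matches the pattern, so the sketch silently falls back to the default.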
This is a hack, and doesn't help with *initial* parsing of the
Autodetection also fails very quickly when faced with a number of
The only *correct* way to indicate the encoding (or BCTF) of a
document is to do so external to the document. To me, this means FSIs
*or* MIME labelling (the *.mim file format).
So far, all of the proposals I have seen could be easily handled by the
*.mim file format, in which case, no parser trickery is needed: the
storage manager would always unambiguously know what the encoding is.
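
As a minimal sketch of the external-labelling idea, assume the label is nothing more than a MIME-style Content-Type header kept apart from the document bytes. That is only a stand-in here, not the actual *.mim layout, and decode_with_external_label is a hypothetical name. The point is that the charset is known before a single byte of the document is examined:

```python
from email import message_from_string

def decode_with_external_label(label_text: str, payload: bytes,
                               default: str = "ascii") -> str:
    """Decode payload using the charset given in a separate MIME-style label."""
    headers = message_from_string(label_text)
    # get_content_charset returns the lowercased charset parameter,
    # or the supplied default when no charset is declared.
    charset = headers.get_content_charset(default)
    return payload.decode(charset)

if __name__ == "__main__":
    # Hypothetical label content; the document bytes carry no declaration at all.
    label = "Content-Type: text/sgml; charset=euc-jp\n\n"
    payload = "日本語".encode("euc_jp")
    print(decode_with_external_label(label, payload))
```

Nothing in the payload has to be sniffed or guessed, so the same code path works for encodings that do not share code numbers with ISO 646.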