Re: Reads like ASCII (was Re: character sets ...)
>> This is a hack, and doesn't help with *initial* parsing of the
>why? what is *initial* parsing?
The PI is part of the entity, so the parser has to *parse* it. It
cannot do that *blindly*: it can't parse the PI correctly unless
something has already primed it with knowledge of the encoding.
You submit that having a PI at the start of every entity will solve
the problem. For a limited set of encodings that is true; across the
actual infinite set of encodings, I would not guarantee it.
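
To make the bootstrapping problem concrete, here is a rough sketch in
Python. The PI syntax in DECLARATION and the helper names are
hypothetical, invented only for illustration; the point is just that a
parser which blindly assumes ASCII finds the declaration in some
encodings and not in others.

```python
# Hypothetical encoding declaration; the real PI syntax is an assumption here.
DECLARATION = '<?XML ENCODING="{}"?>'


def sniff_as_ascii(raw: bytes) -> bool:
    """Return True if a parser that blindly assumes ASCII can see the PI."""
    return raw.startswith(b"<?XML ENCODING=")


def try_encoding(name: str) -> bool:
    """Encode the declaration in the named encoding, then sniff it as ASCII."""
    raw = DECLARATION.format(name).encode(name)
    return sniff_as_ascii(raw)


# Two ASCII-compatible encodings, then two that are not.
results = {
    name: try_encoding(name)
    for name in ("ascii", "shift_jis", "utf-16", "cp037")
}

for name, found in results.items():
    print(f"{name:10} declaration visible to an ASCII-only parser: {found}")
```

The ASCII-compatible encodings pass because the PI's bytes come out the
same as in ASCII. UTF-16 and EBCDIC (cp037 here) do not, so the parser
would need a separate guessing rule for each such family before it
could even read the declaration.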
In addition, the PI is effectively a kind of header that the storage
manager will be using. If that is the case, why not define a proper
header syntax instead of a hack? I would prefer
Content-Type: text/xml; charset=shift-jis<CR><LF>
any day. Also, the PI idea would lead to people trying something like:
which is obviously broken (or at least hard to work with).
HTML has a similar problem with META.
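
For comparison, a minimal sketch of the header approach I said I would
prefer. It assumes a single Content-Type line followed by a blank line
(<CR><LF><CR><LF>) before the entity body; that framing and the
function names are my assumptions for illustration, not a proposal for
the actual syntax. The header is read as plain ASCII, so the body never
has to be parsed before its encoding is known.

```python
def charset_from_header(header_line: str) -> str:
    """Extract the charset parameter from a Content-Type header line."""
    name, _, value = header_line.partition(":")
    if name.strip().lower() != "content-type":
        raise ValueError("expected a Content-Type header")
    # Parameters follow the media type, separated by semicolons.
    for param in value.split(";")[1:]:
        key, _, val = param.partition("=")
        if key.strip().lower() == "charset":
            return val.strip().strip('"')
    raise ValueError("no charset parameter in header")


def decode_entity(raw: bytes) -> str:
    """Split off the ASCII header, then decode the body with its charset."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no blank line after the header")
    charset = charset_from_header(head.decode("ascii"))
    return body.decode(charset)


# Usage: the body is Shift-JIS, but the header alone says so.
body_text = "<doc>\u65e5\u672c\u8a9e</doc>"
raw_entity = (
    b"Content-Type: text/xml; charset=shift-jis\r\n\r\n"
    + body_text.encode("shift_jis")
)
print(decode_entity(raw_entity))
```

The storage manager can strip the header and hand the parser decoded
characters, so the parser itself never has to guess.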