
Re: Reads like ASCII (was Re: character sets ...)



>> This is a hack, and doesn't help with *initial* parsing of the
>> document.
>
>why?  what is *initial* parsing? 

The PI is part of the entity, so the parser has to *parse* it. It
cannot do that *blindly*: until it knows the encoding, it cannot
reliably recognize the very PI that declares the encoding. It needs
some way of being primed with that knowledge first.
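
To make the bootstrapping problem concrete, here is a minimal Python
sketch. The three encodings and the helper name find_pi_assuming_ascii
are my own choices for illustration, not part of any proposal. It
encodes the same PI three ways and shows that a parser which assumes
ASCII-compatible bytes only recognizes one of them.

```python
# The same PI text, serialized in three different encodings.
PI = "<?XML-ENCODING SHIFT-JIS>"

ascii_bytes = PI.encode("ascii")
utf16_bytes = PI.encode("utf-16-be")   # two bytes per character, no BOM
ebcdic_bytes = PI.encode("cp037")      # an EBCDIC code page


def find_pi_assuming_ascii(data: bytes) -> bool:
    """Hypothetical 'blind' check: look for the PI as ASCII bytes."""
    return data.startswith(b"<?XML-ENCODING")


for label, data in [("ascii", ascii_bytes),
                    ("utf-16-be", utf16_bytes),
                    ("cp037", ebcdic_bytes)]:
    print(label, find_pi_assuming_ascii(data), data[:4])
```

Only the ASCII-compatible serialization is found. For the others the
parser would need a separate guess per encoding family before it could
even read the declaration.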

You submit that having a PI at the start of every entity will solve
the problem. For a limited set of encodings that is true; for the
actual, unbounded set of encodings, I would not guarantee that it is.

In addition, the PI is effectively a kind of header that the storage
manager will be using. If that is the case, why not define a proper
header syntax instead of a hack? I would prefer

  Content-Type: text/xml; charset=shift-jis<CR><LF>
  <CR><LF>
  [data]

to

  <?XML-ENCODING SHIFT-JIS>

any day. Also, the PI idea would lead to people trying something like:

  <?XML-ENCODING SHIFT-JIS>
  [some data]
  <?XML-ENCODING UCS2>

which is obviously broken (or at least hard to work with).
HTML has a similar problem with META.
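
As a rough sketch of why the header form is easier on the storage
manager: the header is plain ASCII and ends at the first blank line,
so it can be read before anything is known about the data. The
function name decode_entity and the error handling are assumptions of
mine; this is an illustration in Python, not a proposed
implementation.

```python
def decode_entity(raw: bytes) -> str:
    """Split an ASCII header from the data and decode the data
    using the charset parameter of Content-Type."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no blank line separating header from data")

    charset = None
    # The header itself is restricted to ASCII, so this step is safe
    # without knowing the encoding of the data.
    for line in head.decode("ascii").split("\r\n"):
        name, _, value = line.partition(":")
        if name.strip().lower() != "content-type":
            continue
        # Skip the media type; inspect the ";"-separated parameters.
        for param in value.split(";")[1:]:
            key, _, val = param.partition("=")
            if key.strip().lower() == "charset":
                charset = val.strip().strip('"')

    if charset is None:
        raise ValueError("no charset parameter in Content-Type")
    return body.decode(charset)


raw = (b"Content-Type: text/xml; charset=shift-jis\r\n\r\n"
       + "<doc>\u65e5\u672c\u8a9e</doc>".encode("shift_jis"))
print(decode_entity(raw))
```

Because the encoding is stated once, outside the data, there is also
no place to put a second, conflicting declaration halfway through the
entity.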

