Re: Reads like ASCII (was Re: character sets ...) from Gavin Nicol on 1996-09-16 (w3c-sgml-wg@w3.org from September 1996)

From: Gavin Nicol <gtn@ebt.com>
Date: Mon, 16 Sep 1996 20:08:31 GMT
To: ricko@allette.com.au
CC: tbray@textuality.com, w3c-sgml-wg@w3.org
Message-Id: <199609162008.UAA15894@wiley.EBT.COM>

>> This is a hack, and doesn't help with *initial* parsing of the
>> document.
>
>why?  what is *initial* parsing? 

The PI is part of the entity. The parser has to *parse* it, and it
cannot do that *blindly*: it can't read the PI correctly unless
something first primes it with knowledge of the encoding.
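To make the point concrete, here is a small Python illustration (not part of the original message; `looks_like_pi` is a hypothetical helper). The same PI encoded as UCS-2 (big-endian UTF-16 stands in for it here) no longer starts with the ASCII bytes a naive scanner looks for. The parser has to know, or guess, the encoding before it can even recognize the PI that is supposed to declare it.

```python
# Illustration: an in-band encoding PI can only be found if the reader
# already knows (or guesses) how the bytes are encoded.
PI = "<?XML-ENCODING UCS2>"

def looks_like_pi(raw: bytes) -> bool:
    """Hypothetical naive check that assumes an ASCII-compatible encoding."""
    return raw.startswith(b"<?XML-ENCODING")

ascii_bytes = PI.encode("ascii")
# UCS-2 and UTF-16 agree for these characters; each one takes two bytes.
ucs2_bytes = PI.encode("utf-16-be")

print(looks_like_pi(ascii_bytes))  # True
print(looks_like_pi(ucs2_bytes))   # False: the bytes begin b"\x00<\x00?"
```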

You submit that having a PI at the start of every entity will solve
the problem. For a limited set of encodings that is true; for the
actual, effectively unbounded set, I would not guarantee it.
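The "limited set" case can be sketched as a lookup on the first four bytes of the PI (a Python illustration under stated assumptions: `PROBE`, `SIGNATURES` and `guess_family` are hypothetical names, and the table covers only a handful of encoding families). Once the family is known, the rest of the PI can be read and the exact encoding name taken from it. Any encoding whose first bytes match no entry gets `None`, which is exactly the gap described above.

```python
from typing import Optional

PROBE = "<?XM"  # first characters of the hypothetical "<?XML-ENCODING ...>" PI

# Hypothetical signature table: the first four bytes of PROBE in each
# encoding family this sketch knows about. Anything else is unrecognized.
SIGNATURES = {
    PROBE.encode("utf-32-be")[:4]: "UCS-4 big-endian",
    PROBE.encode("utf-32-le")[:4]: "UCS-4 little-endian",
    PROBE.encode("utf-16-be")[:4]: "UCS-2/UTF-16 big-endian",
    PROBE.encode("utf-16-le")[:4]: "UCS-2/UTF-16 little-endian",
    PROBE.encode("ascii"): "ASCII-compatible (ASCII, Latin-1, UTF-8, Shift-JIS, EUC, ...)",
    PROBE.encode("cp037"): "EBCDIC (code page 037)",
}

def guess_family(raw: bytes) -> Optional[str]:
    """Guess the encoding family from the first four bytes, or None if unknown."""
    return SIGNATURES.get(raw[:4])

print(guess_family("<?XML-ENCODING SHIFT-JIS>".encode("shift_jis")))
print(guess_family(b"\xff\xfe\xfd\xfc"))  # None: not in the table
```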

In addition, the PI is effectively a kind of header that the storage
manager will be using. If that is the case, why not define a proper
header syntax instead of a hack? I would prefer

  Content-Type: text/xml; charset=shift-jis<CR><LF>
  <CR><LF>
  [data]

to

  <?XML-ENCODING SHIFT-JIS>

any day. Also, the PI idea would lead to people trying something like:

  <?XML-ENCODING SHIFT-JIS>
  [some data]
  <?XML-ENCODING UCS2>

which is obviously broken (or at least hard to work with).
HTML has a similar problem with charset declarations in META.
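A rough sketch of the header-first alternative, in Python. The function names, and the fallback to ASCII when no charset parameter is present, are assumptions for illustration, not part of any proposal in this thread. The storage manager reads an ASCII header up to the first empty line, takes the charset from Content-Type, and only then decodes the data. Because the encoding is settled before any byte of data is touched, and there is one header per entity, the mid-stream switch shown above cannot arise.

```python
from typing import Dict, Optional, Tuple

def split_header(raw: bytes) -> Tuple[Dict[str, str], bytes]:
    """Split an entity into header fields and body at the first CRLF CRLF."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no header terminator (CRLF CRLF) found")
    fields: Dict[str, str] = {}
    for line in head.decode("ascii").split("\r\n"):  # header is plain ASCII
        name, colon, value = line.partition(":")
        if colon:
            fields[name.strip().lower()] = value.strip()
    return fields, body

def charset_param(content_type: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type value, if any."""
    for param in content_type.split(";")[1:]:
        key, _, value = param.strip().partition("=")
        if key.strip().lower() == "charset":
            return value.strip().strip('"')
    return None

def decode_entity(raw: bytes) -> str:
    """Decode the body using the charset declared in the header."""
    fields, body = split_header(raw)
    # Assumption for this sketch: default to ASCII when no charset is given.
    charset = charset_param(fields.get("content-type", "")) or "ascii"
    return body.decode(charset)

raw = b"Content-Type: text/xml; charset=shift-jis\r\n\r\n" + "<doc>日本</doc>".encode("shift_jis")
print(decode_entity(raw))  # <doc>日本</doc>
```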

Received on Monday, 16 September 1996 16:10:20 UTC