PI->MIME->PI (was Re: Comments on Part 1: Encoding declaration) from Rick Jelliffe on 1997-06-07 (w3c-sgml-wg@w3.org from June 1997)

From: Rick Jelliffe <ricko@allette.com.au>
Date: Sun, 8 Jun 1997 04:29:42 +1000
To: <w3c-sgml-wg@w3.org>
Message-Id: <199706071829.EAA08092@jawa.chilli.net.au>

> From: Gavin Nicol <gtn@eps.inso.com>

> I would say to apply the heuristics *if* no outside encoding
> specification is applied, but in cases where it is available (and
> these will become more common), that should override everything.

Yes, Gavin's examples are very compelling, especially his point 
about documents composed of external entities in different charsets 
(CASE 1): a client should prefer external encoding specifications 
to inherited encoding specifications. I don't think my proposal & 
the Japanese one handles that case, which is an important one.

I have previously made the comment that XML generators should be
quick to stick in BOM and encoding PIs, but that transmitters and
clients should err on the side of caution and remove them if there is 
any doubt (or duplication).

The best place for this to occur may well be at the server, to use and
strip the charset pseudo-attribute in the encoding PIs into the charset
parameter of the MIME header (and then reintroduced if the file is saved
to disk at the client end).   PI -> MIME -> PI

The reason for the encoding PI was that XML documents will have a life
before and after transmission: not all of them will be generated on the 
fly.   I think Gavin has provided a good argument why MIME charset 
may be preferable to encoding PI's charset for transmission. 

The chain of detection should be for servers and when the MIME header 
is incomplete.  

But lets not complicate things. Lets just make the 1.0 draft 
just have the chain of detection for a local file, and leave it up to 
implementers of servers and clients to figure out a policy for
making use of this information w.r.t. http.  XML is a markup 
language, not a high-level networking protocol.


Rick Jelliffe

Received on Saturday, 7 June 1997 14:29:33 UTC