- From: Rick Jelliffe <ricko@allette.com.au>
- Date: Tue, 10 Jun 1997 02:17:11 +1000
- To: <w3c-sgml-wg@w3.org>
> From: Gavin Nicol <gtn@eps.inso.com>

----------------------------------------------------------------------

> I can accept the ENCODING parameter on the XML declaration as being of
> *informative* value, but if you have anything more reliable to use, it
> should be given priority.

We have been talking about an either/or choice along our decision tree.
But there is another possibility: the encoding PI and the charset
parameter (and locale and user preferences) are just a priority list
for autodetection. In the usual case, there should be agreement between
the encoding PI and the charset parameter, I'd hope (since Gavin assures
us of the future excellence of servers in this regard :-). I don't
really like this, because I think we need to be clearer. It is a
difficult problem and it deserves good attention.

----------------------------------------------------------------------

> Maybe we should just require that XML *always* be in utf8? (I disagree
> on a personal level, but from one viewpoint, this has a lot in its
> favor).

Is that you suggesting this, Gavin? It would be nice if this were
possible, for XML 10.0.

----------------------------------------------------------------------

Without wishing to be too tedious, there are several different models,
each leading to different results:

PSEUDO-GAVIN

Let me invent a person called Pseudo-Gavin. He sees the need in these
kinds of terms:

1) we need a way for a document's encoding to be known by a server; and
2) we need a way for a document's encoding to be known by a client.

Number 2) is already handled by MIME. Number 1) is better handled by
system-dependent methods at the server end, ideally using MIME format.

PSEUDO-MAKOTOSAN

Let me invent another person called Pseudo-Makotosan.
He sees the need in these terms:

1) there should only be one primary method for a document to describe
   itself; other methods are only in case of failure. PIs are the only
   way to do this.

PSEUDO-RICKO

Let me introduce Pseudo-Ricko (is this what they call "reinventing
yourself"?). He thinks:

1) "horses for courses": where there is a reliable system-specific way
   to store, transmit or maintain character encodings, that way is to
   be preferred, since it will make the document integrate better into
   that system;
2) where there is no reliable system-specific way to store and maintain
   character encoding, then the PI must be used.

This means:

* an http client should prefer MIME to PIs for received XML documents;
* a UNIX http server must use PIs because its files are undecoratable;
* a Macintosh http server should prefer PIs rather than charset data in
  the resource fork, because a simple file transfer from another OS
  will maintain the PI, but maybe won't set the resource fork
  correctly;
* a stream editor using UNIX pipes should have XML documents with PIs.

PSEUDO-RAVIN

Here is another fiction, Pseudo-Ravin. He thinks:

1) PIs are only reliable if there is smart transcoding (to rewrite the
   PI);
2) MIME is only reliable if there is smart transcoding (to rewrite the
   MIME charset);
3) http servers shouldn't invent a character encoding if the PI is
   available;
4) http clients shouldn't use something else if MIME charset is
   available;
5) unthinking transcoding without altering the MIME or the PI will
   always stuff things up: the issue for us is not "how to prevent
   stuff-ups" but "how to allow reliability"; and
6) an http server should rewrite the charset pseudo-attribute if it
   transcodes the file; so should an intermediate proxy.
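To make the "priority list" reading concrete, here is a minimal Python sketch of how the positions above differ only in the order in which the same declarations are consulted. Everything named here is hypothetical and for illustration only: the function `choose_encoding`, the two orderings, and the fallback value are assumptions of the sketch, not anything proposed on this list.

```python
from typing import Optional, Sequence

# Hypothetical orderings. A client in the Pseudo-Ricko or Pseudo-Ravin
# model consults the MIME charset first; the Pseudo-Makotosan model
# treats the encoding PI as primary and anything else as a fallback.
CLIENT_PREFERS_MIME = ("mime", "pi")
PI_FIRST = ("pi", "mime")


def choose_encoding(
    order: Sequence[str],
    mime_charset: Optional[str] = None,
    pi_encoding: Optional[str] = None,
    fallback: str = "UTF-8",  # assumed default, not taken from the thread
) -> str:
    """Return the first encoding declared, walking the sources in `order`."""
    declared = {"mime": mime_charset, "pi": pi_encoding}
    for source in order:
        value = declared.get(source)
        if value and value.strip():
            return value.strip()
    # Nothing was declared anywhere, so use the assumed fallback.
    return fallback


if __name__ == "__main__":
    # The two declarations disagree, so the ordering decides the outcome.
    print(choose_encoding(CLIENT_PREFERS_MIME, "Shift_JIS", "EUC-JP"))
    print(choose_encoding(PI_FIRST, "Shift_JIS", "EUC-JP"))
```

When the two declarations agree, every ordering gives the same answer; the models only diverge in the disagreement case, which is exactly where transcoding without rewriting the label causes trouble.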
PSEUDO-DRACO

Finally, a spectre of Draco appears:

1) If an http client finds a file with a different MIME charset to its
   PI, then there has been some dumb processing going on, and the file
   must be regarded as suspect, and therefore killed.
2) This is really a problem of maintaining and verifying the integrity
   of data across uncontrolled systems. So XML files are binary, not
   text.

Rick Jelliffe
Received on Monday, 9 June 1997 12:25:51 UTC