- From: Gavin Nicol <gtn@eps.inso.com>
- Date: Mon, 9 Jun 1997 14:52:38 -0400
- To: w3c-sgml-wg@w3.org
>> I can accept the ENCODING parameter on the XML declaration as being of
>> *informative* value, but if you have anything more reliable to use, it
>> should be given priority.
>
>We have been talking an either/or choice along our decision tree. But there
>is another possibility: the encoding PI and the charset parameter (and locale
>and user preferences) are just a priority list for autodetection.

I don't like the word "autodetection" in that sentence, and would prefer
"determination". In other words, unless I can *know*, with a reasonable
degree of certainty, what the encoding is, I consider the system broken.

>> Maybe we should just require that XML *always* be in utf8? (I disagree
>> on a personal level, but from one viewpoint, this has a lot in its
>> favor).
>
>Is that you suggesting this Gavin? It would be nice if this were
>possible, for XML 10.0.

Guilty as charged. As you and I know, there are many reasons why this is
still not possible....

>PSEUDO-GAVIN
>
>1) we need a way for a document's encoding to be known by a server; &
>2) we need a way for a document's encoding to be known by a client.
>
>Number 2) is already handled by MIME. Number 1) is better handled by
>system dependent methods at the server end, ideally using MIME format.

You got the stance right, without most of the justification, unfortunately.

>PSEUDO-MAKOTOSAN
>
>1) there should only be one primary method for a document to describe
>itself; other methods are only in case of failure. PIs are the only way
>to do this.

I think his stance is a bit further afield than that. Seems like they want
all kinds of autodetection in there.

>PSEUDO-RICKO
>
>1) "horses for courses": where there is a reliable system-specific way to
>store, transmit or maintain character encodings, that way is to be preferred,
>since it will make the document integrate better into that system;
>
>2) where there is no reliable system-specific way to store and maintain
>character encoding, then the PI must be used;

I have no argument against this, and indeed, this is very close to the start
of my thought process.

>This means:
>
>* an http client should prefer MIME to PIs for received XML documents;
>* a UNIX http server must use PIs because its files are undecoratable;
>* a Macintosh http server should prefer PIs rather than charset data in
>the resource fork, because a simple file transfer from another OS will
>maintain the PI, but maybe won't set the resource fork correctly;
>* a stream editor using UNIX pipes should have XML documents with
>PIs;

This is where we diverge. I would like to remove the restrictions on all
these systems, rather than adjust to them.

>PSEUDO-DRACO
>
>Finally, a specter of Draco appears:
>
>1) If an http client finds a file with a different MIME charset to
> its PI, then there has been some dumb processing going on, and the
> file must be regarded as suspect, and therefore killed.
>2) This is really a problem of maintaining and verifying the
> integrity of data across uncontrolled systems. So XML files are
> binary, not text.

I think position (1) is reasonable, and this is a reportable error in XML
today.
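
To make the client-side rule concrete (prefer the MIME charset, fall back to
the encoding PI, and treat a mismatch between the two as a reportable error),
here is a minimal Python sketch. The function names, the EncodingConflict
exception and the regular expression are assumptions made up for illustration;
none of them come from the XML draft. It only reads declarations written in an
ASCII-compatible encoding and does not reconcile charset aliases such as
"utf8" and "utf-8".

```python
import re
from email.message import Message

# Hypothetical pattern: picks the encoding out of an XML declaration at the
# very start of the byte stream (ASCII-compatible encodings only).
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


class EncodingConflict(ValueError):
    """Hypothetical error: MIME charset and encoding PI disagree."""


def mime_charset(content_type):
    """Return the lowercased charset parameter of a Content-Type value."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def declared_encoding(data):
    """Return the lowercased encoding named in the document, if any."""
    match = _XML_DECL.match(data)
    return match.group(1).decode("ascii").lower() if match else None


def determine_encoding(content_type, data):
    """Determine (not guess) the encoding of a received XML document.

    The external MIME label wins; the in-document declaration is the
    fallback. A disagreement is reported rather than silently resolved.
    Returns None when neither source names an encoding.
    """
    external = mime_charset(content_type)
    internal = declared_encoding(data)
    if external and internal and external != internal:
        raise EncodingConflict(
            "MIME charset %r differs from declared encoding %r"
            % (external, internal)
        )
    return external or internal
```

A None result corresponds to the case where the encoding cannot be known with
any certainty; what the caller does then is left open here.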
Received on Monday, 9 June 1997 14:53:22 UTC