- From: Gavin Nicol <gtn@eps.inso.com>
- Date: Wed, 4 Jun 1997 11:24:03 -0400
- To: murata@apsdc.ksp.fujixerox.co.jp
- CC: w3c-sgml-wg@w3.org
>>I would say that things like HTTP charset labelling should override
>>all of the above. Is that the intent of this?
>
>At least, encoding declarations inherited from an upper-level external
>entity should not be overridden by HTTP charset labelling. Since the server
>is not likely to keep track of the entity reference relationship, it
>cannot make an educated guess.

I think you are wrong; let me give you some examples of why.

CASE 1:

I have three entities across the web:

  <!ENTITY a SYSTEM "http://usa.com/foo.xml">
  <!ENTITY b SYSTEM "http://nihon.com/bar.xml">
  <!ENTITY c SYSTEM "http://china.com/baz.xml">

My document entity says:

  <?XML VERSION="1.0" ENCODING="UTF8">
  &a;&b;&c;

The other three entities do not have an XML declaration on them.
Entity a is in ASCII, entity b is in Shift-JIS, and entity c is in
BIG5. What happens if you don't rely on correct server labelling?

CASE 2:

Entities a, b, and c do have their XML declarations in place, and the
servers do correctly label the content. However, all your requests go
through a proxy server that happens to transcode everything to UTF8.
Unless the proxy *rewrites* the XML declaration, your strategy will
fail (such proxies can rewrite the HTTP headers).

In a global system, where you want to include arbitrary entities from
arbitrary locations, and do not wish to *force* people to form
agreements on encodings, server labelling is the only thing that
works *and* has reasonable error recovery/detection.

I would say to apply the heuristics *only if* no outside encoding
specification is available; where one is available (and such cases
will become more common), it should override everything else.
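The precedence argued for above (outside label first, heuristics only as a fallback) can be sketched in Python. This is a minimal illustration, not any processor's actual algorithm: it assumes the outside label arrives as the HTTP charset parameter, and the fallback heuristics are limited to byte-order-mark detection plus a crude, hypothetical regex for the encoding declaration (real XML processors are stricter). The names resolve_encoding and sniff_encoding are illustrative only.

```python
import re
from typing import Optional

# Hypothetical, deliberately loose pattern for an encoding declaration;
# it also accepts the draft-era form <?XML ... ENCODING="UTF8">.
_DECL_RE = re.compile(
    rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z0-9._-]+)["']""",
    re.IGNORECASE,
)


def sniff_encoding(data: bytes) -> Optional[str]:
    """Heuristics applied only when no outside label is available."""
    # Byte order marks identify the encoding unambiguously.
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"
    # Otherwise trust the entity's own encoding declaration, if any.
    m = _DECL_RE.match(data)
    if m:
        return m.group(1).decode("ascii").lower()
    return None


def resolve_encoding(data: bytes, http_charset: Optional[str],
                     default: str = "utf-8") -> str:
    """Pick an encoding for one entity fetched over HTTP."""
    # An outside label (here, the HTTP charset) overrides everything else,
    # including a declaration inside the entity.
    if http_charset:
        return http_charset.lower()
    # No outside label: fall back on heuristics, then on a default.
    return sniff_encoding(data) or default
```

In CASE 1, entity b served with charset=Shift_JIS resolves to shift_jis, while the same bytes with no label and no declaration fall through to the UTF-8 default, which is wrong. In CASE 2, a proxy that transcodes to UTF-8 and relabels the header still yields utf-8, even though the stale declaration inside the entity says Shift_JIS.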
Received on Wednesday, 4 June 1997 11:24:53 UTC