- From: Gavin Nicol <gtn@eps.inso.com>
- Date: Wed, 4 Jun 1997 11:24:03 -0400
- To: murata@apsdc.ksp.fujixerox.co.jp
- CC: w3c-sgml-wg@w3.org
>>I would say that things like HTTP charset labelling should override
>>all of the above. Is that the intent of this?
>
>At least, encoding declarations inherited from an upper-level external
>entity should not be overriden by HTTP charset labelling. Since the server
>is not likely to keep track of the entity reference relationship, it
>cannot make an educated guess.
I think you are wrong, and let me give you some examples of why.
CASE 1:
I have three entities across the web:
<!ENTITY a SYSTEM "http://usa.com/foo.xml">
<!ENTITY b SYSTEM "http://nihon.com/bar.xml">
<!ENTITY c SYSTEM "http://china.com/baz.xml">
My document entity says:
<?XML VERSION="1.0" ENCODING="UTF8">
&a;&b;&c;
The other three entities do not carry an XML declaration of their own.
Entity a is in ASCII. Entity b is in shift-jis. Entity c is in BIG5.
What happens if you don't rely on correct server labelling?
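A minimal sketch of CASE 1 in Python, to make the failure concrete. The payload strings, the `entities` table, and the helper names are hypothetical stand-ins for the three remote entities; the inherited encoding is the UTF8 declared by the document entity.

```python
# Hypothetical payloads standing in for entities a, b, and c.
# Each pair is (bytes as served, charset a correct server would label them with).
entities = {
    "a": ("hello".encode("ascii"), "ascii"),
    "b": ("\u65e5\u672c\u8a9e".encode("shift_jis"), "shift_jis"),  # Japanese text
    "c": ("\u4e2d\u6587".encode("big5"), "big5"),                  # Chinese text
}

def decode_entity(raw, http_charset=None, inherited="utf-8"):
    """Use the HTTP charset label if present, else the inherited encoding."""
    return raw.decode(http_charset or inherited)

def try_decode(raw, http_charset=None):
    """Return the decoded text, or None when the bytes do not fit the encoding."""
    try:
        return decode_entity(raw, http_charset)
    except UnicodeDecodeError:
        return None

# Without server labelling, every entity is read as the inherited UTF-8.
unlabelled = {name: try_decode(raw) for name, (raw, _) in entities.items()}
# With correct labelling, each entity is read in its own encoding.
labelled = {name: try_decode(raw, cs) for name, (raw, cs) in entities.items()}
```

Under these assumptions only the ASCII entity survives the inherited encoding; the shift-jis and BIG5 entities are not valid UTF-8 and cannot be decoded at all without the label.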
CASE 2:
Entities a, b, and c do have their XML declarations in place, and
servers do correctly label the content. However, all your requests
go through a proxy server that happens to transcode everything to
UTF8. Unless the proxy also *rewrites* the XML declaration inside each
entity, your strategy will fail (such proxies can rewrite the HTTP
headers).
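CASE 2 can be sketched the same way. The proxy here is hypothetical: it is assumed to transcode the body to UTF-8 and relabel the HTTP header while leaving the in-band declaration untouched. The declaration syntax and the regular expression are illustrative only, not a conforming parser.

```python
import re

def declared_encoding(raw):
    """Read the ENCODING value from a leading XML declaration, if any."""
    m = re.match(rb'<\?xml[^>]*encoding="([A-Za-z0-9._-]+)"', raw, re.IGNORECASE)
    return m.group(1).decode("ascii") if m else None

# Entity as stored on the origin server: declared and encoded as Shift_JIS.
text = '<?XML VERSION="1.0" ENCODING="Shift_JIS"?>\u65e5\u672c\u8a9e'
origin_bytes = text.encode("shift_jis")

# Hypothetical proxy: transcodes the body, relabels the header,
# but does not touch the declaration inside the entity.
proxied_bytes = origin_bytes.decode("shift_jis").encode("utf-8")
proxied_label = "utf-8"

def decode_by_declaration(raw):
    """Trust the in-band declaration, which is now stale."""
    return raw.decode(declared_encoding(raw), errors="replace")

def decode_by_label(raw, label):
    """Trust the out-of-band HTTP label, which the proxy kept accurate."""
    return raw.decode(label)
```

After the proxy, the declaration still claims Shift_JIS although the bytes are UTF-8, so decoding by declaration yields garbage, while decoding by the rewritten label recovers the original text.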
In a global system, where you want to include arbitrary entities from
arbitrary locations, and do not wish to *force* people to form
agreements on encodings, server labelling is the only thing that works
*and* has reasonable error recovery/detection.
I would say: apply the heuristics only *if* no outside encoding
specification is supplied. Where one is available (and such cases
will become more common), it should override everything.
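The precedence proposed here could be written down roughly as follows. This is a sketch only: the function name, the parameter names, and the split of the in-band heuristics into "declaration" and "sniffed" are assumptions made for illustration, not anything taken from the draft.

```python
def choose_encoding(http_charset=None, declared=None, sniffed=None,
                    default="utf-8"):
    """Pick an encoding and report which source of information won.

    http_charset: outside label, e.g. from the HTTP Content-Type header
    declared:     encoding named in the entity's own XML declaration
    sniffed:      guess from inspecting the leading bytes
    """
    if http_charset:
        return http_charset, "external label"
    # Only fall back to in-band heuristics when no outside label exists.
    if declared:
        return declared, "encoding declaration"
    if sniffed:
        return sniffed, "heuristic"
    return default, "default"
```

For example, `choose_encoding("big5", declared="utf-8")` selects the external label, while `choose_encoding(None, declared="shift_jis")` falls back to the declaration.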
Received on Wednesday, 4 June 1997 11:24:53 UTC