Re: Comments on Part 1: Encoding declaration

>>I would say that things like HTTP charset labelling should override
>>all of the above. Is that the intent of this?
>
>At least, encoding declarations inherited from an upper-level external 
>entity should not be overridden by HTTP charset labelling.  Since the server 
>is not likely to keep track of the entity reference relationship, it 
>cannot make an educated guess.

I think you are wrong, and let me give you some examples of why.

CASE 1:
  I have three entities across the web:

     <!ENTITY a SYSTEM "http://usa.com/foo.xml"> 
     <!ENTITY b SYSTEM "http://nihon.com/bar.xml">
     <!ENTITY c SYSTEM "http://china.com/baz.xml">

  My document entity says:

     <?XML VERSION="1.0" ENCODING="UTF8"?>
     &a;&b;&c;

  The other three entities do not have an XML declaration on them.
  Entity a is in ASCII. Entity b is in Shift-JIS. Entity c is in Big5.

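  What happens if you don't rely on correct server labelling? With no
  declaration of their own, b and c would inherit UTF8 from the document
  entity and be misread.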
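
To make Case 1 concrete, here is a minimal Python sketch. The helper
name decode_entity and the sample text are my own illustration and not
part of any spec; the bytes stand in for what nihon.com might serve for
entity b.

```python
from typing import Optional

def decode_entity(raw: bytes, inherited: str,
                  http_charset: Optional[str] = None) -> str:
    """Decode an external entity, preferring the server's label if present.

    Hypothetical helper: 'inherited' is the encoding taken from the
    referencing (upper-level) entity, 'http_charset' the HTTP label.
    """
    return raw.decode(http_charset or inherited)

# Entity b as served: Shift-JIS bytes, no XML declaration of its own.
entity_b = "日本語".encode("shift_jis")

# Strategy 1: inherit UTF-8 from the document entity.
try:
    decode_entity(entity_b, inherited="utf-8")
    inherited_ok = True
except UnicodeDecodeError:
    inherited_ok = False

# Strategy 2: trust the server's charset label.
labelled = decode_entity(entity_b, inherited="utf-8",
                         http_charset="shift_jis")
```

In this sketch the inherited encoding fails outright, while the
labelled one recovers the text. Entity a, being ASCII, would decode
correctly either way, which is exactly why the problem is easy to miss.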

CASE 2:

  Entities a, b, and c do have their XML declarations in place, and
  servers do correctly label the content. However, all your requests
  go through a proxy server that just happens to be transcoding
  everything to UTF8. Unless the proxy also *rewrites* the XML
  declaration, your strategy will fail, whereas the HTTP label stays
  correct (such proxies can rewrite headers).
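
Case 2 can be sketched the same way. The proxy behaviour below is
simulated and the sample entity is made up; the assumption is a proxy
that transcodes the body and rewrites the charset header but leaves the
XML declaration untouched.

```python
# Entity b with its (correct) XML declaration, as stored on the server.
original = '<?xml version="1.0" encoding="Shift_JIS"?><p>日本語</p>'
served = original.encode("shift_jis")

# Simulated proxy: transcodes the bytes to UTF-8 and rewrites the header,
# but does not touch the declaration inside the body.
proxied = served.decode("shift_jis").encode("utf-8")
proxy_header = "utf-8"

# Trusting the now-stale internal declaration garbles the text.
via_declaration = proxied.decode("shift_jis", errors="replace")

# Trusting the HTTP label recovers it.
via_header = proxied.decode(proxy_header)
```

Here via_header equals the original text, while via_declaration does
not, even though every party in the chain behaved "correctly".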

In a global system, where you want to include arbitrary entities from
arbitrary locations, and do not wish to *force* people to form
agreements on encodings, server labelling is the only thing that works
*and* has reasonable error recovery/detection.

I would say to apply the heuristics only *if* no outside encoding
specification is supplied; in cases where one is available (and
these will become more common), it should override everything.
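
As a rough sketch of that precedence, in Python: the function name
pick_encoding is hypothetical, and the fallback heuristics here (a
byte-order-mark check, then a scan for an encoding declaration, then a
UTF-8 default) are a simplified stand-in for the draft's autodetection,
not a faithful copy of it.

```python
import codecs
import re
from typing import Optional

# Matches an encoding pseudo-attribute in a leading XML declaration.
_DECL = re.compile(
    rb'<\?xml[^>]*encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']',
    re.IGNORECASE,
)

def pick_encoding(raw: bytes, http_charset: Optional[str] = None) -> str:
    """Choose an encoding: outside label first, heuristics only as fallback."""
    if http_charset:
        # An external specification overrides everything else.
        return http_charset
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    match = _DECL.match(raw)
    if match:
        return match.group(1).decode("ascii")
    return "utf-8"  # assumed default when nothing else is known
```

For example, pick_encoding(body, "utf-8") returns "utf-8" even when the
body still declares Shift_JIS, which is the behaviour Case 2 needs.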

Received on Wednesday, 4 June 1997 11:24:53 UTC