Re: Comments on Part 1: Encoding declaration

From: Gavin Nicol <gtn@eps.inso.com>
Date: Wed, 4 Jun 1997 11:24:03 -0400
Message-Id: <199706041524.LAA08366@nathaniel.ebt>
To: murata@apsdc.ksp.fujixerox.co.jp
CC: w3c-sgml-wg@w3.org
>>I would say that things like HTTP charset labelling should override
>>all of the above. Is that the intent of this?
>At least, encoding declarations inherited from an upper-level external 
>entity should not be overridden by HTTP charset labelling.  Since the server 
>is not likely to keep track of the entity reference relationship, it 
>cannot make an educated guess.

I think you are wrong, and let me give you some examples of why.

  I have three entities across the web:

     <!ENTITY a SYSTEM "http://usa.com/foo.xml"> 
     <!ENTITY b SYSTEM "http://nihon.com/bar.xml">
     <!ENTITY c SYSTEM "http://china.com/baz.xml">

  My document entity says:


  the other three entities do not have an XML declaration on them.
  Entity a is in ASCII. Entity b is in Shift-JIS. Entity c is in BIG5.

  What happens if you don't rely on correct server labelling?


  Entities a, b, and c do have their XML declarations in place, and
  servers do correctly label the content. However, all your requests
  go through a proxy server that just happens to be transcoding
  everything to UTF-8. Unless the proxy *rewrites* the XML declaration,
  your strategy will fail (such proxies can rewrite headers).
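
  A minimal sketch of the proxy case in Python. The entity text, the
  variable names, and the helper try_decode are hypothetical and only
  illustrate the point: once the bytes have been transcoded, the
  in-entity declaration no longer describes them, while a rewritten
  header still does.

```python
# Hypothetical entity body as stored on the origin server, encoded in Shift-JIS.
original_text = '<?xml version="1.0" encoding="Shift_JIS"?>\n<p>日本語</p>'
origin_bytes = original_text.encode("shift_jis")

# A transcoding proxy re-encodes the body as UTF-8 and rewrites the header,
# but leaves the declaration inside the entity untouched.
proxied_bytes = origin_bytes.decode("shift_jis").encode("utf-8")
http_charset = "utf-8"          # what the proxy now reports in the header
declared_charset = "shift_jis"  # what the stale in-entity declaration claims


def try_decode(data, charset):
    """Return the decoded text, or None if the bytes are invalid in that charset."""
    try:
        return data.decode(charset)
    except UnicodeDecodeError:
        return None


by_header = try_decode(proxied_bytes, http_charset)
by_declaration = try_decode(proxied_bytes, declared_charset)

print("header label recovers the text:", by_header == original_text)
print("declaration recovers the text: ", by_declaration == original_text)
```

  Trusting the declaration here yields either a decoding error or
  garbled text, never the original; trusting the header recovers it.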

In a global system, where you want to include arbitrary entities from
arbitrary locations, and do not wish to *force* people to form
agreements on encodings, server labelling is the only thing that works
*and* has reasonable error recovery/detection.

I would say to apply the heuristics *if* no outside encoding
specification is available, but where one is available (and such
cases will become more common), it should override everything.
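
The precedence argued for above could be sketched roughly as follows.
This is only an illustration under stated assumptions: the function
names choose_encoding and guess_from_content, the declaration pattern,
and the UTF-8 default are all hypothetical and not taken from any
draft. An external label (for example an HTTP charset parameter) wins
whenever it is present; the content heuristics run only when it is not.

```python
import codecs
import re

# Assumed syntax for an encoding declaration at the very start of an entity.
_DECLARATION = re.compile(
    rb'^<\?xml[^>]*encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']',
    re.IGNORECASE,
)


def guess_from_content(data):
    """Fallback heuristics: byte order mark first, then the in-entity declaration."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    match = _DECLARATION.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    return None


def choose_encoding(data, external_label=None, default="utf-8"):
    """An external label overrides everything; otherwise fall back to heuristics."""
    if external_label:
        return external_label.lower()
    return guess_from_content(data) or default


entity = b'<?xml version="1.0" encoding="Big5"?><doc/>'
print(choose_encoding(entity))                          # no label: declaration is used
print(choose_encoding(entity, external_label="UTF-8"))  # label overrides declaration
```

With this ordering, a proxy or server that relabels the content
correctly is always believed over whatever the entity says about
itself.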
Received on Wednesday, 4 June 1997 11:24:53 UTC
