
Re: Document without charset

From: Jirka Kosek <jirka@kosek.cz>
Date: Thu, 08 Dec 2005 23:13:17 +0100
Message-ID: <4398AFFD.7010007@kosek.cz>
To: "Jukka K. Korpela" <jkorpela@cs.tut.fi>
Cc: www-validator@w3.org
Jukka K. Korpela wrote:

> Apparently the validator uses UTF-8 as the implied default.
> The choice is impractical

I can't recall the RFC number off the top of my head, but the HTTP protocol 
assumes ISO-8859-1 as the default for all text/* media types. So it is not 
"impractical", it is clearly a bug.

That's why text/xml was superseded by application/xml, where no such 
default is assumed. If there is no charset parameter, ISO-8859-1 should 
be assumed from the HTTP point of view, but an XML document without an XML 
declaration is assumed to be UTF-8 or UTF-16. However, HTTP takes precedence, 
so you end up decoding XML content with the wrong encoding assumption. Not 
good. It sounds silly to serve XML with a content type other than text/*, but 
legacy is legacy :-(
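
To make the rule concrete, here is a rough sketch in Python of how a consumer might pick the charset from the Content-Type header alone. The function name effective_charset is made up for illustration, and this is only my reading of the defaulting rule described above, not what the validator actually does.

```python
from email.message import Message


def effective_charset(content_type_header):
    """Return the charset implied by a Content-Type header value.

    Hypothetical helper, for illustration only. Returns None when HTTP
    imposes no default, so the document itself decides (for XML: the
    XML declaration, else UTF-8 or UTF-16).
    """
    msg = Message()
    msg["Content-Type"] = content_type_header

    # An explicit charset parameter always wins.
    explicit = msg.get_param("charset")
    if explicit is not None:
        return explicit.lower()

    # No charset parameter: text/* falls back to the HTTP default.
    if msg.get_content_maintype() == "text":
        return "iso-8859-1"

    # application/xml and friends carry no HTTP-level default.
    return None


print(effective_charset("text/xml"))
print(effective_charset("application/xml"))
print(effective_charset('text/html; charset="UTF-8"'))
```

With text/xml and no charset parameter, the HTTP-level default wins before the XML parser ever gets to look at the document, which is exactly the problem.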


   Jirka Kosek     e-mail: jirka@kosek.cz     http://www.kosek.cz
   Professional training and consulting in XML technologies.
      Have a look at our newly launched website http://DocBook.cz
        Detailed overview of training courses http://xmlguru.cz/skoleni/
                    Upcoming training dates:
      ** XSLT 13.-16.3.2006 ** XML Schemas 24.-26.4.2006 **
        ** DocBook 15.-17.5.2006 ** XSL-FO 12.-13.6.2006 **

Received on Thursday, 8 December 2005 22:13:34 UTC
