
RE: Sniffing XHTML sent as text/html

From: Albert Lunde <Albert-Lunde@northwestern.edu>
Date: Tue, 3 Oct 2000 00:01:34 -0400 (EDT)
Message-Id: <200010030401.XAA11109@nuinfo.northwestern.edu>
To: jelks@jelks.nu
Cc: www-html@w3.org
>     The use of an explicit charset parameter is strongly recommended.
>     While [MIME] specifies "The default character set, which must be
>     assumed in the absence of a charset parameter, is US-ASCII."  [HTTP]
>     Section 3.7.1, defines that "media subtypes of the 'text' type are
>     defined to have a default charset value of 'ISO-8859-1'".  Section
>     19.3 of [HTTP] gives additional guidelines.  Using an explicit
>     charset parameter will help avoid confusion.
> 
>     Using an explicit charset parameter also takes into account that the
>     overwhelming majority of deployed browsers are set to use something
>     else than 'ISO-8859-1' as the default; the actual default is either a
>     corporate character encoding or character encodings widely deployed
>     in a certain national or regional community. For further
>     considerations, please also see Section 5.2 of [HTML40].
> 
> 
> This should perhaps also mention that XHTML documents are (being XML) by
> default UTF-8 if you omit the XML declaration.  How that is reconciled with
> text/* defaulting to ISO-8859-1, I'm not sure.  Perhaps it's a further
> indication that text/html may be unsuitable for XHTML.

The different defaults of the various specifications are perfectly
reasonable given the historical contexts in which they were written.

I'd read it as saying that you can't use both defaults at once: you
have to specify the character encoding in either the XML declaration
or the HTTP headers (and if you specify it in both places, the two
should be consistent).
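As a rough sketch of that reading (not anything the specs themselves prescribe), a receiver could pull the charset from each place and complain when they disagree. The function names here are made up for illustration, and the regular expressions are a simplification of the real Content-Type and XML declaration grammars.

```python
import re

def http_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    m = re.search(r'charset\s*=\s*"?([^\s;"]+)"?', content_type, re.IGNORECASE)
    return m.group(1).lower() if m else None

def xml_decl_encoding(body):
    """Return the encoding named in a leading XML declaration, or None."""
    m = re.match(
        rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']',
        body,
    )
    return m.group(1).decode("ascii").lower() if m else None

def resolve_charset(content_type, body):
    """Pick the declared charset; raise if the two declarations conflict.

    Returns None when neither place declares one, which is exactly the
    case where the ISO-8859-1 and UTF-8 defaults collide.
    """
    from_http = http_charset(content_type)
    from_xml = xml_decl_encoding(body)
    if from_http and from_xml and from_http != from_xml:
        raise ValueError(
            "HTTP says %s but the XML declaration says %s" % (from_http, from_xml)
        )
    return from_http or from_xml
```

Note that this compares names literally, so aliases of the same encoding (for example "latin1" and "iso-8859-1") would be reported as a conflict; a real implementation would want to normalize them first.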

(And remember that HTTP is not quite MIME.)

(Though personally, I think the whole idea of having a document
specify its character encoding is a bit fishy. It really only works
well because so many encodings are supersets of US-ASCII; faced
with something stranger, like CDC "Display Code", the heuristics
used would probably run off the rails.)
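To make the superset point concrete, here is a small sketch that checks whether an XML declaration comes out as the same bytes in a given encoding as it does in US-ASCII. Python has no codec for CDC Display Code, so EBCDIC (cp037) stands in as the "strange" case; that substitution, the sample declaration, and the function name are my assumptions.

```python
# A sample declaration; the encoding name in it is arbitrary.
DECL = '<?xml version="1.0" encoding="x"?>'

def same_as_ascii(codec):
    """True if DECL encodes to the same bytes in codec as in US-ASCII."""
    return DECL.encode(codec) == DECL.encode("ascii")

if __name__ == "__main__":
    # cp037 is EBCDIC, used here as a stand-in for a non-ASCII-based encoding.
    for codec in ("iso-8859-1", "utf-8", "koi8_r", "cp037", "utf_16_le"):
        print(codec, same_as_ascii(codec))
```

The first three are ASCII supersets, so a sniffer that scans for the ASCII bytes of the declaration finds it before knowing the encoding. With cp037 or UTF-16 the bytes differ, and the sniffer needs to guess the encoding family before it can read the declaration at all.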

--
    Albert Lunde          Albert-Lunde@northwestern.edu (new address)
                          Albert-Lunde@nwu.edu (old address)
Received on Tuesday, 3 October 2000 00:17:29 GMT
