Re: flakey charset detection

David Brownell <david-b@pacbell.net> wrote:

>I wonder if that was my original text?  Sounds familiar.  Note that HTTP
>is the "higher level protocol" in question, and it has a default
>encoding of iso-8859-1 for all "text/*" mime types.

Unfortunately, the HTML WG in their infinite wisdom have made the
contradictory claim (normatively) that no default encoding should be
assumed in the absence of an explicit character encoding indication.

[[[
    The HTTP protocol ([RFC2616], section 3.7.1) mentions ISO-8859-1
    as a default character encoding when the "charset" parameter is
    absent from the "Content-Type" header field. In practice, this
    recommendation has proved useless because some servers don't
    allow a "charset" parameter to be sent, and others may not be
    configured to send the parameter. Therefore, user agents must
    not assume any default value for the "charset" parameter.
]]] -- http://www.w3.org/TR/html401/charset.html#h-5.2.2

Thus a default encoding cannot be assumed for resources served as
text/html. For application/xhtml+xml different rules apply.
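
To make the conflict concrete, here is a rough sketch of how a user
agent might combine the two rules. The function name effective_charset
is made up for illustration, and the handling of anything other than
text/* is an assumption; the XML rules for application/xhtml+xml are
not modelled at all.

```python
from email.message import Message


def effective_charset(content_type):
    """Return the charset to use for a Content-Type value, or None.

    None means "no default may be assumed"; the caller then has to look
    elsewhere (for example a META declaration or its own heuristics).
    """
    msg = Message()
    msg["Content-Type"] = content_type

    # An explicit charset parameter always wins.
    charset = msg.get_param("charset")
    if isinstance(charset, str) and charset:
        return charset.lower()

    mime = msg.get_content_type()

    # HTML 4.01, section 5.2.2: user agents must not assume a default.
    if mime == "text/html":
        return None

    # RFC 2616, section 3.7.1: other text/* types default to ISO-8859-1.
    if mime.startswith("text/"):
        return "iso-8859-1"

    # Everything else, including application/xhtml+xml, is out of scope
    # for this sketch (assumption: report "unknown").
    return None


print(effective_charset("text/html; charset=UTF-8"))
print(effective_charset("text/html"))
print(effective_charset("text/plain"))
```

The only interesting branch is the text/html one: it returns None where
a straight reading of RFC 2616 would have returned iso-8859-1.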


-- 
I have to admit that I'm hoping the current situation with regard to XML
Namespaces and W3C XML Schemas is a giant practical joke,   but I see no
signs of pranksters coming forward with a gleeful smile to announce that
they were just kidding.                              -- Simon St.Laurent

Received on Wednesday, 4 December 2002 15:41:02 UTC