Re: charset parameter

From: Bjoern Hoehrmann (derhoermi@gmx.net)
Date: Fri, Jul 27 2001

    From: Bjoern Hoehrmann <derhoermi@gmx.net>
    To: Terje Bless <link@pobox.com>
    Cc: W3C Validator <www-validator@w3.org>
    Date: Sat, 28 Jul 2001 00:09:44 +0200
    Message-ID: <0qn3mtsd1kk6mu2i8oc4j19u8cticdfj2i@4ax.com>
    Subject: Re: charset parameter
    
    * Terje Bless wrote:
    >>A conforming HTML user agent must adhere to all "must"s in the HTML 4
    >>recommendation. Assuming no default value for the charset parameter is a
    >>must. Applications that do something different, i.e. that assume some
    >>default value or don't check whether an explicit charset was given,
    >>aren't conforming user agents.
    >
    >You fail to distinguish between a "HTML 4 User Agent" and a "HTTP Client
    >Application".
    
    It's not me who fails to distinguish; HTML 4 does, and it does so for a
    good reason but with a bad solution. If I had written that section, I
    think I'd have recommended this: if the document contains nothing but
    valid UTF-8 sequences, treat it as UTF-8; otherwise treat it as
    ISO-8859-1, as HTTP/1.1 demands. Assuming nothing and then parsing the
    document anyway doesn't work; it's just nonsense.
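
    To make the fallback I have in mind concrete, here is a rough Python
    sketch. The function names are made up for illustration, and it
    assumes the whole document is available as bytes and that no charset
    was given anywhere else:

```python
def guess_charset(body: bytes) -> str:
    """Guess the charset of an unlabelled document.

    Hypothetical helper: returns "utf-8" when the bytes form nothing but
    valid UTF-8 sequences, otherwise falls back to "iso-8859-1".
    """
    try:
        # Strict decoding rejects malformed, overlong and surrogate sequences.
        body.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return "iso-8859-1"
    return "utf-8"


def decode_unlabelled(body: bytes) -> str:
    """Decode an unlabelled document using the guess above."""
    # ISO-8859-1 maps every byte to a character, so this never raises.
    return body.decode(guess_charset(body))
```

    Pure ASCII documents come out as UTF-8 here, which is harmless since
    both encodings agree on that range.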
    
    >At least the XML Rec. seems to have solved some of my problems for XML; it
    >describes fairly well the expected behaviour when faced with various
    >encoding variants and labellings.
    
    Not so. For example, XML 1.0 Second Edition says in section 4.3.3:
    
      "[...] It is also a fatal error if an XML entity contains no encoding
       declaration and its content is not legal UTF-8 or UTF-16."
    
    That doesn't take higher-level protocol information into account,
    although it should. John Cowan said on xml-dev regarding this issue: "I
    personally expect that the Core WG will act on it soon in the direction
    suggested". Not that I like that...
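
    Read literally, the quoted rule amounts to something like the
    following Python sketch. The names are hypothetical, the caller is
    assumed to know already whether the entity carries an encoding
    declaration, and a UTF-16 entity is assumed to start with a byte
    order mark. Note that nothing from the transport layer, such as an
    HTTP charset parameter, enters the decision:

```python
import codecs


def legal_utf8_or_utf16(entity: bytes) -> bool:
    """Return True if the bytes decode as UTF-16 (with BOM) or as UTF-8."""
    if entity.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        codec = "utf-16"  # the BOM selects the byte order
    else:
        codec = "utf-8"
    try:
        entity.decode(codec)
    except UnicodeDecodeError:
        return False
    return True


def is_fatal_error(entity: bytes, has_encoding_declaration: bool) -> bool:
    """Apply the quoted sentence: no declaration and not legal UTF-8/UTF-16."""
    return not has_encoding_declaration and not legal_utf8_or_utf16(entity)
```

    An ISO-8859-1 entity without a declaration is therefore fatal under
    this reading, even if the server labelled it correctly.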
    
    >Björn, Nick, Martin (and anyone else with an opinion ;D)[0]: could you take
    >a look at the pseudo-algorithm I posted the other day and tell me of any
    >problems you see with it? What _exactly_ would you say is the "correct"
    >behaviour for the Validator? Did I leave out anything?
    
    I'll have a look, it's just another unread article ;-)
    
    Btw., ISO/IEC 8859-16:2001 has just been published; we should add
    support to the validator.
    -- 
    Björn Höhrmann { mailto:bjoern@hoehrmann.de } http://www.bjoernsworld.de
    am Badedeich 7 } Telefon: +49(0)4667/981028 { http://bjoern.hoehrmann.de
    25899 Dagebüll { PGP Pub. KeyID: 0xA4357E78 } http://www.learn.to/quote/