Re: XHTML validator doesn't completely support Unicode

From: Christian Smith (csmith@barebones.com)
Date: Mon, Apr 30 2001

    Date: Mon, 30 Apr 2001 12:12:24 -0400
    From: Christian Smith <csmith@barebones.com>
    To: www-validator@w3.org
    cc: Bertilo Wennergren <bertilow@bertilo.se.fm>, psheerin@cmp.com, Liam Quinn <liam@htmlhelp.com>
    Message-ID: <20010430121224-r01010600-ac0549ea@204.107.232.107>
    Subject: Re: XHTML validator doesn't completely support Unicode
    
    On Sunday, April 29, 2001 at 21:33, liam@htmlhelp.com (Liam Quinn) wrote:
    
    > On Sun, 29 Apr 2001, Bertilo Wennergren wrote:
    > 
    > > Peter Sheerin:
    > >
    > > > Take a look at http://www.petesguide.com/style/index.html, and then
    > > > follow the icon link to the validator, and watch what it reports. The
    > > > text file is encoded in UTF-8, and uses the DOS end of line
    > > > conventions, but has the Unicode string "U+FEFF" as the first character.
    > >
    > > Are you sure it's the end of line characters that give the problem?
    > >
    > > I'd guess it's the BOM ("U+FEFF") that's the culprit. It's not very
    > > common to use a BOM in UTF-8 files. Some even say it's not allowed
    > > in UTF-8.
    
    I think the problem is that you are using the wrong BOM. FE FF is the
    UTF-16 BOM (in big-endian byte order), whereas the UTF-8 BOM is EF BB BF.
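
One way to check which of these byte sequences a file actually starts with is sketched below. This is only an illustration, not what the validator does: `detect_bom` is a hypothetical helper name, and the sample input bytes are made up. It relies on the BOM constants in Python's standard `codecs` module and ignores UTF-32 for brevity.

```python
import codecs
from typing import Optional

# Known BOM byte sequences, taken from the standard codecs module.
# UTF-32 is deliberately left out of this sketch.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def detect_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None if there is none.

    Hypothetical helper for illustration only.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


# Made-up sample inputs: the same markup with different leading bytes.
print(detect_bom(b"\xef\xbb\xbf<html>"))  # utf-8
print(detect_bom(b"\xfe\xff\x00<"))       # utf-16-be
print(detect_bom(b"<html>"))              # None
```

Running something like this on the first few bytes of the file in question would show whether it begins with EF BB BF or with FE FF.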
    
    > According to <http://www.unicode.org/unicode/faq/utf_bom.html#25>, the BOM
    > is allowed in UTF-8.  Strange that the UTF-8 RFC makes no mention of it
    > though.
    
    I think it does, in a roundabout sort of way. If you encode the UTF-16 BOM
    via the UTF-8 encoding method, don't you get the UTF-8 BOM?
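
The question can be checked directly. The short sketch below encodes the single code point U+FEFF with Python's built-in codecs and prints the resulting bytes; `hex_bytes` is just a hypothetical formatting helper added for readability.

```python
# U+FEFF is the code point used as the byte order mark (BOM).
BOM = "\ufeff"


def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex (illustrative helper)."""
    return " ".join(f"{b:02X}" for b in data)


utf8_bom = BOM.encode("utf-8")         # three-byte UTF-8 form
utf16be_bom = BOM.encode("utf-16-be")  # big-endian UTF-16 form
utf16le_bom = BOM.encode("utf-16-le")  # little-endian UTF-16 form

print(hex_bytes(utf8_bom))     # EF BB BF
print(hex_bytes(utf16be_bom))  # FE FF
print(hex_bytes(utf16le_bom))  # FF FE
```

So the same code point, U+FEFF, comes out as FE FF in big-endian UTF-16 and as EF BB BF in UTF-8.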
    
    -- 
    Christian Smith  |  csmith@barebones.com  |  http://web.barebones.com
    
    He who dies with the most friends... Is still dead!