Re: XHTML validator doesn't completely support Unicode

From: Bertilo Wennergren (bertilow@bertilo.se.fm)
Date: Mon, Apr 30 2001

    Date: Mon, 30 Apr 2001 18:06:56 -0500
    Message-Id: <200104302306.SAA31601@ns1.svensk-dns.com>
    From: "Bertilo Wennergren" <bertilow@bertilo.se.fm>
    To: www-validator@w3.org
    Subject: Re: XHTML validator doesn't completely support Unicode
    
    Christian Smith:
    
    > > > > The text file is encoded in UTF-8, and uses the DOS end of line
    > > > > conventions, but has the Unicode string "U+FEFF" as the first
    > > > > character.
    
    > > > Are you sure it's the end of line characters that give the 
    > > > problem? I'd guess it's the BOM ("U+FEFF") that's the culprit.
    > > > It's not very common to use a BOM in UTF-8 files. Some even say 
    > > > it's not allowed in UTF-8.
    
    > I think the problem is that you are using the wrong BOM. FEFF is the
    > UTF-16 BOM whereas the UTF-8 BOM is EF BB BF.
    
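    Perhaps, but remember that "U+FEFF" is encoding-neutral. It names a
    character (a code point) without indicating whether it's encoded as
    UTF-16 BE, UTF-16 LE, UTF-8 or something else. It does not mean the
    byte FE followed by the byte FF. Encoded as UTF-8, that same character
    becomes the bytes EF BB BF, so both descriptions can refer to the same
    file.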
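
To illustrate the point about "U+FEFF" being a character rather than a byte sequence, here is a minimal Python sketch. The helper name bom_bytes is made up for this example, and the codec names are the ones Python's standard library uses; it only shows how the single code point comes out as different bytes under each encoding.

```python
# U+FEFF is one code point; the bytes depend on the encoding chosen.
BOM = "\ufeff"

def bom_bytes(encoding: str) -> str:
    """Return the bytes of U+FEFF in the given encoding as spaced hex.

    bom_bytes is a hypothetical helper for this illustration only.
    """
    return " ".join(f"{b:02X}" for b in BOM.encode(encoding))

for name in ("utf-8", "utf-16-be", "utf-16-le"):
    print(f"{name:10} {bom_bytes(name)}")

# Decoding the three UTF-8 bytes gives back the same single character.
assert b"\xef\xbb\xbf".decode("utf-8") == BOM
```

Under these assumptions the loop should print EF BB BF for UTF-8, FE FF for UTF-16 BE and FF FE for UTF-16 LE, which is why "FEFF" and "EF BB BF" need not contradict each other.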
    
    -- 
    #####################################################################
                             Bertilo Wennergren
                     <http://purl.oclc.org/net/bertilo>
                          <bertilow@bertilo.se.fm>
    #####################################################################