Re: XHTML validator doesn't completely support Unicode

From: Bertilo Wennergren <bertilow@bertilo.se.fm>
Date: Mon, 30 Apr 2001 18:06:56 -0500
Message-Id: <200104302306.SAA31601@ns1.svensk-dns.com>
To: www-validator@w3.org
Christian Smith:

> > > > The text file is encoded in UTF-8, and uses the DOS end of line
> > > > conventions, but has the Unicode string "U+FEFF" as the first
> > > > character.

> > > Are you sure it's the end of line characters that give the 
> > > problem? I'd guess it's the BOM ("U+FEFF") that's the culprit.
> > > It's not very common to use a BOM in UTF-8 files. Some even say 
> > > it's not allowed in UTF-8.

> I think the problem is that you are using the wrong BOM. FEFF is the
> UTF-16 BOM whereas the UTF-8 BOM is EF BB BF.

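Perhaps, but remember that "U+FEFF" is encoding-neutral. It names a code
point, and it is the same character whether it is encoded as UTF-16 BE,
UTF-16 LE, UTF-8 or something else. It does not mean the byte FE followed
by the byte FF. Encoded as UTF-8, that same character is the three bytes
EF BB BF, so a UTF-8 file that begins with U+FEFF begins with exactly the
bytes you describe.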
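
To make the distinction concrete, here is a small Python sketch (my own
illustration, not anything the validator does; the names "bom", "encoded"
and "decoded" are made up for the example). It encodes the single
character U+FEFF in three encodings and prints the resulting bytes:

```python
# U+FEFF is a code point; the bytes on disk depend on the encoding.
bom = "\ufeff"

# Encode the same single character in three different encodings.
encodings = ["utf-8", "utf-16-be", "utf-16-le"]
encoded = {name: bom.encode(name) for name in encodings}

# Print each byte sequence as space-separated uppercase hex.
for name, raw in encoded.items():
    print(name, " ".join(f"{b:02X}" for b in raw))

# Going the other way: the bytes EF BB BF, read as UTF-8,
# decode back to the one character U+FEFF.
decoded = b"\xef\xbb\xbf".decode("utf-8")
print(decoded == bom)
```

The UTF-8 line should show EF BB BF and the two UTF-16 lines FE FF and
FF FE, which is why "FEFF" and "EF BB BF" are not two different BOMs but
one character seen at two different levels.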

-- 
#####################################################################
                         Bertilo Wennergren
                 <http://purl.oclc.org/net/bertilo>
                      <bertilow@bertilo.se.fm>
#####################################################################
Received on Monday, 30 April 2001 12:59:23 GMT
