XHTML validator doesn't completely support Unicode from Peter Sheerin on 2001-04-27 (www-validator@w3.org from April 2001)

From: Peter Sheerin <psheerin@cmp.com>
Date: Thu, 26 Apr 2001 21:50:36 -0400 (EDT)
To: <www-validator@w3.org>
Message-ID: <000001c0cebc$493de320$8810960a@cadencesheerin>

Is it a known issue that the W3C validator doesn't properly handle
Unicode documents? I've got a page that validates as XHTML 1.0
Strict, until I put the Unicode byte-order mark (U+FEFF) at the
beginning of the file.
 
Take a look at http://www.petesguide.com/style/index.html, and then
follow the icon link to the validator, and watch what it reports. The
text file is encoded in UTF-8 and uses DOS end-of-line conventions,
but has the Unicode character U+FEFF as its first character.
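
To show what that first character looks like on the wire, here is a
minimal Python sketch. In UTF-8, U+FEFF is encoded as the three bytes
EF BB BF. The function name strip_utf8_bom and the sample document are
made up for illustration; this is not the validator's own code.

```python
import codecs


def strip_utf8_bom(data):
    """Return (data without a leading UTF-8 BOM, True if one was found)."""
    # codecs.BOM_UTF8 is the byte string b"\xef\xbb\xbf", i.e. U+FEFF in UTF-8.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):], True
    return data, False


# Hypothetical sample: a document saved with a BOM and DOS (CR/LF) line ends.
sample = "\ufeff<?xml version=\"1.0\"?>\r\n<html/>\r\n".encode("utf-8")
body, had_bom = strip_utf8_bom(sample)
```

A parser that does not expect the signature sees those three bytes as
content ahead of the XML declaration or DOCTYPE, which is one plausible
reason a page that is otherwise valid would be rejected.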
 
This page, http://www.petesguide.com/style/misunderstood.html, also
validates against XHTML 1.0 Strict, but the current version posted uses
the Unicode Line Separator character (U+2028) instead of the DOS CR/LF
pair, and this, too, causes the validator to choke.
 
And just for kicks, I point you to this page,
http://www.petesguide.com/style/peeves.html, which would also validate,
if it weren't for the fact that it uses both the U+FEFF signature and
the U+2028 line separator character.
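
As a rough way to check a file for both characters before submitting
it, something like the following Python sketch could be used. The
function names and the sample bytes are hypothetical, and replacing
U+2028 with CR/LF is only a workaround, not a statement about what the
validator ought to accept.

```python
BOM = "\ufeff"             # U+FEFF, used as the Unicode signature
LINE_SEPARATOR = "\u2028"  # U+2028, encoded in UTF-8 as E2 80 A8


def describe(data):
    """Report whether UTF-8 bytes start with a BOM and count U+2028."""
    text = data.decode("utf-8")
    return {
        "starts_with_bom": text.startswith(BOM),
        "line_separators": text.count(LINE_SEPARATOR),
    }


def normalize(data, newline="\r\n"):
    """Drop a leading BOM and replace each U+2028 with `newline`."""
    text = data.decode("utf-8")
    if text.startswith(BOM):
        text = text[len(BOM):]
    return text.replace(LINE_SEPARATOR, newline).encode("utf-8")


# Hypothetical sample using both the signature and the line separator.
sample = "\ufeff<html>\u2028</html>\u2028".encode("utf-8")
report = describe(sample)
cleaned = normalize(sample)
```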
 
Is anyone working on more complete support of Unicode for the validator?
 
--
Peter K. Sheerin
Senior Technical Editor, CADENCE magazine
psheerin@cmp.com
(415) 947-6145

Received on Sunday, 29 April 2001 06:21:03 UTC