XHTML validator doesn't completely support Unicode

Is it a known issue that the W3C validator doesn't properly handle
Unicode documents? I've got a page that validates as XHTML 1.0
Strict--until I put the Unicode byte-order mark (U+FEFF) at the
beginning of the file.
 
Take a look at http://www.petesguide.com/style/index.html, then
follow the icon link to the validator and watch what it reports. The
text file is encoded in UTF-8 and uses the DOS end-of-line convention,
but has the Unicode character U+FEFF (the byte-order mark) as its
first character.
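
To illustrate what that signature looks like on disk: in UTF-8, U+FEFF is encoded as the three bytes EF BB BF. The following is only a minimal Python sketch for checking a file's leading bytes. The function name has_utf8_bom is a hypothetical helper, not anything the validator itself uses.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """Return True if data begins with the UTF-8 encoding of U+FEFF (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

# Hypothetical sample: a document whose first character is U+FEFF.
sample = "\ufeff<?xml version=\"1.0\"?>".encode("utf-8")
print(has_utf8_bom(sample))
```

A real check would read the first three bytes of the file in binary mode and pass them to the function.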
 
This page, http://www.petesguide.com/style/misunderstood.html, also
validates against XHTML 1.0 Strict, but the currently posted version
uses the Unicode Line Separator character (U+2028) instead of the DOS
CR/LF pair, and this, too, causes the validator to choke.
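
As a rough way to see which line-ending convention a document uses, the sketch below counts CR/LF pairs and U+2028 characters in already-decoded text. It is an assumption-laden illustration only: count_line_endings and the sample string are hypothetical, and this is not how the validator works.

```python
LINE_SEPARATOR = "\u2028"  # Unicode Line Separator; E2 80 A8 in UTF-8

def count_line_endings(text: str) -> dict:
    """Count DOS CR/LF pairs and U+2028 characters in decoded text."""
    return {
        "CRLF": text.count("\r\n"),
        "LS": text.count(LINE_SEPARATOR),
    }

# Hypothetical sample mixing both conventions.
sample = "<p>one</p>\u2028<p>two</p>\r\n"
print(count_line_endings(sample))
```

When reading from a file, open it with newline="" so that Python does not translate the CR/LF pairs before they are counted.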
 
And just for kicks, I point you to this page,
http://www.petesguide.com/style/peeves.html, which would also validate,
if it weren't for the fact that it uses both the U+FEFF signature and
the U+2028 line separator character.
 
Is anyone working on more complete support of Unicode for the validator?
 
--
Peter K. Sheerin
Senior Technical Editor, CADENCE magazine
psheerin@cmp.com
(415) 947-6145
 

Received on Sunday, 29 April 2001 06:21:03 UTC