Is it a known issue that the W3C validator doesn't properly handle Unicode documents?

I've got a page that validates as XHTML 1.0 Strict, until I put the Unicode byte-order mark at the beginning of the file. Take a look at http://www.petesguide.com/style/index.html, then follow the icon link to the validator and watch what it reports. The file is encoded in UTF-8 and uses the DOS end-of-line convention, but has the Unicode character U+FEFF as its first character.

This page, http://www.petesguide.com/style/misunderstood.html, also validates against XHTML 1.0 Strict, but the version currently posted uses the Unicode line separator character (U+2028) instead of the DOS CR/LF pair, and this too causes the validator to choke.

And just for kicks, I point you to this page, http://www.petesguide.com/style/peeves.html, which would also validate if it weren't for the fact that it uses both the U+FEFF signature and the U+2028 line separator character.

Is anyone working on more complete support of Unicode for the validator?

-- 
Peter K. Sheerin
Senior Technical Editor, CADENCE magazine
psheerin@cmp.com
(415) 947-6145

Received on Sunday, 29 April 2001 06:21:03 GMT
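For anyone trying to reproduce the conditions described in this message, the following is a minimal sketch, in Python, of how one might check a UTF-8 file for a leading U+FEFF signature and count its CR/LF pairs and U+2028 line separators. The function name `inspect_utf8` and the sample markup are hypothetical and used only for illustration; this is not how the validator itself examines its input.

```python
import codecs

# U+2028 LINE SEPARATOR, the character mentioned in the message.
LINE_SEPARATOR = "\u2028"


def inspect_utf8(data: bytes) -> dict:
    """Report the BOM and line-ending features of UTF-8 encoded bytes.

    Hypothetical helper for illustration only.
    """
    # In UTF-8, U+FEFF is encoded as the three bytes EF BB BF.
    has_bom = data.startswith(codecs.BOM_UTF8)
    # The "utf-8-sig" codec drops a leading BOM if one is present.
    text = data.decode("utf-8-sig")
    return {
        "has_bom": has_bom,
        "crlf_count": text.count("\r\n"),
        "line_separator_count": text.count(LINE_SEPARATOR),
    }


if __name__ == "__main__":
    # Made-up sample: a BOM, one U+2028, and one CR/LF pair.
    sample = codecs.BOM_UTF8 + "<html>\u2028</html>\r\n".encode("utf-8")
    print(inspect_utf8(sample))
```

To check one of the pages above, one could save it locally and pass the raw bytes, for example `inspect_utf8(open("index.html", "rb").read())`, assuming the file is saved under that name.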