- From: Peter Sheerin <psheerin@cmp.com>
- Date: Thu, 26 Apr 2001 21:50:36 -0400 (EDT)
- To: <www-validator@w3.org>
- Message-ID: <000001c0cebc$493de320$8810960a@cadencesheerin>
Is it a known issue that the W3C validator doesn't properly handle Unicode documents? I've got a page that validates as XHTML 1.0 Strict--until I put the Unicode byte-order mark (U+FEFF) at the beginning of the file.

Take a look at http://www.petesguide.com/style/index.html, then follow the icon link to the validator and watch what it reports. The file is encoded in UTF-8 and uses the DOS end-of-line convention, but has the Unicode character U+FEFF as its first character.

This page, http://www.petesguide.com/style/misunderstood.html, also validates against XHTML 1.0 Strict, but the version currently posted uses the Unicode Line Separator character (U+2028) instead of the DOS CR/LF pair, and this, too, causes the validator to choke.

And just for kicks, I point you to this page, http://www.petesguide.com/style/peeves.html, which would also validate if it weren't for the fact that it uses both the U+FEFF signature and the U+2028 line separator character.

Is anyone working on more complete support of Unicode for the validator?

--
Peter K. Sheerin
Senior Technical Editor, CADENCE magazine
psheerin@cmp.com
(415) 947-6145
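For anyone who wants to confirm which of the two characters described in this message a given file contains, here is a minimal Python sketch. It assumes the file really is UTF-8, as the message states. The function name `inspect_bytes` and the keys of the returned dictionary are illustrative only and have nothing to do with the validator's own code.

```python
import codecs

# UTF-8 encoding of U+FEFF, the byte-order mark / signature: EF BB BF
UTF8_BOM = codecs.BOM_UTF8
# U+2028 LINE SEPARATOR, encoded in UTF-8 as E2 80 A8
LINE_SEPARATOR = "\u2028"


def inspect_bytes(data: bytes) -> dict:
    """Report the Unicode features discussed above for a UTF-8 byte string.

    Hypothetical helper for illustration; raises UnicodeDecodeError if the
    input is not valid UTF-8.
    """
    has_bom = data.startswith(UTF8_BOM)
    # Drop the signature (if present) before decoding the remaining text.
    body = data[len(UTF8_BOM):] if has_bom else data
    text = body.decode("utf-8")
    return {
        "has_bom": has_bom,
        "line_separators": text.count(LINE_SEPARATOR),
        "crlf_pairs": text.count("\r\n"),
    }


if __name__ == "__main__":
    sample = codecs.BOM_UTF8 + "<html>\u2028</html>\r\n".encode("utf-8")
    print(inspect_bytes(sample))
```

To check a saved copy of one of the pages, read it in binary mode (for example `open(path, "rb").read()`) and pass the bytes to the function, so that no decoding layer silently strips the signature first.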
Received on Sunday, 29 April 2001 06:21:03 UTC