
UTF-8 Errors on file upload, not by URI

From: Kessler,Nathan <kesslern@oclc.org>
Date: Tue, 9 Sep 2014 19:40:45 +0000
To: "www-validator@w3.org" <www-validator@w3.org>
Message-ID: <1410291644706.96048@oclc.org>
I'm trying to validate http://worldcat.org. If I run a scan by URI or by direct input, the scan runs as expected. However, when the HTML source is saved in a file and uploaded, this error is reported on line 651:

"The error was: utf8 "\xED" does not map to Unicode" and the scan doesn't run.

The specific character in question: http://www.fileformat.info/info/unicode/char/ed/index.htm

If this character is removed, the scan then fails on the accented "ó" in "traducción", so the problem is not limited to the character above. The encoding of the page is UTF-8 and it is saved as UTF-8 before being uploaded. The scan works when the encoding is set to UTF-16, but not when it reads UTF-8 from the HTML.
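
In case it helps narrow this down, below is a minimal sketch of how the bytes of the saved file could be checked independently of the validator. It assumes only that the file is meant to be UTF-8; `find_invalid_utf8` and the sample bytes are hypothetical names and data made up for illustration, not part of the validator or of our system.

```python
def find_invalid_utf8(data: bytes):
    """Return (line_number, byte_offset, offending_byte) for the first
    byte sequence that is not valid UTF-8, or None if the data decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # Line numbers are 1-based: count the newlines before the bad byte.
        line = data.count(b"\n", 0, err.start) + 1
        return line, err.start, data[err.start]
    return None


# Hypothetical sample: the same text written out as Latin-1 and as UTF-8.
latin1_bytes = b"line one\n" + "aqu\u00ed".encode("latin-1") + b"\n"
utf8_bytes = b"line one\n" + "aqu\u00ed".encode("utf-8") + b"\n"

print(find_invalid_utf8(latin1_bytes))  # first bad byte in the Latin-1 copy
print(find_invalid_utf8(utf8_bytes))    # None when the bytes are valid UTF-8
```

To run it against a real file, read it in binary mode (`open(path, "rb").read()`) so that no decoding happens before the check. A lone 0xED byte where "í" should be would suggest that the file on disk is really ISO-8859-1 or Windows-1252, since "í" is encoded as the two bytes 0xC3 0xAD in UTF-8.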

Can anyone provide any advice here? We have an automated system that downloads web pages and runs them against our local validator via a file upload, and this page won't scan because of this error. Is the encoding set improperly on the web page? Am I missing something else here?

Thanks for your time and work,

Nathan Kessler
Received on Wednesday, 10 September 2014 21:12:51 UTC
