UTF-8 Errors on file upload, not by URI

I'm trying to validate http://worldcat.org. If I run a scan by URI or by direct input, the scan runs as expected. However, when the HTML source is saved to a file and uploaded, this error is reported on line 651:

"The error was: utf8 "\xED" does not map to Unicode" and the scan doesn't run.


The specific character in question: http://www.fileformat.info/info/unicode/char/ed/index.htm


If this character is removed, the scan then fails on the accented "ó" in "traducción", so the problem is not limited to the character above. The encoding of the page is UTF-8, and the file is saved as UTF-8 before being uploaded. The scan works when the encoding is set to UTF-16, but not when it reads UTF-8 from the HTML.
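
In case it helps narrow this down, here is a minimal Python sketch of a check that could be run on the saved file before uploading it. It is not part of the validator, and the function name is my own invention. It assumes the problem might be that the file on disk is not actually valid UTF-8: in Latin-1 the character above is the single byte 0xED, whereas in UTF-8 it is the two bytes 0xC3 0xAD, so a lone 0xED would be rejected by a UTF-8 decoder.

```python
def first_invalid_utf8(data: bytes):
    """Return (line_number, byte_offset, byte_value) for the first byte
    that is not valid UTF-8, or None if the data decodes cleanly.

    Hypothetical helper for illustration only.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # Count newlines before the failing offset to get a 1-based line number.
        line = data.count(b"\n", 0, err.start) + 1
        return line, err.start, data[err.start]
    return None


# The same word encoded two ways: Latin-1 yields a lone 0xF3 byte for the
# accented "o", which is not a valid UTF-8 sequence on its own.
sample_latin1 = "traducci\u00f3n".encode("latin-1")
sample_utf8 = "traducci\u00f3n".encode("utf-8")

print(first_invalid_utf8(sample_latin1))
print(first_invalid_utf8(sample_utf8))
```

To check a real file, the bytes would be read in binary mode (for example `open(path, "rb").read()`, with `path` standing in for wherever the page was saved) and passed to the function. If it reports a line near 651, the file itself is probably not UTF-8 despite what the editor says.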


Can anyone provide any advice here? We have an automated system that downloads web pages and runs them against our local validator via file upload, and this page won't scan because of this error. Is the encoding set improperly on the web page? Am I missing something else here?


Thanks for your time and work,

Nathan Kessler

Received on Wednesday, 10 September 2014 21:12:51 UTC