
Re: UTF-8 Errors on file upload, not by URI

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Thu, 11 Sep 2014 11:11:01 +0300
Message-ID: <54115915.3040607@cs.tut.fi>
To: "Kessler,Nathan" <kesslern@oclc.org>, "www-validator@w3.org" <www-validator@w3.org>

2014-09-09 22:40, Kessler,Nathan wrote:

> I'm trying to validate http://worldcat.org. If I
> run a scan by URI or by direct input, the scan runs as expected.

Validating by URI, the validator reports 53 errors and 22 warnings.

> However, when the HTML source is saved in a file and uploaded, this
> error is reported on line 651:

I saved the page using Firefox and validated it by file upload. No
error was reported on line 651 or anywhere near it.

> "The error was: utf8 "\xED" does not map to Unicode" and the scan
> doesn't run.

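This sounds like you somehow saved the page in windows-1252 encoding and
are then trying to validate it as UTF-8. In windows-1252 the byte 0xED
is the letter "í", but in UTF-8 it can only start a three-byte sequence,
so when it is followed by an ordinary character the data is not valid
UTF-8 and decoding fails.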
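To illustrate the mismatch, here is a minimal Python sketch. The sample
text "Política" is hypothetical, not taken from the worldcat.org page, and
the helper name first_invalid_utf8_byte is made up for illustration. The
same string encoded as windows-1252 contains the byte 0xED, which a strict
UTF-8 decoder rejects, analogous to the validator's "utf8 "\xED" does not
map to Unicode" message.

```python
def first_invalid_utf8_byte(data: bytes):
    """Return (offset, byte value) of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")  # strict decoding: any malformed sequence raises
    except UnicodeDecodeError as err:
        return err.start, data[err.start]
    return None


# Hypothetical sample text; "í" is U+00ED.
text = "<p>Política</p>"

cp1252_bytes = text.encode("cp1252")  # "í" becomes the single byte 0xED
utf8_bytes = text.encode("utf-8")     # "í" becomes the two bytes 0xC3 0xAD

print(first_invalid_utf8_byte(cp1252_bytes))  # (6, 237), i.e. 0xED at offset 6
print(first_invalid_utf8_byte(utf8_bytes))    # None
print(cp1252_bytes.decode("cp1252") == text)  # True: the bytes are fine as windows-1252
```

The windows-1252 bytes are not corrupt in themselves; they only fail when
read under the wrong encoding label.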

> We have an automated system that
> downloads web pages and runs them against our local validator via a file
> upload and this page won't scan due to this error. Is the encoding set
> improperly on the web page?

The encoding of the page appears to be properly declared in the HTTP
Content-Type header, so the problem is with the automated system: it
seems to change the encoding or otherwise corrupt the character data.
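One way an automated fetch-and-upload pipeline might avoid this, sketched
in Python under stated assumptions (the function names are hypothetical
and nothing here is part of the validator): keep the response body as raw
bytes, check them against the charset in the Content-Type header, and
write them to disk in binary mode. Decoding the page and writing it back
out as text risks re-encoding it with a platform default such as
windows-1252. Falling back to UTF-8 when no charset is declared is also an
assumption of this sketch.

```python
from email.message import Message


def declared_charset(content_type: str, default: str = "utf-8") -> str:
    """Return the lowercased charset parameter of a Content-Type header value.

    Falling back to UTF-8 when no charset is given is an assumption of this
    sketch; real HTML encoding detection also looks at BOMs and <meta> tags.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(default)


def bytes_for_upload(body: bytes, content_type: str) -> bytes:
    """Check the raw body against its declared charset and return it unchanged.

    The bytes are never decoded and re-encoded, so they cannot silently end
    up in a different encoding such as a platform default (e.g. cp1252).
    """
    charset = declared_charset(content_type)
    body.decode(charset)  # raises UnicodeDecodeError if the body does not match
    return body


def save_for_upload(body: bytes, content_type: str, path: str) -> None:
    """Write the verified bytes in binary mode, so no text encoding step is applied."""
    with open(path, "wb") as f:
        f.write(bytes_for_upload(body, content_type))
```

Note that an uploaded file does not carry the original HTTP header, so the
validator cannot see the charset declared there; keeping the bytes
unchanged at least avoids introducing a new mismatch.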

Yucca
Received on Thursday, 11 September 2014 08:11:32 UTC
