[Bug 10174] Bogus error reported for UTF-8 characters in larger documents

http://www.w3.org/Bugs/Public/show_bug.cgi?id=10174

--- Comment #4 from Ville Skyttä <ville.skytta@iki.fi> 2011-10-27 20:55:59 UTC ---
Validator does not use the form-based file upload interface of validator.nu; it
POSTs the document as the request entity body:
http://wiki.whatwg.org/wiki/Validator.nu_POST_Body_Input

This has something to do with whether the request is gzipped or not.  It seems
to always work if it is gzipped, but not if it isn't.  Reproducing on
qa-dev.w3.org (validator's Perl code uses out=xml; out=gnu is used here for
readability):

-------------
$ curl --data-binary @utf-8-validation.html -H "Content-Type: text/html" \
    "http://localhost:8888/?out=gnu"
: info: The Content-Type was “text/html”. Using the HTML parser.
: info: Using the schema for HTML5+ARIA, SVG 1.1 plus MathML 2.0 (experimental).
:1254.13-1254.13: error: End of file seen and there were open elements.
:1254.5-1254.10: error: Unclosed element “span”.
:11.3-11.7: error: Unclosed element “div”.
-------------

...and the same gzipped:

-------------
$ curl --data-binary @utf-8-validation.html.gz -H "Content-Type: text/html" \
    -H "Content-Encoding: gzip" "http://localhost:8888/?out=gnu"
: info: The Content-Type was “text/html”. Using the HTML parser.
: info: Using the schema for HTML5+ARIA, SVG 1.1 plus MathML 2.0 (experimental).
-------------
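
For anyone wanting to repeat the comparison, here is a minimal sketch of the
two requests above as shell helpers.  The function names make_gz and post_doc
are made up for illustration; only the curl and gzip options already shown
above (plus gzip -c) are used, and the endpoint is passed in rather than
assumed.

```shell
# Hypothetical helper: create FILE.gz next to FILE, keeping the original
# (gzip -c writes the compressed stream to stdout).
make_gz() {
    gzip -c "$1" > "$1.gz"
}

# Hypothetical helper: POST a document as the request entity body.
#   $1 = file to send
#   $2 = endpoint URL, e.g. "http://localhost:8888/?out=gnu"
#   $3 = "gzip" to declare the body as gzip-encoded (optional)
post_doc() {
    if [ "${3:-}" = "gzip" ]; then
        curl --data-binary "@$1" -H "Content-Type: text/html" \
            -H "Content-Encoding: gzip" "$2"
    else
        curl --data-binary "@$1" -H "Content-Type: text/html" "$2"
    fi
}

# Example usage (needs a reachable validator instance, so left commented out):
#   make_gz utf-8-validation.html
#   post_doc utf-8-validation.html "http://localhost:8888/?out=gnu" > plain.out
#   post_doc utf-8-validation.html.gz "http://localhost:8888/?out=gnu" gzip > gzip.out
#   diff plain.out gzip.out
```

If the two outputs differ for the same document, the difference comes from
how the receiving instance handles the uncompressed versus compressed body.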

If I use validator.nu instead of localhost:8888 on qa-dev.w3.org and otherwise
the same curl commands as above, neither gzipped nor non-gzipped produce any
errors, so as noted in comment 1, it seems to have something to do with the
qa-dev.w3.org local HTML5 validator instance (ditto probably validator.w3.org).

Whether validator's Perl code gzips the request depends on whether debug mode
is enabled and whether the HTML5 validator seems to be on the same host as the
validator; gzip is used if debug mode is off and the HTML5 validator appears to
be non-local.  Since IIUC both validator.w3.org and qa-dev use local HTML5
validator instances, they always end up _not_ gzipping the request, thus
triggering the problem.
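
The decision rule described above can be sketched roughly as follows.  This is
not the validator's actual Perl code: the function name should_gzip is
hypothetical, and treating only loopback names as "local" is an assumption,
since the real locality check may well differ.

```shell
# Hypothetical sketch of the gzip decision.
#   $1 = 1 if debug mode is on, 0 if off
#   $2 = host part of the HTML5 validator URL
# Prints "yes" if the request would be gzipped, "no" otherwise.
should_gzip() {
    debug=$1
    host=$2
    # Assumption: "local" means a loopback name or address.
    case "$host" in
        localhost|127.0.0.1|::1) is_local=1 ;;
        *) is_local=0 ;;
    esac
    # gzip only when debug mode is off AND the HTML5 validator looks non-local
    if [ "$debug" -eq 0 ] && [ "$is_local" -eq 0 ]; then
        echo yes
    else
        echo no
    fi
}
```

Under these assumptions, should_gzip 0 localhost prints "no", which matches
the situation on hosts that run a local HTML5 validator instance.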

Received on Thursday, 27 October 2011 20:56:03 UTC