[Bug 10174] Bogus error reported for UTF-8 characters in larger documents from bugzilla@jessica.w3.org on 2011-10-29 (www-validator-cvs@w3.org from October 2011)

From: <bugzilla@jessica.w3.org>
Date: Sat, 29 Oct 2011 18:27:31 +0000
To: www-validator-cvs@w3.org
Message-Id: <E1RKDcx-0006Ek-PH@jessica.w3.org>

http://www.w3.org/Bugs/Public/show_bug.cgi?id=10174

--- Comment #7 from Michael[tm] Smith <mike@w3.org> 2011-10-29 18:27:30 UTC ---
(In reply to comment #4)
> -------------
> $ curl --data-binary @utf-8-validation.html -H "Content-Type: text/html"
> "http://localhost:8888/?out=gnu"

I've tried that in my local environment and can reproduce the problem every
time. So the cause is not something specific to the W3C validator server
environment. I don't know why it's not reproducible with
http://validator.nu.

That said, I cannot reproduce the problem if I run the same curl command with
the --data switch instead of the --data-binary switch; that is:

curl --data @utf-8-validation.html -H "Content-Type: text/html"
"http://localhost:8888/?out=gnu"

The bug never happens if I run curl that way instead.

And in reading the curl docs, I'm not sure why the --data-binary switch would
be used in this case. The curl man page says, "To post data purely binary, you
should instead use the --data-binary option." So I can see why that switch
would be needed for the gzipped case, since that's binary data. But in the case
of the non-gzipped file, the data is not binary -- it's text.
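
One difference the curl man page does describe: when --data reads its input
from a file via @, carriage returns and newlines are stripped before posting,
whereas --data-binary sends the file exactly as-is. So the two commands do not
put the same bytes on the wire. Whether that accounts for the bug is not
established here. The following is only a rough local sketch of that
difference; the sample file is made up for illustration (it is not
utf-8-validation.html), and tr is an approximation of what --data does to the
file, not curl's actual implementation.

```shell
# Hypothetical sample document standing in for the real test file.
tmp=$(mktemp)
printf '<!DOCTYPE html>\n<title>caf\303\251</title>\n<p>x</p>\n' > "$tmp"

# --data-binary @file posts the file bytes unchanged.
binary_len=$(wc -c < "$tmp" | tr -d ' ')

# --data @file drops carriage returns and newlines before posting;
# tr -d '\r\n' approximates that transformation locally (assumption).
data_len=$(tr -d '\r\n' < "$tmp" | wc -c | tr -d ' ')

echo "as --data-binary: $binary_len bytes"
echo "as --data:        $data_len bytes"
rm -f "$tmp"
```

The two byte counts differ by the number of line-break bytes in the file, so
the validator sees a multi-line document in one case and a single long line in
the other.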


Received on Saturday, 29 October 2011 18:27:34 UTC