- From: <bugzilla@wiggum.w3.org>
- Date: Tue, 02 Dec 2008 16:22:53 +0000
- To: www-validator-cvs@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=6259

Olivier Thereaux <ot@w3.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|                            |INVALID
             Status|NEW                         |RESOLVED
                 CC|                            |ot@w3.org

--- Comment #1 from Olivier Thereaux <ot@w3.org>  2008-12-02 16:22:52 ---
(In reply to comment #0)
> So, character encoding is not checked when using validating by direct HTML
> input. Imho the character encoding should also be checked when validating by
> direct HTML input.

When you validate by URI, the validator retrieves the resource via HTTP. What it retrieves are bytes, which must be decoded into characters - hence the importance of knowing the character encoding, either thanks to the “charset=” parameter in the HTTP headers sent by the server, or from the <meta> information in the HTML, or from other possible sources.

File upload works likewise, with some minor differences (there is no web server, but the web browser pretty much plays that role).

When using direct input, however, what you give to the validator is not a series of bytes: you copy and paste characters into a form on the validator's home page. That validator page is encoded in utf-8, which means that, automatically, the form submitted to the validator will be in utf-8. That holds regardless of what your original content was encoded in, and regardless of whatever meta information is present.

Does the validator need to check encoding in "direct input" mode? No, per the above.

Should it do it? The answer here too is "no". Imagine that your document is encoded in ISO-8859-1 (a.k.a. latin-1). It is properly served as latin-1 by your web server, and there is a <meta> tag with that information too. All is well.

Now, imagine that you take that page source and copy-paste it into the "direct input" form of the validator: then, as explained above, the markup automatically becomes characters in utf-8.
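
To make the bytes-versus-characters distinction concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the sample markup and all variable names are made up for this example, and it does not reproduce the validator's actual code.

```python
# Hypothetical page source whose <meta> declaration claims ISO-8859-1.
page_text = (
    '<meta http-equiv="Content-Type" '
    'content="text/html; charset=iso-8859-1"><p>café</p>'
)

# By URI (or file upload): the validator receives bytes and has to know
# the encoding in order to turn them back into characters.
served_bytes = page_text.encode("iso-8859-1")
decoded_ok = served_bytes.decode("iso-8859-1")

# Decoding the same bytes with the wrong encoding fails here, because the
# lone 0xE9 byte (latin-1 "é") is not a valid UTF-8 sequence.
try:
    served_bytes.decode("utf-8")
    wrong_guess_fails = False
except UnicodeDecodeError:
    wrong_guess_fails = True

# Direct input: the pasted characters are submitted by a UTF-8 form, so the
# bytes that arrive are UTF-8 no matter what the <meta> declaration says.
form_bytes = page_text.encode("utf-8")
received_text = form_bytes.decode("utf-8")

# The declaration still says iso-8859-1, yet the bytes on the wire differ
# from the latin-1 bytes the web server would have sent.
declaration_survives = "charset=iso-8859-1" in received_text
bytes_differ = served_bytes != form_bytes
```

In this sketch the pasted text round-trips cleanly through utf-8 even though its declaration names iso-8859-1, which is the situation the comment describes.
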
Should the validator complain that it is receiving utf-8 content when the source says "iso-8859-1"? Of course not, that would be wrong and confusing.

In other words, direct input and by-URI validation are very different paradigms, and the difference shows most clearly in how encodings are handled. It *is* confusing, and if you can think of any way to make it less confusing, ideas are welcome.

-- 
Configure bugmail: http://www.w3.org/Bugs/Public/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug.
Received on Tuesday, 2 December 2008 16:23:01 UTC