Re: Meta Character Encoding Not Detected from Email Reply on 2007-11-21 (www-validator@w3.org from November 2007)

From: Email Reply <email_reply0234@mercysoftware.com>
Date: Tue, 20 Nov 2007 23:39:56 -0500
To: olivier Thereaux <ot@w3.org>
Cc: www-validator@w3.org
Message-Id: <1195619996.5178.74.camel@localhost.localdomain>

On Wed, 2007-11-21 at 10:56 +0900, olivier Thereaux wrote:
> Dear "Email Reply"
> 
> On 21 nov. 07, at 04:35, Email Reply wrote:
> 
> > If I validate by Direct Input and paste the following code into the  
> > validator:
> >
> > If I set encoding to "detect automatically" then the validator  
> > selects utf-8 as the encoding despite the fact that I have set a  
> > meta tag declaring it as iso-8859-1.
> 
> This is specific to "direct input". When the validator fetches a  
> document online or gets it sent by file upload, there is a question of  
> what the file, or HTTP resource, is encoded in. In direct input  
> however, what gets sent to the validator is not a file, but a string  
> of characters encoded in the same encoding as the validator's  
> interface, that is, utf-8.
> 
> Even if the document you will eventually publish is not utf-8, the act  
> of copy-pasting it to the text area in the validator will make it utf-8.
> 
> As a result, the meta charset information in the markup is ignored.
> 
> > If I set the encoding to iso-8859-1, then the validator issues a  
> > warning that I'm overriding the detected character encoding of utf-8.
> 
> I am puzzled by this. For the reasons explained above, the "direct  
> input" interface does not have any character encoding override  
> mechanism. Where did you see that? What validator are you using?
> 
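The explanation quoted above can be illustrated with a short Python
sketch. The page content and the helper submit_via_form are
hypothetical, and the sketch only assumes (as stated above) that the
direct input form submits text as utf-8. It is not the validator's
actual code.

```python
# Hypothetical page: the meta tag declares iso-8859-1 and the body
# contains one non-ASCII character (e-acute).
page = (
    '<meta http-equiv="Content-Type" '
    'content="text/html; charset=iso-8859-1">\n'
    '<p>caf\u00e9</p>'
)


def submit_via_form(text: str, form_encoding: str = "utf-8") -> bytes:
    """Encode pasted text the way a form in form_encoding would send it."""
    return text.encode(form_encoding)


# Bytes a web server would send if the file is stored as iso-8859-1.
as_published = page.encode("iso-8859-1")

# Bytes received after pasting the same text into a utf-8 form.
as_pasted = submit_via_form(page)

# Trusting the meta tag for the pasted bytes garbles the e-acute,
# because those bytes are utf-8 no matter what the markup declares.
mis_decoded = as_pasted.decode("iso-8859-1")

print(as_published == as_pasted)
print(mis_decoded.splitlines()[-1])
```

The meta tag text is the same in both cases; only the byte
representation of the non-ASCII character changes, which is why the
declared charset no longer describes what was submitted.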
To answer your question about how I changed the character encoding:

If you go to "Direct Input", paste in the code that I listed in the
original message, and validate it, the validator responds with
"This Page Is Valid HTML 4.01 Transitional!" at the top, but the table
below shows, next to Encoding, that the validation was performed using
utf-8.  Since that was not what I wanted, I then selected iso-8859-1
from the drop-down next to Encoding and chose Revalidate.  That is when
the validator issues a warning that says:

 "Character Encoding Override in effect!  The detected character
encoding 'utf-8' has been suppressed and 'iso-8859-1' used instead".  

That works if I ignore the warning.  However, it appears to me that
whenever a file is uploaded or direct input is used, the validator
should not report a detected character encoding at all and should
behave as if the server had sent no encoding information.  In that
case it should use the encoding declared in the document instead of
responding as if the server had sent utf-8.

As a side note, the page I actually need to validate cannot be fetched
by URL, because it is protected by a security check that validates the
IP address and a session key sent in the POST.  Because of this, if I
validate by URL, the validator only ever sees the error message
generated by our script, which is the correct behaviour.  Therefore, I
have to run the code on our site and then copy and paste the output in
order to validate it.

Received on Wednesday, 21 November 2007 04:57:22 UTC