
Re: Validation of apostrophe

From: olivier Thereaux <ot@w3.org>
Date: Thu, 15 Feb 2007 09:23:06 +0900
Message-Id: <1CC8E6A3-45B3-4408-A96C-A887A4CC4F03@w3.org>
Cc: www-validator@w3.org, richard Eskins <R.Eskins@mmu.ac.uk>
To: Jukka K.Korpela <jkorpela@cs.tut.fi>

On Feb 15, 2007, at 06:14 , Jukka K. Korpela wrote:
> This is a bit perplexing issue, but your conclusion seems somewhat  
> surprising. The meta tag _is_ there, and by HTML specifications, it  
> is to be trusted when there is no HTTP header to the contrary.

I can't find the exact prose for it (can you find a pointer?), but
HTTP headers do indeed take precedence over the content of the meta
tag. Broadening a little to cover cases such as uploading a string in
UTF-8, I think the spirit of the specs is that the way the document
is served (HTTP, MIME, etc.) takes precedence over what the document
itself declares when it comes to the encoding.
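To make the precedence rule concrete, here is a minimal Python sketch. It is not the validator's actual code: the function name `effective_charset`, the regular expressions, and the 1024-byte scan window are all assumptions chosen for illustration.

```python
import re

# Hypothetical helper: picks the charset the way the paragraph above
# describes, i.e. the transport-level declaration wins over <meta>.
CHARSET_RE = re.compile(r'charset\s*=\s*["\']?([\w.:-]+)', re.IGNORECASE)
META_RE = re.compile(r'<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)', re.IGNORECASE)


def effective_charset(content_type_header, html_bytes):
    """Return (charset, source); source is 'http', 'meta' or 'unknown'."""
    # 1. A charset parameter in the Content-Type header takes precedence.
    if content_type_header:
        match = CHARSET_RE.search(content_type_header)
        if match:
            return match.group(1).lower(), "http"
    # 2. Otherwise fall back to the document's own <meta> declaration.
    #    Only the start of the document is scanned (assumed window size).
    head = html_bytes[:1024].decode("ascii", errors="replace")
    match = META_RE.search(head)
    if match:
        return match.group(1).lower(), "meta"
    # 3. Nothing declared anywhere.
    return None, "unknown"


doc = b'<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'
print(effective_charset("text/html; charset=utf-8", doc))
print(effective_charset("text/html", doc))
```

With a charset in the header, the meta declaration is ignored; without one, the meta declaration is what counts.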

> In practical terms, this results in a confusion: you copy and paste  
> your document into the direct input field and get a response saying  
> that the document is valid, though it is not.

It is valid. Whether we like it or not, validity depends heavily on
how the document is served, especially when it comes to character
encoding declarations, but also media types.

> We might distinguish the document as residing on disk and the  
> document as submitted via the form, but I'm afraid less experienced  
> web page authors will get completely lost and virtually all will be  
> surprised if they detect this situation.
> Shouldn't the validator at least issue a warning saying that it  
> received the document in utf-8 encoding, even though the document  
> declares a different character encoding? Admittedly this would be  
> confusing too, even to people who have no problems with encodings  
> since they never dreamt of anything outside ASCII. :-)

Yes, that would be confusing too. I have yet to find a reasonably
simple way to explain that direct input means transcoding into
UTF-8, and the consequences that brings, without scaring less
experienced users away. Seeing as the transcoding is rather transparent
for most users, the current situation kind of does the job.
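For readers who want to see what the transcoding changes, here is a small Python illustration. It rests on an assumption this message does not confirm: that the file on disk stores a curly apostrophe as the single byte 0x92 (its windows-1252 code) while declaring iso-8859-1. The variable names are made up for the example.

```python
# Assumed scenario: byte 0x92 on disk, document declares iso-8859-1.
on_disk = b"It\x92s"

# Read as declared, 0x92 maps to U+0092, a C1 control character.
as_declared = on_disk.decode("iso-8859-1")
print(hex(ord(as_declared[2])))

# An editor or browser that treats the file as windows-1252 shows a
# curly apostrophe (U+2019) instead.
as_displayed = on_disk.decode("cp1252")
print(hex(ord(as_displayed[2])))

# Pasting into the direct input field submits characters, not the
# original bytes, and the form data arrives encoded as UTF-8.
submitted = as_displayed.encode("utf-8")
print(submitted)

# Decoded as UTF-8, the submitted text holds an ordinary U+2019.
assert submitted.decode("utf-8") == "It\u2019s"
```

Under that assumption, the copy on disk and the pasted copy are different byte sequences, so the two can get different validation results.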

Received on Thursday, 15 February 2007 00:23:16 UTC
