
Re: Validation of apostrophe

From: richard Eskins <R.Eskins@mmu.ac.uk>
Date: Fri, 16 Feb 2007 12:50:08 +0000
Message-Id: <45D5A880.993F.005B.3@gwmail.ncs.mmu.ac.uk>
To: <www-validator@w3.org>
Cc: "Jukka K.Korpela" <jkorpela@cs.tut.fi>, "richard Eskins" <R.Eskins@mmu.ac.uk>, "olivier Thereaux" <ot@w3.org>

Thanks folks for the replies. These answer my question and I now understand what is happening.

Yes, I can see that the Direct Input facility should have some sort of warning that it converts input to UTF-8 and windows-1252
and ignores the declared ISO-8859-1 encoding. However, I also agree it might cause more problems than it solves.
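
A minimal Python sketch of the mechanism may help here. It assumes the apostrophe in question was the windows-1252 right single quotation mark stored as the single byte 0x92; that detail is an assumption for illustration, not something confirmed from the file, and the variable names are hypothetical.

```python
# Assumption: the page contained a "smart" apostrophe saved as the single
# windows-1252 byte 0x92, while the meta tag declared ISO-8859-1.
raw = b"\x92"

# Read as windows-1252, the byte is U+2019 RIGHT SINGLE QUOTATION MARK.
as_cp1252 = raw.decode("cp1252")

# Read as the declared ISO-8859-1, the same byte is U+0092, a C1 control
# character rather than a printable apostrophe.
as_latin1 = raw.decode("iso-8859-1")

# Pasting into a form field submits characters, not the original bytes.
# Re-encoded as UTF-8, U+2019 becomes an ordinary three-byte sequence.
as_submitted = as_cp1252.encode("utf-8")

print(hex(ord(as_cp1252)))  # 0x2019
print(hex(ord(as_latin1)))  # 0x92
print(as_submitted)         # b'\xe2\x80\x99'
```

Under that assumption, the uploaded file and the pasted text are genuinely different inputs, which would explain why the two routes can give different results.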

As you saw (I will be removing the file that was validated), this was a student exercise in which they create a page in a lab-based test.
The final step is to validate the page. I'll just have to stress next year that this must be done via Upload, not the Direct Input facility.

Many thanks

Richard Eskins

Dept of Information and Communications
Manchester Metropolitan University
The Geoffrey Manton Building
Rosamond St West, Off Oxford Road

tel: +44 (0)161 247 6154 fax: +44 (0)161 247 6351

>>> On 15/02/2007 at 00:23, in message
<1CC8E6A3-45B3-4408-A96C-A887A4CC4F03@w3.org>, olivier Thereaux <ot@w3.org> wrote:

> On Feb 15, 2007, at 06:14 , Jukka K. Korpela wrote:
>> This is a rather perplexing issue, but your conclusion seems somewhat  
>> surprising. The meta tag _is_ there, and by HTML specifications, it  
>> is to be trusted when there is no HTTP header to the contrary.
> I can't find the exact prose for it (can you find a pointer?) but  
> indeed, HTTP headers have precedence over the content of the meta  
> tag, and, broadening a little for such cases as uploading a string in  
> utf-8, I think, the spirit of the specs is that the way the document  
> is served (HTTP, Mime, etc.) has precedence in the declaration of the  
> encoding over what the document declares.
>> In practical terms, this results in a confusion: you copy and paste  
>> your document into the direct input field and get a response saying  
>> that the document is valid, though it is not.
> It is valid. Whether we like it or not, especially when it comes to  
> character encoding declarations, but also media types, validity  
> heavily depends on how the document is served.
>> We might distinguish the document as residing on disk and the  
>> document as submitted via the form, but I'm afraid less experienced  
>> web page authors will get completely lost and virtually all will be  
>> surprised if they detect this situation.
>> Shouldn't the validator at least issue a warning saying that it  
>> received the document in utf-8 encoding, even though the document  
>> declares a different character encoding? Admittedly this would be  
>> confusing too, even to people who have no problems with encodings  
>> since they never dreamt of anything outside ASCII. :-)
> Yes, that would be confusing too. I am yet to find a reasonably  
> simple way to explain that the direct input will mean a transcoding  
> into utf-8, and the consequences that brings, without scaring less  
> experienced users away. Seeing as it is rather transparent for most,  
> the current situation kind of does the job.
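
The precedence rule discussed in the quoted exchange (the way the document is served wins over what the document declares) can be sketched roughly as follows. This is an illustration only, not the validator's actual code: the function name `effective_encoding`, its parameters, and the UTF-8 fallback are all hypothetical choices.

```python
import re
from typing import Optional


def effective_encoding(http_content_type: Optional[str],
                       meta_charset: Optional[str],
                       default: str = "utf-8") -> str:
    """Pick an encoding, letting the transport-level declaration win.

    The fallback in `default` is an assumption made for this sketch.
    """
    if http_content_type:
        # Look for a charset parameter such as: text/html; charset=utf-8
        match = re.search(r'charset\s*=\s*"?([^\s;"]+)',
                          http_content_type, re.IGNORECASE)
        if match:
            return match.group(1).lower()
    if meta_charset:
        # Only consulted when the transport layer says nothing.
        return meta_charset.lower()
    return default


# Served (or submitted) as UTF-8, so the meta declaration is overridden.
print(effective_encoding("text/html; charset=utf-8", "ISO-8859-1"))
# No charset in the header, so the meta declaration is used.
print(effective_encoding("text/html", "ISO-8859-1"))
```

Seen this way, Direct Input behaves like the first call: the form submission supplies UTF-8, so the ISO-8859-1 declaration in the meta tag is never consulted.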
Received on Friday, 16 February 2007 12:50:28 UTC
