Re: Character encoding for html 5

From: Michael(tm) Smith <mike@w3.org>
Date: Fri, 24 Jul 2009 09:43:53 +0900
To: Eric Bierman <bierman@annmanor.ca>
Cc: www-validator@w3.org
Message-ID: <20090724004350.GB31967@sideshowbarker>

Eric Bierman <bierman@annmanor.ca>, 2009-07-21 14:47 -0400:

>  Validator complains about missing character encoding definition and says it 
>  used the default utf-8. But that was what was actually specified in a new 
>  <meta charset="utf-8" /> tag.
> 
>  I understand this is an experimental html 5 checker, but this is rather 
>  basic html 5.
> 
>  Document is at http://www.annmanor.ca/news.shtml

I raised a new bug:

  http://www.w3.org/Bugs/Public/show_bug.cgi?id=7135

I attached a patch there with a proposed fix:

Index: ./httpd/cgi-bin/check
===================================================================
RCS file: /sources/public/validator/httpd/cgi-bin/check,v
retrieving revision 1.673
diff -u -r1.673 check
--- ./httpd/cgi-bin/check	30 Jun 2009 18:49:07 -0000	1.673
+++ ./httpd/cgi-bin/check	24 Jul 2009 00:37:16 -0000
@@ -534,14 +534,14 @@
   my ($override, undef) = split(/\s/, $File->{Opt}->{Charset}, 2);
   $File->{Charset}->{Override} = lc($override);
 
-  if ($File->{Opt}->{FB}->{Charset}) { # charset fallback mode
+  if ($File->{Opt}->{FB}->{Charset} and $File->{DOCTYPE} ne "HTML5") { # charset fallback mode
     unless ($File->{Charset}->{Use}) { # no charset detected, actual fallback
       &add_warning('W02', {W02_charset => $File->{Charset}->{Override}});
       $File->{Tentative} |= T_ERROR; # Tag it as Invalid.
       $File->{Charset}->{Use} = $File->{Charset}->{Override};
     }
   } else { # charset "hard override" mode
-    if (! $File->{Charset}->{Use}) { # overriding "nothing"
+    if (! $File->{Charset}->{Use} and $File->{DOCTYPE} ne "HTML5") { # overriding "nothing"
       &add_warning('W04', {W04_charset => $File->{Charset}->{Override}, W04_override => TRUE});
       $File->{Tentative} |= T_ERROR;
       $File->{Charset}->{Use} = $File->{Charset}->{Override};

I haven't tested it and don't know whether it's correct Perl syntax,
but the idea is simply to skip the encoding check entirely for
doctype=HTML5 pages, because the HTML5 checker does its own
encoding check.
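
On the syntax question, the two patched conditions can be tried in
isolation. What follows is only a minimal sketch, not the validator's
code: charset_warning() and make_file() are hypothetical helpers made
up for this illustration, and the hash models just the fields the
patch touches. It relies on "and" binding more loosely than both "ne"
and "!" in Perl, so each condition groups as intended.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the validator): returns the warning code
# that the patched conditions would raise, or 'none'.
sub charset_warning {
    my ($File) = @_;

    # Same condition as the first patched line: "ne" binds tighter than "and".
    if ($File->{Opt}->{FB}->{Charset} and $File->{DOCTYPE} ne "HTML5") {
        unless ($File->{Charset}->{Use}) {
            return 'W02';    # fallback charset applied
        }
    } else {
        # Same condition as the second patched line: "!" binds tighter than "and".
        if (! $File->{Charset}->{Use} and $File->{DOCTYPE} ne "HTML5") {
            return 'W04';    # hard override of "nothing"
        }
    }
    return 'none';
}

# Hypothetical builder for a stand-in $File structure.
sub make_file {
    my ($fallback, $doctype, $use) = @_;
    return {
        Opt     => { FB => { Charset => $fallback } },
        DOCTYPE => $doctype,
        Charset => { Use => $use },
    };
}

# An empty string stands for "no charset detected".
for my $case (
    [1, 'HTML5',    ''],
    [0, 'HTML5',    ''],
    [1, 'HTML 4.01', ''],
    [0, 'HTML 4.01', ''],
    [1, 'HTML 4.01', 'utf-8'],
) {
    my ($fallback, $doctype, $use) = @$case;
    printf "fallback=%d doctype=%s use='%s' => %s\n",
        $fallback, $doctype, $use,
        charset_warning(make_file($fallback, $doctype, $use));
}
```

In this sketch neither HTML5 case produces a warning, while the
non-HTML5 cases without a detected charset give W02 (fallback) and W04
(override). One thing it also makes visible: an HTML5 document in
fallback mode now falls through to the else branch, so whatever the
rest of that branch does (not shown in the hunk) would apply to it.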

  --Mike

-- 
Michael(tm) Smith
http://people.w3.org/mike/
Received on Friday, 24 July 2009 00:44:12 GMT
