Re: Validator charset

From: olivier Thereaux <ot@w3.org>
Date: Wed, 12 Mar 2008 00:54:17 -0400
Cc: www-validator@w3.org
Message-Id: <91F1F8DF-3733-480D-801B-1884635016D2@w3.org>
To: "Frank Ellermann" <hmdmhdfmhdjmzdtjmzdtzktdkztdjz@gmail.com>

Hi Frank, all.

On Mar 11, 2008, at 20:33 , Frank Ellermann wrote:
>> I think the validator does look at the xml declaration as
>> a source. See e.g the following test case:
>> http://qa-dev.w3.org/wmvs/HEAD/dev/tests/charset-xmldecl.xhtml
>
> Valid and UTF-8, do you have a similar test not using UTF-8 ?
> With a default UTF-8 it is not obvious what triggered UTF-8.
>
> My example was <http://xyzzy.webhop.info/home/ltru/4645bisU.xml>
> sending text/xml without charset resulting in US-ASCII and a
> fatal validation error for the UTF-8 XML.

Indeed, I just checked the code (see the check script, around line
500) and found that it has a special case for text/(something+)xml:

elsif ($File->{ContentType} =~ m(^text/([-.a-zA-Z0-9]\+)?xml$)) {
   # Act as if $http_charset was 'us-ascii'. (MIME rules)
   $File->{Charset}->{Use} = 'us-ascii';

   &add_warning('W01', {
     W01_upload => $File->{'Is Upload'},
     W01_agent  => $File->{Server},
     W01_ct     => $File->{ContentType},
   });

}

That code may be a mistake… I don't recall being around when it was
added, so it may come from an overzealous interpretation of RFC 3023,
which makes us-ascii the default for text/xml sent without a charset
parameter…
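
To make the effect of that special case concrete, here is a minimal
sketch of the decision it implies. This is not the validator's actual
code: pick_charset and its three parameters are hypothetical names, and
the two rules around the special case (an explicit HTTP charset wins,
otherwise fall back to the XML declaration and then to UTF-8) are
assumptions made for illustration. Only the pattern is copied from the
excerpt above.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of the validator's check script.
#   $content_type : media type from the HTTP Content-Type header
#   $http_charset : charset parameter of that header, undef if absent
#   $xml_encoding : encoding named in the XML declaration, undef if absent
sub pick_charset {
    my ($content_type, $http_charset, $xml_encoding) = @_;

    # Assumption: an explicit HTTP charset takes precedence.
    return lc($http_charset) if defined $http_charset;

    # Pattern copied from the quoted special case: a text/...xml type
    # served without a charset is treated as us-ascii, so the XML
    # declaration is never consulted.
    return 'us-ascii'
        if $content_type =~ m(^text/([-.a-zA-Z0-9]\+)?xml$);

    # Assumption: otherwise use the XML declaration, then the XML default.
    return defined($xml_encoding) ? lc($xml_encoding) : 'utf-8';
}

# Frank's case: text/xml, no HTTP charset, document declared as UTF-8.
print pick_charset('text/xml', undef, 'UTF-8'), "\n";         # us-ascii
# The same document served as application/xml skips the special case.
print pick_charset('application/xml', undef, 'UTF-8'), "\n";  # utf-8
```

Under these assumptions a UTF-8 document sent as text/xml without a
charset is read as us-ascii, which would explain the fatal error on the
first non-ASCII byte.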

-- 
olivier
Received on Wednesday, 12 March 2008 04:54:24 GMT
