Re: Validator charset

Hi Frank, all.

On Mar 11, 2008, at 20:33 , Frank Ellermann wrote:
>> I think the validator does look at the xml declaration as
>> a source. See e.g. the following test case:
>> http://qa-dev.w3.org/wmvs/HEAD/dev/tests/charset-xmldecl.xhtml
>
> Valid and UTF-8, do you have a similar test not using UTF-8 ?
> With a default UTF-8 it is not obvious what triggered UTF-8.
>
> My example was <http://xyzzy.webhop.info/home/ltru/4645bisU.xml>
> sending text/xml without charset resulting in US-ASCII and a
> fatal validation error for the UTF-8 XML.

Indeed. I just checked the code (see the check script, around line
500) and found that it has a special case for text/(something+)xml:

elsif ($File->{ContentType} =~ m(^text/([-.a-zA-Z0-9]\+)?xml$)) {
   # Act as if $http_charset was 'us-ascii'. (MIME rules)
   $File->{Charset}->{Use} = 'us-ascii';

   &add_warning('W01', {
     W01_upload => $File->{'Is Upload'},
     W01_agent  => $File->{Server},
     W01_ct     => $File->{ContentType},
   });

}

That code may be a mistake… I don't recall being around when it was
added, so it may come from an overzealous interpretation of RFC 3023.
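
For anyone who wants to poke at this outside the validator, here is a
minimal sketch of which media types take that branch. The pattern is
copied verbatim from the snippet above. The helper name
takes_text_xml_branch and the sample media types are made up for
illustration and are not part of the check script. One thing the sketch
makes visible: as written, [-.a-zA-Z0-9]\+ matches exactly one character
followed by a literal "+", so a longer prefix such as "vnd.example+" does
not take the branch. Whether that is intended is not clear from the code
alone.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the check script): returns 1 if the given
# Content-Type would take the quoted elsif branch, 0 otherwise.
# The pattern is copied verbatim from the validator snippet.
sub takes_text_xml_branch {
    my ($content_type) = @_;
    return $content_type =~ m(^text/([-.a-zA-Z0-9]\+)?xml$) ? 1 : 0;
}

# Sample media types, chosen for illustration only.
for my $ct ('text/xml', 'text/x+xml', 'text/vnd.example+xml', 'application/xml') {
    printf "%-22s %s\n", $ct,
        takes_text_xml_branch($ct) ? 'forced to us-ascii' : 'branch not taken';
}
```

A test document served as text/xml with a non-UTF-8 encoding in its XML
declaration would be the more direct check of what Frank is asking about.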

-- 
olivier

Received on Wednesday, 12 March 2008 04:54:24 UTC