- From: Takuya ASADA <asada@w3.mag.keio.ac.jp>
- Date: Tue, 5 Sep 2000 05:43:56 -0400 (EDT)
- To: www-validator@w3.org
From: Masayasu Ishikawa <mimasa@w3.org>
> Eric Maryniak <e.maryniak@pobox.com> wrote:
>
> > note the encoding ("UTF-8"). However, the validator still says:
> >
> >   Character encoding: unknown
> >
> > Is this correct?
>
> Although it would be better to recognize the encoding declaration,
> the "correct" way to specify the character encoding is to use
> the charset parameter of the "Content-Type" HTTP response header.

I made a patch to recognize the encoding from the XML declaration.
I hope this change will be accepted by W3C's original validator.

Takuya ASADA @ W3C/Keio
--
*** check.org Sat Jul 1 05:33:50 2000
--- check Tue Sep 5 18:29:44 2000
***************
*** 269,274 ****
--- 269,283 ----
  }

  #
+ # If we find a XML declaration with charset information, we take it into account.
+ $line = shift(@{$File->{Content}});
+ if ($line =~ /<\?xml\s/) {
+   if ($line =~ /encoding\s*=[\s\"]*([^\s;\">]*)/) {
+     $File->{XML_Charset} = lc $1;
+   }
+ }
+
+ #
  # If we find a META element with charset information, we take it into account.
  foreach my $line (@{$File->{Content}}) {
    # @@ needs to handle meta elements that span more than one line
***************
*** 284,289 ****
--- 293,300 ----
  # Figure out which charset to use for the validation.
  if ($File->{HTTP_Charset}) {
    $File->{Charset} = $File->{HTTP_Charset};
+ } elsif ($File->{XML_Charset}) {
+   $File->{Charset} = $File->{XML_Charset};
  } elsif ($File->{META_Charset}) {
    $File->{Charset} = $File->{META_Charset};
  } else {
***************
*** 433,438 ****
--- 444,459 ----
  <em><span class="warning">The character encoding specified in the HTTP
  header ("<code>$File->{HTTP_Charset}</code>") is different from the one
  specified in the META element ("<code>$File->{META_Charset}</code>").
+ I will use "<code>$File->{Charset}</code>" for this validation.</span></em>
+ EOHD
+ } elsif ($File->{HTTP_Charset} ne $File->{XML_Charset}
+          and $File->{HTTP_Charset} ne ''
+          and $File->{XML_Charset} ne ''
+          and $File->{Charset} ne 'unknown') {
+   print <<"EOHD";
+ <em><span class="warning">The character encoding specified in the HTTP
+ header ("<code>$File->{HTTP_Charset}</code>") is different from the one
+ specified in the XML declaration ("<code>$File->{XML_Charset}</code>").
  I will use "<code>$File->{Charset}</code>" for this validation.</span></em>
  EOHD
  }
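
For illustration only, the detection and precedence logic in the patch can be tried outside the validator with a small standalone sketch. The helper names xml_decl_charset and pick_charset are hypothetical and do not exist in the check script, and the 'unknown' fallback is an assumption based on the comparison in the third hunk. Unlike the patch, this sketch only peeks at the first line instead of calling shift, so the line stays in the content that is later scanned and validated; it also requires a quoted encoding name, which is stricter than the patch's pattern.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the validator): return the lower-cased
# encoding named in an XML declaration on the first line, or '' if none.
sub xml_decl_charset {
    my ($content) = @_;            # array ref of document lines
    my $first = $content->[0];     # peek only; the line is not removed
    return '' unless defined $first && $first =~ /<\?xml\s/;
    return $first =~ /encoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1/
        ? lc $2
        : '';
}

# Same precedence as the patch: HTTP header, then XML declaration, then META.
# The 'unknown' fallback is an assumption, not taken from the patch itself.
sub pick_charset {
    my ($file) = @_;
    for my $key (qw(HTTP_Charset XML_Charset META_Charset)) {
        return $file->{$key} if $file->{$key};
    }
    return 'unknown';
}

my @lines = ('<?xml version="1.0" encoding="UTF-8"?>', '<html/>');
my %file  = (Content => \@lines, META_Charset => 'iso-8859-1');
$file{XML_Charset} = xml_decl_charset($file{Content});
print pick_charset(\%file), "\n";    # prints "utf-8"
```

With no HTTP charset set, the XML declaration wins over the META value, which matches the ordering of the elsif branches in the second hunk.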
Received on Tuesday, 5 September 2000 06:03:21 UTC