- From: Takuya ASADA <asada@w3.mag.keio.ac.jp>
- Date: Tue, 5 Sep 2000 05:43:56 -0400 (EDT)
- To: www-validator@w3.org
From: Masayasu Ishikawa <mimasa@w3.org>
> Eric Maryniak <e.maryniak@pobox.com> wrote:
>
> > note the encoding ("UTF-8"). However, the validator still says:
> >
> > Character encoding: unknown
> >
> > Is this correct?
>
> Although it would be better to recognize the encoding declaration,
> the "correct" way to specify the character encoding is to use
> the charset parameter of the "Content-Type" HTTP response header.
I made a patch to recgnize encoding from XML declaration. I hope
this change will be accepted by W3C's original validator.
Takuya ASADA @ W3C/Keio
--
*** check.org Sat Jul 1 05:33:50 2000
--- check Tue Sep 5 18:29:44 2000
***************
*** 269,274 ****
--- 269,283 ----
}
#
+ # If we find a XML declaration with charset information, we take it into account.
+ $line = shift(@{$File->{Content}});
+ if ($line =~ /<\?xml\s/) {
+ if ($line =~ /encoding\s*=[\s\"]*([^\s;\">]*)/) {
+ $File->{XML_Charset} = lc $1;
+ }
+ }
+
+ #
# If we find a META element with charset information, we take it into account.
foreach my $line (@{$File->{Content}}) {
# @@ needs to handle meta elements that span more than one line
***************
*** 284,289 ****
--- 293,300 ----
# Figure out which charset to use for the validation.
if ($File->{HTTP_Charset}) {
$File->{Charset} = $File->{HTTP_Charset};
+ } elsif ($File->{XML_Charset}) {
+ $File->{Charset} = $File->{XML_Charset};
} elsif ($File->{META_Charset}) {
$File->{Charset} = $File->{META_Charset};
} else {
***************
*** 433,438 ****
--- 444,459 ----
<em><span class="warning">The character encoding specified in the HTTP
header ("<code>$File->{HTTP_Charset}</code>") is different from the one
specified in the META element ("<code>$File->{META_Charset}</code>").
+ I will use "<code>$File->{Charset}</code>" for this validation.</span></em>
+ EOHD
+ } elsif ($File->{HTTP_Charset} ne $File->{XML_Charset}
+ and $File->{HTTP_Charset} ne ''
+ and $File->{XML_Charset} ne ''
+ and $File->{Charset} ne 'unknown') {
+ print <<"EOHD";
+ <em><span class="warning">The character encoding specified in the HTTP
+ header ("<code>$File->{HTTP_Charset}</code>") is different from the one
+ specified in the XML declaration ("<code>$File->{XML_Charset}</code>").
I will use "<code>$File->{Charset}</code>" for this validation.</span></em>
EOHD
}
Received on Tuesday, 5 September 2000 06:03:21 UTC