Re: Fallback to UTF-8

Hi Frank,

Thanks a lot for going through your test cases. Much appreciated. I
failed to specify that the feedback requested was mostly on documents
without a declared character encoding, but you found other interesting
issues.

Let's see inline.

On 28-Apr-08, at 3:53 PM, Frank Ellermann wrote:
> | Software error:
> | Undefined subroutine
> | &W3C::Validator::EventHandler::abort_if_error_flagged called at
> | /home/link/web/HEAD/httpd/cgi-bin/check line 2756.
> <>
> <>
> (The two test cases for bugs 5279 and 5280)

Fixed. As in, the Undefined subroutine error is fixed. 5279 and 5280  
are untouched.

> <>
> (Another "line 2756", it used to work, valid HTML 2.0 Strict)

OK now.

> <>
> (Another "line 2756", it used to work, well formed XML)

OK now.

> <>
> (Another "line 2756", it used to work, well formed XML)

OK now.

> [warning] Unable to Determine Parse Mode!
> [...]
> | Type (-//IETF//DTD HTML i18n//EN) is not in the validator's catalog
> <>
> (SGML is correct - RFC 2070 DTD republished by IANA)

It does validate right now. I guess you are pointing out that “Unable
to Determine Parse Mode” could use better wording in this case?

> [warning] Missing "charset" attribute for "text/xml" document.
> <>
> (this text/xml document really uses encoding US-ASCII)

Content-Type: text/xml
So I guess the validator's warning, which says that the spec specifies
a strong default of "us-ascii", is OK here?
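
To make the rule concrete, here is a minimal Python sketch of the
default as I understand it from RFC 3023: a text/xml response with no
charset parameter is treated as us-ascii. The function name
effective_charset is made up for illustration and this is not the
validator's actual code.

```python
from typing import Optional


def effective_charset(content_type: str) -> Optional[str]:
    """Return the charset implied by a Content-Type header value.

    Assumption: only the text/xml default (us-ascii, per RFC 3023)
    is modelled; other media types without a charset return None.
    """
    media_type, _, params = content_type.partition(";")
    media_type = media_type.strip().lower()

    # An explicit charset parameter always wins over the default.
    for param in params.split(";"):
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value.strip():
            return value.strip().strip('"').lower()

    # No charset parameter: text/xml falls back to us-ascii.
    if media_type == "text/xml":
        return "us-ascii"
    return None
```

With this sketch, effective_charset("text/xml") gives "us-ascii",
which is why the warning is only a reminder when the document really
is US-ASCII.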

> [warning] Mismatch between Public and System identifiers
> <>
> (the released validator has no problem with using System
> identifiers pointing to its own catalog, maybe it's an
> artefact of the != setup)

That's a new feature. Some recent feedback prompted the addition of a  
check for consistency between FPI and SI. It's a warning, so as usual,  
it can be ignored if you are sure of your doctype.
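
Roughly, the idea of the check can be sketched in Python as below.
This is only an illustration under my own assumptions: the two-entry
CATALOG, the regular expression and the check_doctype name are
hypothetical, and the real check works against the validator's full
catalog.

```python
import re

# Hypothetical mini-catalog: public identifier -> expected system identifier.
CATALOG = {
    "-//W3C//DTD HTML 4.01//EN": "http://www.w3.org/TR/html4/strict.dtd",
    "-//W3C//DTD XHTML 1.0 Strict//EN":
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
}

# Matches <!DOCTYPE name PUBLIC "fpi" "si"> where the system id is optional.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+\S+\s+PUBLIC\s+"([^"]*)"(?:\s+"([^"]*)")?\s*>',
    re.IGNORECASE,
)


def check_doctype(source: str) -> list:
    """Return warnings when a known FPI is paired with an unexpected SI."""
    warnings = []
    match = DOCTYPE_RE.search(source)
    if not match:
        return warnings
    fpi, si = match.group(1), match.group(2)
    expected = CATALOG.get(fpi)
    # Only warn when both identifiers are present and the FPI is known.
    if si is not None and expected is not None and si != expected:
        warnings.append("Mismatch between Public and System identifiers")
    return warnings
```

A doctype whose System identifier points at a private copy of the DTD
would trigger the warning in this sketch, even though the document
itself may be perfectly fine.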

> [warning] Character Encoding mismatch!
> | The character encoding specified in the HTTP header
> | (iso-8859-1) is different from the value in the <meta>
> | element (windows-1252). I will use the value from the
> | HTTP header (iso-8859-1) for this validation.
> <>
> (Nikita consistently hates u+0080 based on an iso-8859-1
> assumption, and the document uses a windows-1252 0x80 €)

Mm, sorry, not sure if you are reporting an issue or saying "works as
it should" here. Can you give more details?
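
For reference, here is how I understand the byte in question, as a
small Python sketch (the describe helper is made up for illustration,
and this says nothing about how the validator itself decodes input):
the same 0x80 byte is a C1 control character under iso-8859-1 and the
euro sign under windows-1252.

```python
import unicodedata


def describe(byte: bytes, encoding: str) -> str:
    """Decode a single byte and report its code point and general category."""
    char = byte.decode(encoding)
    return f"U+{ord(char):04X} ({unicodedata.category(char)})"


# Same byte, two readings depending on which declaration is trusted.
as_http_header = describe(b"\x80", "iso-8859-1")  # HTTP header value
as_meta_element = describe(b"\x80", "cp1252")     # cp1252 is windows-1252
```

Here as_http_header is "U+0080 (Cc)", a control character, while
as_meta_element is "U+20AC (Sc)", the euro sign.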

Thanks a lot.


Received on Monday, 28 April 2008 07:07:29 UTC