Unresolvable error messages? from H. Hahn on 2007-04-23 (www-validator@w3.org from April 2007)

From: H. Hahn <h.hahn@hahn-informatica.nl>
Date: Mon, 23 Apr 2007 15:00:51 +0000
To: www-validator@w3.org
Message-Id: <2307F688-0BB8-462A-85F3-75780FD72D8A@hahn-informatica.nl>
[re-forwarding to the list because the attached PDF was too big --  
olivier, list maintainer]

Validating http://www.cheider.nl/cheider2007/index.php

L.S.,

While trying to validate the above-mentioned website, I managed to get
rid of most of the errors and warnings reported. However, three  
errors seem to be unresolvable (see attached PDF file for a listing  
of both the errors and the source from the website's home page).

The first error is about a disallowed character string. Actually it
is the <meta> tag defining the character set. This tag is
automatically inserted by the content management system (TYPO3 version
4.0.5). (I need to use UTF-8, as the site may in the (near) future
contain some Hebrew text.)
The other two errors complain that <head> and <body> are not open at
the point where </head> and </body> respectively are found, although
neither claim is true. (Or do these last two errors possibly result from
insufficient "error recovery" after the first error?)
Also, I copied & pasted the entire View Source from Firefox into the  
validator's "direct input" box, so that I could remove the "/" from  
the charset meta-tag before validating, but this did not change  
anything.

QUESTION 1:
Is there a way to force the validator to overlook certain things
(e.g. the way lint programmes can suppress certain types of errors or
error groups)? (I am aware that this is contrary to the entire idea
of validation, but it may help in the development process.)

QUESTION 2:
A statement like
      Text = Text.replace (/</g, "[");
produces an error message "End tag for element <g> which is not  
open". This statement only replaces "<" in Text by "[" (for reasons  
that are beyond the scope of this e-mail). The second slash is just  
the closing "quote" of the regular expression; it does not close any  
tag (let alone a non-existing <g>-tag!)
(Note: It seems that this erroneous error message mainly occurs in MS  
IE 6, and much less in Firefox 2.0! The Tidy-based Firefox plugin  
validator (see below) reports it as a warning only, not an error.)
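
A possible workaround for the statement in question 2, sketched under
the assumption that the validator objects to the less-than sign followed
directly by a slash and a letter inside an inline script (HTML 4 treats
that sequence as the start of an end tag, even within <script>). The
function name replaceLessThan is hypothetical; only the replace() call
comes from the statement above.

```javascript
// Hypothetical helper: replaces every less-than sign in text with "[".
function replaceLessThan(text) {
  // \x3C is the hexadecimal escape for the less-than sign, so the
  // source no longer contains a less-than sign next to the slash that
  // closes the regular expression.
  return text.replace(/\x3C/g, "[");
}
```

Moving the script into an external .js file should avoid the message
as well, since the validator then never sees the script text.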

QUESTION 3:
Somewhere I seem to have read that the W3C validator is based on the  
Tidy programme (or the other way around?). However, I am also using a  
Firefox plugin validator (HTML Validator version 0.7.9.5), which is  
also based on Tidy, but this one gives totally different results.  
E.g. it complains about "/>" after closing tags. I inserted  
backslashes everywhere to "silence" this.
But it keeps reporting warning messages on "/>" in certain regular
expressions and even in comments(!).
Is there a workaround for this "/>" problem with regular
expressions? (I understand that regular expression objects can be
built from normal double-quoted strings instead of slashes, but I did
not manage to find the correct syntax for applying e.g. the "replace()"
method of such an object.)
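
A minimal sketch of the regular expression object form mentioned above:
the pattern is passed as an ordinary string to the RegExp constructor,
and the resulting object is handed to replace(). The function name, the
"]" replacement and the use of \x escapes to keep a literal slash plus
greater-than sign out of the source are illustrative assumptions; it is
not confirmed that this satisfies either validator.

```javascript
// Hypothetical helper: replaces every slash that is directly followed
// by a greater-than sign with "]".
function replaceSlashGt(text) {
  // Inside a string literal each backslash must be doubled, so the
  // pattern actually compiled is \x2F\x3E (slash, then greater-than).
  // The second argument "g" makes the replacement global.
  var pattern = new RegExp("\\x2F\\x3E", "g");
  return text.replace(pattern, "]");
}
```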

QUESTION 4:
The Tidy-based Firefox plugin validator shows an icon in the bottom-right
corner of the screen, showing either a green OK symbol, or a
yellow warning symbol, or, on some pages, a blue "A". This last one  
is explained as meaning "The HTML contains invalid characters and can  
not be converted to Unicode". However, it gives no information at all  
as to what character(s) is/are being referred to and where it/they  
occur(s).
How can I find these characters?
(Note: I may have used some ANSI characters in comments (such as
umlaut letters) that are rendered as "unknown" because the site uses
UTF-8. Sometimes I use numeric entities ("&#nnnn;") for such
characters in text strings, which are converted to Unicode by a
JavaScript function prior to being displayed. See also question 5
below.)
(Unfortunately, this problem only occurs on some (but not all) pages
that happen to be password-protected, so I cannot show you an example.)
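
One way to locate such characters might be to scan the raw bytes of a
saved copy of the page for sequences that are not valid UTF-8. The
sketch below is a simplified check (it does not reject overlong
encodings or surrogates), and the function name findInvalidUtf8 is
hypothetical. It takes an array of byte values and returns the offsets
of the bytes that cannot start a valid UTF-8 sequence.

```javascript
// Hypothetical helper: returns the byte offsets at which the input
// stops being valid UTF-8. Simplified: overlong forms and surrogate
// code points are NOT detected.
function findInvalidUtf8(bytes) {
  var bad = [];
  var i = 0;
  while (i < bytes.length) {
    var b = bytes[i];
    var extra; // number of continuation bytes expected after b
    if (b < 0x80) {
      extra = 0;                       // plain ASCII
    } else if (b >= 0xC2 && b <= 0xDF) {
      extra = 1;                       // lead byte of a 2-byte sequence
    } else if (b >= 0xE0 && b <= 0xEF) {
      extra = 2;                       // lead byte of a 3-byte sequence
    } else if (b >= 0xF0 && b <= 0xF4) {
      extra = 3;                       // lead byte of a 4-byte sequence
    } else {
      bad.push(i);                     // stray continuation or illegal byte
      i += 1;
      continue;
    }
    var ok = true;
    for (var k = 1; k <= extra; k++) {
      var c = bytes[i + k];
      // Continuation bytes must have the bit pattern 10xxxxxx.
      if (c === undefined || (c & 0xC0) !== 0x80) {
        ok = false;
        break;
      }
    }
    if (ok) {
      i += extra + 1;
    } else {
      bad.push(i);                     // lead byte without its continuation
      i += 1;
    }
  }
  return bad;
}
```

For example, an umlaut letter saved as a single ANSI byte (such as 0xFC)
is reported at its byte offset, whereas its two-byte UTF-8 form
(0xC3 0xBC) passes.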

QUESTION 5:
As I see it, a validator's parser should do the same as (only much
more strictly than) a browser's parser, i.e. it should "integrate"
everything relevant (like styles, strings, etc.) and of course skip
all comments (i.e. BOTH HTML comments and scripting comments).
Scripts should be run as far as they seem relevant for the HTML being
generated (although this may not always be possible).
Is it possible to force the W3C validator and/or Tidy to do so?

Thank you very much in advance for your clarifications.

Sincerely,
Hahn Informatica
Ir. H. Hahn
Braak 48
NL-5501DK Veldhoven
Nederland / Niederlande / Netherlands
Tel. +31 40 2300161
Fax +31 40 2300163
E-mail: h.hahn@hahn-informatica.nl
Internet: http://www.hahn-informatica.nl
BTW / MWSt / VAT: NL 092 081 046 B01
HR / HR / RC: Eindhoven (NL), 170 62224
Received on Wednesday, 25 April 2007 01:25:21 UTC