Re: validator.nu from Frank Ellermann on 2008-02-05 (public-html-comments@w3.org from February 2008)

From: Frank Ellermann <omniplex@freenet.de>
Date: Tue, 5 Feb 2008 22:28:15 +0100
To: <public-html-comments@w3.org>
Cc: "Henri Sivonen" <hsivonen@iki.fi>
Message-ID: <00ab01c8683e$05d90970$3759863e@xyzzy>
Henri Sivonen wrote:

> I suggest sending feedback to the ISO/IEC FDIS 19757 committee.

If necessary, that's up to somebody who knows what "FDIS 19757"
is; I'm happy just getting the drift and the limitations of the
few DTDs I care about.

> I'm not interested in delivering a tool to authors who try to
> make a point by authoring new HTML 2.0 or 3.2 content today.

Right, and I'm not very interested in HTML at all where XHTML 1
does what I want.  But you are obviously interested in reviving
a kind of HTML, and SGML comments actually work in the browsers
I've tested.  Truth in advertising: what you do is not really
"HTML" or XHTML; it's a new class of its own.

> Currently, the HTML 5 draft doesn't permit presentational color-
> setting attributes at all, so the issue of permitted value space
> is moot.

ACK, the "HTML5 diff" draft mentions 'bgcolor'.  Maybe it could
also mention 'color' for (conditionally) allowed <font> elements.

  [DTD subset]
>> that's apparently not (yet) supported by http://validator.nu
>> and FWIW also in no browser I know.

> If it isn't supported in any browser, it would be less useful
> if the validator didn't point out the problem, wouldn't it?

The WDG validator showed a warning: good.  The W3C validator
accepted <br /> as valid HTML for some time: ugly.  But
validator.nu drops the ball on valid DTD subsets: bad.

It isn't "pointing out the problem" the way the WDG warning does;
it simply gets lost on some syntactically valid pre-HTML5 constructs.

A clear error message would be "DTD subset not supported", or,
if that's simply not allowed in XHTML 1 (dunno), "invalid".
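For illustration only, a minimal sketch of such a plain diagnostic
in Python.  `doctype_message` is a hypothetical name, not
validator.nu code, and the regex is a simplification that does
not handle "[" or ">" inside quoted identifiers:

```python
import re
from typing import Optional

# Matches "<!DOCTYPE" followed by anything up to either the opening "[" of an
# internal subset (captured in group 1) or the closing ">".
# Sketch only: "[" or ">" inside quoted public/system identifiers is not handled.
DOCTYPE_RE = re.compile(r"<!DOCTYPE\s[^\[>]*(\[)?", re.IGNORECASE)

def doctype_message(text: str) -> Optional[str]:
    """Return a plain error message for an internal DTD subset, else None."""
    m = DOCTYPE_RE.search(text)
    if m is not None and m.group(1):
        return "DTD subset not supported"
    return None
```

A validator could emit this message up front instead of failing
later on entity references the subset was meant to declare.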

> You can manually override the parser to XML with external
> entity resolution if you wish to check XML documents that
> aren't suitable for use with Web browsers:
> http://validator.nu/?doc=http%3A%2F%2Fpurl.net%2Fxyzzy%2Fibm850.htm&parser=xmldtd&laxtype=yes

I tried that for my 4th test, not for the 3rd, but with your
link I run into the reported "chemical" oddity again: after two
warnings (lax + encoding), the third (and last) message is the
fatal "IO Error: Non-XML Content-Type: chemical/x-pdb" (see below).

What about the second warning:

| External encoding information specified a non-UTF-8/non-UTF-16
| encoding (ISO-8859-1), but there was no matching internal
| encoding declaration.

PURL redirects to www.xyzzy.claranet; unfortunately, that now
redirects again to home.claranet.de/xyzzy, and there I get an
(erroneous) "Content-Type: text/html; charset=ISO-8859-1".
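As I read the quoted warning, the rule is roughly this.  A
sketch, assuming the HTTP charset and the raw bytes are already
at hand; `encoding_warning` is a hypothetical name, and byte
order marks and the full XML declaration grammar are ignored:

```python
import re
from typing import Optional

# Encoding pseudo-attribute of an XML declaration at the very start of the
# document, e.g. <?xml version="1.0" encoding="ISO-8859-1"?>
XML_DECL_ENCODING = re.compile(
    rb"""^<\?xml\s[^>]*?encoding\s*=\s*["']([A-Za-z0-9._-]+)["']""")

def encoding_warning(http_charset: Optional[str], body: bytes) -> Optional[str]:
    """Warn when HTTP names a non-UTF-8/16 charset the document doesn't repeat."""
    if http_charset is None:
        return None
    external = http_charset.strip().lower()
    if external in ("utf-8", "utf-16"):
        return None
    m = XML_DECL_ENCODING.match(body)
    internal = m.group(1).decode("ascii").lower() if m else None
    if internal == external:
        return None
    return ("External encoding information specified a non-UTF-8/non-UTF-16 "
            f"encoding ({http_charset}), but there was no matching internal "
            "encoding declaration.")
```

With the header above and a page lacking an XML declaration,
this yields the quoted warning, although nothing inside the
document itself is wrong.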

Everybody and his dog knows that many authors cannot fix the weird
ideas of their HTTP servers.  IMO (X)HTML validators should
completely ignore the HTTP layer and focus on the job at hand:
report only issues *within* a document that is meant to be valid
(X)HTML.  Offering HTTP consistency checks as an *option* (opt-in)
is of course fine.
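A minimal sketch of that separation, assuming each message is
tagged with the layer it comes from; `Layer`, `Message`, and
`report` are hypothetical names, not any validator's API:

```python
from dataclasses import dataclass
from enum import Enum
from typing import List

class Layer(Enum):
    CONTENT = "content"      # problems inside the (X)HTML document itself
    TRANSPORT = "transport"  # HTTP-level problems (Content-Type, charset, ...)

@dataclass
class Message:
    layer: Layer
    text: str

def report(messages: List[Message], check_http: bool = False) -> List[Message]:
    """Always keep content messages; keep transport messages only on opt-in."""
    return [m for m in messages if m.layer is Layer.CONTENT or check_http]
```

By default, report() shows only content problems; passing
check_http=True adds the HTTP consistency checks.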

That's of course not specific to validator.nu; other validators
also mix different layers (transport and content) into a mess
that is rather confusing for ordinary users.

> http://hixie.ch/advocacy/xhtml

It won't surprise anybody that I disregard this text; otherwise I
would not use XHTML 1.  The constructs needed to escape CSS and JS
in a way that works for both XML and SGML are admittedly odd, but
as long as I don't use inline <script> or <style>, I only need to
know how to fix that elsewhere.  Quoting the text:

| they are just going to be treating it the same was as plain old
| HTML 3.2

Yes, it has to "work" with Netscape 3.x: no SGML cruft, all tags
explicit, proper nesting, no nonsense.  That's for definitions of
"work" that exclude mere decorations like colours and fonts, but
still allow more than the "strict" subset permits.
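The odd escaping constructs for inline CSS and JS mentioned above
can be sketched as follows.  This is the common CDATA-in-comment
idiom, not something from the validator; the helper names are
hypothetical:

```python
def wrap_inline_script(js: str) -> str:
    """Wrap JS so it parses as XML (CDATA section) and as HTML (raw text).

    XML parsers see a CDATA section; HTML parsers see raw script text in
    which the CDATA markers are hidden behind // line comments.
    """
    if "]]>" in js:
        raise ValueError("']]>' would terminate the CDATA section early")
    if "</script" in js.lower():
        raise ValueError("'</script' would end the element in HTML parsers")
    return ('<script type="text/javascript">\n//<![CDATA[\n'
            + js + '\n//]]>\n</script>')

def wrap_inline_style(css: str) -> str:
    """Same idea for <style>, hiding the CDATA markers in CSS comments."""
    if "]]>" in css:
        raise ValueError("']]>' would terminate the CDATA section early")
    if "</style" in css.lower():
        raise ValueError("'</style' would end the element in HTML parsers")
    return ('<style type="text/css">\n/*<![CDATA[*/\n'
            + css + '\n/*]]>*/\n</style>')
```

Keeping scripts and styles in external files avoids the question
entirely, which is the point made above.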

> text/html is not an RFC 3023-compliant XML media type.

After validator.nu couldn't handle the tested document "as is", I
tried various combinations of the offered options; if one of them
means "XML as per RFC 3023", that wasn't what I really wanted.

> HEAD /sgml-lib/REC-xhtml1-20020801/xhtml-lat1.ent HTTP/1.1
[...]
> Host: validator.w3.org
[...]
> Content-Type: chemical/x-pdb

Brilliant: three servers involved (claranet, Google, W3C), and
none of them gets the relevant content types right.  It never
occurred to me that validator.w3.org could be a problem; I only
checked claranet and googlepages... :-(
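For reference, a sketch of the check that separates an XML
media type from text/html or chemical/x-pdb.  The list below is
my reading of the types RFC 3023 registers plus its "+xml" suffix
convention; `is_xml_media_type` is a hypothetical name:

```python
# Media types registered by RFC 3023, plus its "+xml" suffix convention.
RFC3023_TYPES = frozenset({
    "text/xml",
    "application/xml",
    "text/xml-external-parsed-entity",
    "application/xml-external-parsed-entity",
    "application/xml-dtd",
})

def is_xml_media_type(content_type: str) -> bool:
    """True if a Content-Type header value names an XML media type."""
    # Drop parameters such as "; charset=ISO-8859-1" and normalise case.
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime in RFC3023_TYPES or mime.endswith("+xml")
```

By this test, both "text/html; charset=ISO-8859-1" and
"chemical/x-pdb" fail, while "application/xhtml+xml" passes.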

As noted above, just ignore what HTTP servers say: all you get
are mad lies, resulting in hopelessly confusing error messages
about issues not under the tester's control.

 [title attribute of <link>]
> I can't reproduce this problem.

Nor can I.  I guess that when I gave up on the IRI test and went
back to the SGML-comment and rev="made" problem while composing
the report, I misinterpreted the rev="made" error message: it
doesn't claim that 'title' is required, it says that "something"
is required.

Or I confused it with the xml:lang errors: all my pages have
xml:lang alongside 'lang' everywhere, as required.  Reporting
this as an error is wrong for XHTML 1.

 Frank
Received on Tuesday, 5 February 2008 21:27:28 UTC