Re: validator.nu

Disclaimer: This is not an official WG response. I am, however, the  
developer of Validator.nu.

On Feb 1, 2008, at 15:57, Frank Ellermann wrote:

> Henri Sivonen wrote:
>
>> I believe the list of encodings that are needed for existing
>> content is pretty close to the contents of the encoding menu
>> at http://validator.nu/
>
> BTW, my usual "validator torture tests" strongly indicate that
> http://validator.nu is unrelated to the concepts of "validator"
> and "existing content".

From the about page:
| No DTD-Based Validation
|
| *  Validator.nu does not check for XML 1.0 validity constraints. That
|    is, DTD validation is not performed. “Validation” and “validator”
|    in the name and the user interface of the service refer to the
|    ISO/IEC FDIS 19757-2 definition of “validator” (which performs
|    validation), to the Schematron “validation” function (which is
|    performed by a validator), and to the HTML 5 definition of
|    “validator”.
|
| *  Validator.nu does not perform the duties of a “validating SGML
|    parser” as defined in ISO 8879. In fact, this service does not
|    have any SGML functionality at all. In particular, the HTML 4.01
|    support uses the HTML5 parser with some additional error
|    conditions.
http://about.validator.nu/

If you don't like the terminology of RELAX NG and Schematron  
"validation" being performed by a "validator", I suggest sending  
feedback to the ISO/IEC FDIS 19757 committee.

As far as HTML 5 validation goes, experience showed that calling it  
validation rather than conformance checking made communicating with  
people easier.

Validator.nu is a tool for authors. It isn't designed as a tool for  
checking someone else's existing HTML 2.0 or 3.2 content as HTML 2.0  
or 3.2. I'm not interested in delivering a tool to authors who try to  
make a point by authoring new HTML 2.0 or 3.2 content today.  
Supporting HTML 2.0 or 3.2 would not be cost-effective.

> 1 - http://purl.net/xyzzy/home/test/res.htm and res.html:
>    Quirky or not, HTML 2 strict and HTML i18n allowed those
>    odd SGML comments.  AFAIK nothing is wrong with <tt> in <p>.

Validator.nu does not support HTML 2.0 and doesn't claim to. However,  
it checks the content as HTML5 for your convenience in case you are an  
author seeking to upgrade an existing site template to HTML5.

> 2 - http://purl.net/xyzzy/colour.htm intentionally uses "known"
>    colour names, I fear they are quite popular in "existing
>    content", maybe HTML5 should accept them as "legacy".
>    The validator found another issue I wasn't aware of, nice.

Note that by default, Validator.nu tries to use the XHTML 1.0  
Transitional schema with that page. That spec defers to HTML 4.01  
which only allowed 16 colors:
http://www.w3.org/TR/html401/types.html#h-6.5

Currently, the HTML 5 draft doesn't permit presentational
color-setting attributes at all, so the issue of permitted value space
is moot.

> 3 - http://purl.net/xyzzy/ibm850.htm has a DTD subset with some
>    entity declarations, that's apparently not (yet) supported
>    by http://validator.nu and FWIW also in no browser I know.

If it isn't supported in any browser, it would be less useful if the  
validator didn't point out the problem, wouldn't it?

You can manually override the parser to XML with external entity  
resolution if you wish to check XML documents that aren't suitable for  
use with Web browsers:
http://validator.nu/?doc=http%3A%2F%2Fpurl.net%2Fxyzzy%2Fibm850.htm&parser=xmldtd&laxtype=yes
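
For anyone scripting such checks, here is a minimal Python sketch of how the override URL above can be assembled. The query parameters (doc, parser, laxtype) and their values are taken from the URL above; the function name validator_url is only illustrative.

```python
from urllib.parse import urlencode


def validator_url(doc, parser="xmldtd", laxtype="yes",
                  base="http://validator.nu/"):
    """Build a Validator.nu URL that overrides the parser choice.

    The parameter names mirror the URL quoted in this message;
    the function name itself is hypothetical.
    """
    # urlencode percent-encodes ':' and '/' in the document URL.
    query = urlencode({"doc": doc, "parser": parser, "laxtype": laxtype})
    return f"{base}?{query}"


if __name__ == "__main__":
    print(validator_url("http://purl.net/xyzzy/ibm850.htm"))
```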

> 4 - http://hmdmhdfmhdjmzdtjmzdtzktdkztdjz.googlepages.com/IDN-IRI-test.html
>    Unusable output for XHTML 1 sent as text/html for all pages,

http://hixie.ch/advocacy/xhtml

>    if a validator cannot validate XHTML 1 it shouldn't try to
>    do it anyway.

Previously, Validator.nu simply halted in that case. You are the first
person to suggest that the current behavior isn't more useful. I
think I'm keeping the current behavior.

> "Preset XHTML 1" doesn't help to get the
>    corresponding parser.

That's because the XHTML 1.0 schemas are also used for HTML 4.01.

> The XML parser refuses to validate text/html.

text/html is not an RFC 3023-compliant XML media type.
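
As a rough illustration, the following Python sketch shows the kind of check this implies: RFC 3023 registers a handful of XML media types and reserves the "+xml" suffix for others. This is not Validator.nu's actual code, and the name is_xml_media_type is made up for the example.

```python
# Media types registered by RFC 3023 itself.
RFC3023_XML_TYPES = {
    "text/xml",
    "application/xml",
    "text/xml-external-parsed-entity",
    "application/xml-external-parsed-entity",
    "application/xml-dtd",
}


def is_xml_media_type(content_type: str) -> bool:
    """Return True if the media type counts as XML under RFC 3023."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    essence = content_type.split(";", 1)[0].strip().lower()
    return essence in RFC3023_XML_TYPES or essence.endswith("+xml")
```

With this check, text/html and chemical/x-pdb are rejected, while application/xhtml+xml is accepted.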

>  Third attempt, UTF-8 + XHTML 1 + XML + "lax"
>    (whatever that means),

Lax means ignoring the RFC 3023 default encoding for text/xml and
ignoring the declared meaning of text/plain and text/html.
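
To make the text/xml part concrete: RFC 3023 says that text/xml without a charset parameter must be treated as US-ASCII, whereas application/xml without one falls back to the XML rules (BOM or encoding declaration). The Python sketch below contrasts the strict and lax treatment. It is an assumption about the general shape of the logic, not the actual Validator.nu implementation, and the name charset_from_headers is hypothetical.

```python
from typing import Optional


def charset_from_headers(media_type: str, charset: Optional[str],
                         lax: bool = False) -> Optional[str]:
    """Return the encoding implied by the HTTP headers alone.

    None means "look inside the document" (BOM / XML declaration).
    Assumption: an explicit charset parameter wins in both modes.
    """
    if charset:
        return charset.strip().lower()
    if media_type.strip().lower() == "text/xml" and not lax:
        # RFC 3023 default for text/xml without a charset parameter.
        return "us-ascii"
    # application/xml, or lax mode: defer to the document itself.
    return None
```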

> and now the validator states that it
>    doesn't know Content-Type: chemical/x-pdb.
>    Neither do I, it's not mentioned in the document or the DTD.

You include an external entity from elsewhere.

$ telnet validator.w3.org 80
Trying 128.30.52.49...
Connected to validator.w3.org.
Escape character is '^]'.
HEAD /sgml-lib/REC-xhtml1-20020801/xhtml-lat1.ent HTTP/1.1
Accept: */*; q=0.1, application/docbook+xml, application/xhtml+xml, application/xml; q=0.5, image/svg+xml, text/xml; q=0.3
Host: validator.w3.org
Connection: close

HTTP/1.1 200 OK
Date: Sat, 02 Feb 2008 12:38:54 GMT
Server: Apache/2.2.6 (Debian)
Last-Modified: Tue, 20 Aug 2002 01:51:30 GMT
ETag: "40c881-2dff-3a89aed4fec80"
Accept-Ranges: bytes
Content-Length: 11775
Connection: close
Content-Type: chemical/x-pdb

Connection closed by foreign host.
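
If you would rather not use telnet, the same check can be sketched in Python. The function names are illustrative only, and the server may of course answer differently now than it did in the session above. head_content_type sends the HEAD request; content_type_of extracts the header from a saved response such as the one shown.

```python
import http.client
from email.parser import Parser


def head_content_type(host: str, path: str) -> str:
    """Send a HEAD request and return the Content-Type header ("" if absent)."""
    conn = http.client.HTTPConnection(host, 80, timeout=10)
    try:
        conn.request("HEAD", path, headers={"Accept": "*/*"})
        return conn.getresponse().getheader("Content-Type", "")
    finally:
        conn.close()


def content_type_of(raw_response: str) -> str:
    """Extract Content-Type from a raw HTTP response head."""
    # Skip the status line; the rest is a block of RFC 822-style headers.
    _status, _, header_block = raw_response.partition("\n")
    headers = Parser().parsestr(header_block, headersonly=True)
    return headers.get("Content-Type", "")


SAMPLE = """HTTP/1.1 200 OK
Date: Sat, 02 Feb 2008 12:38:54 GMT
Server: Apache/2.2.6 (Debian)
Content-Length: 11775
Connection: close
Content-Type: chemical/x-pdb

"""

if __name__ == "__main__":
    print(content_type_of(SAMPLE))
```

A live check would be head_content_type("validator.w3.org", "/sgml-lib/REC-xhtml1-20020801/xhtml-lat1.ent").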

>    Admittedly Google sends the DTD as application/octet-stream
>    instead of application/xml-dtd, but that's not "chemical".

Yeah, the Google server is not the misconfigured one.

> 5 - All link rev="made" are reported as errors.

The current HTML5 draft obsoletes rev, because rev is rare but when it  
is used, it is most often used wrong.

rev='made' is the exception, but rel='author' is the permitted way of  
communicating the same thing.
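
For authors who want to find such attributes in their templates, here is a small Python sketch that lists rev attributes on link and a elements and suggests rel="author" where the value is "made". It is an illustration built on the standard html.parser module, not how Validator.nu reports the error; RevFinder and find_rev are made-up names.

```python
from html.parser import HTMLParser


class RevFinder(HTMLParser):
    """Collect (tag, rev value, suggested replacement) tuples."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag in ("link", "a") and "rev" in attr_map:
            rev = (attr_map["rev"] or "").strip().lower()
            # Only rev="made" has a direct rel equivalent.
            hint = 'rel="author"' if rev == "made" else None
            self.findings.append((tag, rev, hint))


def find_rev(html: str):
    finder = RevFinder()
    finder.feed(html)
    finder.close()
    return finder.findings
```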

> 6 - Link elements without title are reported as errors even for
>    "existing content" where that's not required, and arguably
>    pointless for some relations including "made" or "author".

I can't reproduce this problem. Do you have a link to a test case  
demonstrating this?

> 7 - Validator.nu test aborted before the end of my test suite.

Thank you for your feedback.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Saturday, 2 February 2008 12:45:35 UTC