
Henri Sivonen wrote:

Whatever that is, apparently the test works with FF2,
and apparently it's about the same SGML comment issue
as in my res.htm and res.html HTML test cases.

> HTML5 parsing has no such thing as a valid DTD subset.

<sigh />  If it cannot parse valid XHTML 1 it's fine,
just don't offer the option, or give up with a clear
error message when you "see" a DTD subset or anything
else that won't fit into your model, valid or not.

> The error conditions follow the HTML5 parsing spec 
> without ascribing SGML meaning to syntax errors.

By definition XHTML 1 doesn't care what "HTML5" might
be; it has its own specification and syntax.  Some of
it is odd, but not really worse than "HTML5" or HTML -
from my POV decisively better than any HTML cum SGML.

> The impression that I get is that the TAG and the
> HTTP WG aren't part of "everyone and his dog".

A recent proposal in the HTTP WG apparently says that
iso-8859-1 text means "unknown charset", while the
default text charset is still iso-8859-1, go figure.

Whatever this means, lots of fun while testing HTTP
comes to mind, but it has nothing to do with testing an
isolated HTML / HTML5 / XHTML document for validity.

> I might be persuaded to ignore Content-Type if you
> can get the TAG to repeal mime-respect and the IETF
> HTTP WG to endorse content sniffing

I'll try to convince the HTTP WG that any "sniffing"
is no job for HTTP.  But this doesn't affect you or
other validators, what they should do is answer the
simple question:

 Is document X valid HTML / HTML5 / XHTML ?

For any given X, independent of how you get it, HTTP,
upload, FTP, carrier pigeon, gopher, form input, ...

>> As noted above, just ignore what HTTP servers say,
>> all you get are mad lies, resulting in hopelessly
>> confusing error messages about issues not under the
>> control of the tester.
> That's like people on www-validator complaining that
> their invalid ad serving boilerplate is not under 
> their control.

NAK, for a given X you can't say that X is actually X',
because a validator cannot know what X' was supposed
to be before inserted ads (etc.) mutilated it into X.

OTOH what you got as X, however you got it, *is* X,
the valid or invalid input for validation.  What HTTP
servers claim is at best *optional* additional info
for the task to validate X.

If folks actually want to check X'' = X + HTTP header
or X''' = X + charset or doctype overrides, offer this
as an option, as you already do for X''' but not X''.
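
To make the X / X'' / X''' distinction concrete, here is a rough
Python sketch.  All names are made up for illustration and are not
taken from any actual validator, and the charset detection is a
deliberately crude assumption; the only point is that the HTTP
header and the manual override are *optional* inputs on top of X.

```python
import re
from typing import Optional

# Hypothetical, simplified patterns: an XML declaration "encoding"
# pseudo-attribute, and a "charset=" parameter as found in a meta element.
_XML_ENCODING = re.compile(
    rb'<\?xml[^>]*encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']', re.IGNORECASE)
_META_CHARSET = re.compile(
    rb'charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE)


def internal_charset(document: bytes) -> Optional[str]:
    """Charset declared inside X itself, or None if X declares nothing."""
    head = document[:1024]  # only look near the start of the document
    for pattern in (_XML_ENCODING, _META_CHARSET):
        match = pattern.search(head)
        if match:
            return match.group(1).decode("ascii").lower()
    return None


def effective_charset(document: bytes,
                      http_charset: Optional[str] = None,
                      override: Optional[str] = None) -> Optional[str]:
    """Pick the charset to validate with.

    X    : only the document is given, so its own declaration is used.
    X''  : the caller opted in to the HTTP header, which then wins.
    X''' : the caller supplied a manual override, which wins over both.
    """
    if override:
        return override.lower()
    if http_charset:
        return http_charset.lower()
    return internal_charset(document)
```

Checking plain X is then just effective_charset(document), no matter
whether the bytes arrived via HTTP, upload, or form input.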

> Making the references to a misconfigured server is
> under your control.

Yeah, I could use form input or upload instead of an
HTTP URL, or maybe set up a decent gopher server and
let your validator tackle this.

If that is your idea of usability we are wasting time,
as I can simply use validators doing what I want, i.e.
check X, neither X' nor X'', and typically not X'''.


Received on Saturday, 16 February 2008 12:24:09 UTC