Re: HTML 5 and conformance checkers

On Jun 14, 2007, at 04:27, Karl Dubost wrote:

> Comment about HTML 5, Revision 1.90  on Conformance checkers.
>
>         Conformance checkers must verify that a document
>         conforms to the applicable conformance criteria
>         described in this specification.
>
> How does one define what an applicable conformance criterion is?
> Or, conversely, what a non-applicable conformance criterion is?

The applicable conformance criteria are the [machine-checkable]
criteria for document conformance, both in the spec itself and in
other normatively referenced specs.

As I understand it, an HTML5 conformance checker takes the HTTP
Content-Type header and the HTTP payload as its input and does not
check for HTTP protocol conformance. That is, the conformance criteria
for the transfer protocol in use are not part of the conformance
criteria for documents.
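
To illustrate that input model, here is a minimal Python sketch. The
names CheckerInput and select_parser are hypothetical, and the mapping
from media type to parser is my assumption for illustration, not
something the spec text quoted above spells out.

```python
from dataclasses import dataclass


@dataclass
class CheckerInput:
    """Hypothetical input model: only the Content-Type value and the payload."""
    content_type: str  # value of the HTTP Content-Type header
    payload: bytes     # the HTTP entity body, i.e. the document bytes


def select_parser(inp: CheckerInput) -> str:
    """Pick a parser family from the media type (assumed mapping).

    Nothing else about the HTTP exchange (status line, other headers)
    is inspected, because protocol conformance is out of scope.
    """
    media_type = inp.content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return "html"
    if media_type in ("application/xhtml+xml", "application/xml", "text/xml"):
        return "xml"
    raise ValueError(f"unsupported media type: {media_type!r}")
```

The point of the sketch is only that the function never sees anything
about the HTTP exchange beyond those two values.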

> If such a category exists, we should define the list of what is
> applicable.
>
>         Conformance checkers
>         are exempt from detecting errors that require
>         interpretation of the author's intent (for example,
>         while a document is non-conforming if the content of
>         a blockquote element is not a quote, conformance
>         checkers do not have to check that blockquote
>         elements only contain quoted material).
>
> I understand the spirit of the principle but it is vague. And it
> assumes that a conformance checker is always software with no
> human interaction. It would mean that we should first define what
> a conformance checker is: software that performs automatic
> verification of a document. That is done in the Note below. I think
> the criteria should be first class, at the top.

Yeah, the spec assumes that a "conformance checker" is a piece of
software that runs its check to completion without prompting the user
for input that requires judgment.

> "A conformance checker must check for the first two criteria.
> 1. Criteria that can be expressed in a DTD.
> 2. Criteria that cannot be expressed by a DTD, but
> can still be checked by a machine.
> 3. Criteria that can only be checked by a human."

I think it is a bad idea to formulate this in terms of "DTD", because
that wrongly implies an implementation in which non-DTD checks augment
DTD-based validation. Using DTD-based validation as part of an HTML5
conformance checker implementation is a particularly bad idea
considering the mismatch between the conformance requirements for
HTML5 documents and the expressiveness of DTDs. Also, the way DTDs
are defined intertwines XML parsing with validation, which doesn't
fit the text/html serialization.

I suggest something along the lines of:
There are conformance criteria that can be checked by a machine and  
there are conformance criteria that can only be checked by a human. A  
conformance checker must check for criteria that can be checked by a  
machine. Note that a schema-based validator is unlikely to be  
sufficient for checking for all machine-checkable criteria.
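
As a rough illustration of a criterion that is machine-checkable but
awkward for a schema, here is a small Python sketch. The rule (no "a"
element nested inside another "a" element at any depth) is chosen only
as an example. An XML DTD gives each element name a single content
model, so it cannot forbid "a" inside "em" only when that "em" is
itself inside an "a". The function name find_nested_anchors is
hypothetical, and the check runs on an already-parsed tree, so it is
independent of how the document was parsed.

```python
import xml.etree.ElementTree as ET


def find_nested_anchors(root):
    """Return every <a> element that has an <a> ancestor."""
    errors = []

    def walk(elem, inside_a):
        is_a = elem.tag == "a"
        if is_a and inside_a:
            errors.append(elem)
        for child in elem:
            walk(child, inside_a or is_a)

    walk(root, False)
    return errors


# Example input parsed as XML purely for convenience; a real checker
# for text/html would build its tree with an HTML parser instead.
doc = ET.fromstring(
    '<p><a href="x">outer <em><a href="y">inner</a></em></a></p>'
)
bad = find_nested_anchors(doc)
```

Here "bad" ends up holding only the inner anchor. No guess about the
author's intent is involved, which is what makes the rule
machine-checkable.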

> Then there is work to do to determine what we consider checkable
> by machine and what by human.

If something can be checked algorithmically without a probabilistic
heuristic (i.e. without guessing about the author's intent or about
the meaning of natural-language text), it is machine-checkable. In my
experience, at least for a reader with a computer science background,
it is obvious from reading the spec whether a given conformance
criterion is machine-checkable.

>         Conformance checkers must check that the input
>         document conforms when scripting is disabled, and
>         should also check that the input document conforms
>         when scripting is enabled. (This is only a "SHOULD"
>         and not a "MUST" requirement because it has been
>         proven to be impossible. [HALTINGPROBLEM])
>
> Is the intended purpose of this to define two levels of
> conformance?

What would the other level of conformance be? If it involves
executing scripts, would it be OK for conformance to the other level
to be undecidable by machine in the general case?

A snapshot of the DOM in a browser at a user-chosen point in time  
could be checked for conformance, though. This would, again, not  
involve executing scripts during the conformance checking process.

>         The term "HTML5 validator" can be used to refer to a
>         conformance checker that itself conforms to the
>         applicable requirements of this specification.
>
> The way it is written here would mean that the piece of software  
> has to be written in HTML 5, which doesn't make sense in many cases.

No, the *applicable* conformance criteria for whether a conformance  
checker itself is a conforming conformance checker aren't the  
conformance criteria for documents.

> Suggestion: "The term HTML5 validator can be used to refer to
> software that meets the Conformance Checker requirements of this
> specification."

Yeah, it is better to say what "applicable" means.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Thursday, 14 June 2007 05:16:03 UTC