Re: HTML 5 and conformance checkers from Henri Sivonen on 2007-06-14 (public-html@w3.org from June 2007)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Thu, 14 Jun 2007 10:40:49 +0300
To: Karl Dubost <karl@w3.org>
Cc: HTML WG <public-html@w3.org>
Message-Id: <8C21D215-2364-4F84-9905-5DDF91CFC2AD@iki.fi>
On Jun 14, 2007, at 09:22, Karl Dubost wrote:

> On 14 June 2007, at 14:16, Henri Sivonen wrote:
>> The applicable conformance criteria are [machine-checkable]  
>> criteria for document conformance both in the spec itself and in  
>> normatively referenced other specs.
>
> Yes, we agree on this. My trouble is defining what is [machine-checkable].
> I foresee discussions with people arguing about what is
> automatically checkable or not. That is why I said an objective list
> of criteria to include or exclude would make it easier.

Easier for whom? Not for the spec editors, that's for sure. As a
conformance checker implementor, I have so far found determining the  
machine-checkability of requirements sufficiently easy.

> What about:
>
>     "An HTML 5 conformance checker must implement all the
>     machine-checkable criteria of this specification.
>
>      Note: Some criteria can only be checked by a human
>     and therefore do not affect an HTML 5 conformance checker.
>     Some of the machine-checkable criteria cannot be expressed in
>     current schema languages, so you should not rely only on schema
>     languages to create an HTML 5 conformance checker."

Looks ok.

> Let's try with a concrete example: q element for quotes.
>
>     The q element represents a part of a paragraph quoted
>     from another source.
>
> Does that mean q is contained in a paragraph (address, aside, nav,
> footer, li, dd, figure, and p)?
> http://dev.w3.org/cvsweb/~checkout~/html5/spec/Overview.html#paragraph

That depends on whether the q is a descendant of an address, aside,
nav, footer, li, dd, figure, or p element.

> Does that mean that the content of q element is a part of a paragraph?

Yes.

> If it's the former, it means div > q fails.

div > q depends on the document tree regardless of conformance.

More to the point, any definitional problem here arises from <div>,
not <q>.

>      If the cite attribute is present, it must be a URI
>     (or IRI).
>
> Checkable. It means that an HTML 5 conformance checker has to check
> that it is a valid URI or IRI:
>     http://www.ietf.org/rfc/rfc3986.txt
>     http://www.ietf.org/rfc/rfc3987.txt

Yes.

As a side note: for extra usefulness, a checker can have knowledge
about requirements specific to particular URI schemes. Checkers that
make different choices here can reach different verdicts on the same
document, which is a theoretical problem. If we want to remove the
theoretical problem, the spec could enumerate a closed list of URI
schemes that conformance checkers must know about. (Forbidding the
application of knowledge about common schemes like http, https and
mailto would be silly.)
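To illustrate the side note, here is a minimal Python sketch of a cite check, assuming a rough approximation of RFC 3986 generic syntax (absolute URIs, ASCII only; it is not the full grammar and does not cover RFC 3987 IRIs), followed by scheme-specific checks taken from a hypothetical closed list (http, https and the urn:isbn namespace). The list contents, the function names and the choice to validate ISBN checksums are assumptions for illustration, not anything the spec defines.

```python
import re
from urllib.parse import urlsplit

# Rough approximation of RFC 3986 "URI" syntax (absolute URIs, ASCII only):
# scheme ":" followed by unreserved/reserved characters and %HH escapes.
# Not the full grammar; RFC 3987 IRIs would also allow non-ASCII characters.
GENERIC_URI = re.compile(
    r"[A-Za-z][A-Za-z0-9+.\-]*:"
    r"(?:[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=]|%[0-9A-Fa-f]{2})*"
)


def isbn_ok(isbn: str) -> bool:
    """ISBN-10 (weights 10..1, mod 11) or ISBN-13 (weights 1,3, mod 10) checksum."""
    s = isbn.replace("-", "").upper()
    if re.fullmatch(r"[0-9]{9}[0-9X]", s):
        total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(s))
        return total % 11 == 0
    if re.fullmatch(r"[0-9]{13}", s):
        total = sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(s))
        return total % 10 == 0
    return False


def check_urn(uri: str) -> bool:
    # Only the "isbn" URN namespace is known here; other namespaces pass.
    nid, _, nss = uri.split(":", 1)[1].partition(":")
    return isbn_ok(nss) if nid.lower() == "isbn" else True


def check_http(uri: str) -> bool:
    # http and https URIs need a non-empty authority (host).
    return bool(urlsplit(uri).netloc)


# Hypothetical closed list of schemes every checker would have to know about.
SCHEME_CHECKS = {"http": check_http, "https": check_http, "urn": check_urn}


def check_cite(uri: str) -> list:
    """Return a list of error messages for a cite attribute value."""
    if not GENERIC_URI.fullmatch(uri):
        return ["not a syntactically valid URI"]
    scheme = uri.split(":", 1)[0].lower()
    check = SCHEME_CHECKS.get(scheme)
    if check is not None and not check(uri):
        return [f"violates {scheme}-specific requirements"]
    return []
```

Under this sketch, urn:isbn:2-07010-579-2 from Karl's example passes the checksum. A checker without the "urn" entry would accept urn:isbn:2-07010-579-3, which this one rejects. A closed list would pin down exactly that kind of divergence.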

> Here it is tricky: the association of
>
>     <p><q cite="urn:isbn:2-07010-579-2">Plus vague et
>        plus soluble dans l'air,</q>
>        est un vers de l'<cite>Art Poétique,
>        Œuvres poétiques complètes, Paul
>        Verlaine.</cite></p>
>
> Here a tool can extract
>     "Plus vague et plus soluble dans l'air,"
>     Art Poétique, Œuvres poétiques complètes, Paul Verlaine
>     urn:isbn:2-07010-579-2
>
> I can have a process which uses only machines and tries to match
> the ISBN with the title and/or the author. It would be done by
> machine only, for example using services like
> http://worldcatlibraries.org/wcpa/isbn/2-07010-579-2. It is even
> easier in the case of HTTP URIs. All of that can be done by machine only.

A machine-checkable criterion should probably be defined as a
criterion for which conformance is a decidable problem (in the
computer science sense) given a document (Content-Type and finite  
byte stream) and the knowledge embodied in the spec and the normative  
references.

That is, the program computing whether a given document conforms to a  
criterion should not be required to consult outside resources and  
should not embody arbitrary knowledge that isn't part of the spec  
(with normative references).
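Under this definition, checking a criterion is a pure function of the Content-Type and the byte stream. The following Python sketch is an illustration under stated assumptions, not a real checker. It takes the charset from the Content-Type parameter and falls back to UTF-8, which is a simplification of the HTML 5 encoding rules. It collects q/@cite values with the standard library's html.parser and applies a crude absolute-URI test. The function and class names are hypothetical. It never touches the network. Matching the ISBN against a catalogue, as in Karl's example, would need an outside resource and so falls outside the definition.

```python
import re
from email.message import Message
from html.parser import HTMLParser

# Crude stand-in for RFC 3986 "URI" (absolute) syntax: scheme ":" then no whitespace.
ABSOLUTE_URI = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:\S*")


def charset_of(content_type: str) -> str:
    """Charset parameter of the Content-Type; the UTF-8 fallback is a simplification."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or "utf-8"


class CiteCollector(HTMLParser):
    """Collects the cite attribute values of q elements."""

    def __init__(self) -> None:
        super().__init__()
        self.cites = []

    def handle_starttag(self, tag, attrs):
        if tag == "q":
            for name, value in attrs:
                if name == "cite":
                    self.cites.append(value or "")


def check_document(content_type: str, data: bytes) -> list:
    """Pure function of (Content-Type, bytes): no network, no outside knowledge."""
    try:
        text = data.decode(charset_of(content_type), errors="replace")
    except LookupError:  # unknown charset label: fall back (simplification)
        text = data.decode("utf-8", errors="replace")
    collector = CiteCollector()
    collector.feed(text)
    collector.close()
    return [f"q/@cite is not a URI: {c!r}" for c in collector.cites
            if not ABSOLUTE_URI.fullmatch(c)]
```

Because the result depends only on the two arguments, the same input always yields the same verdict. That determinism is what the proposed definition asks of a machine-checkable criterion.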

>>>          Conformance checkers must check that the input
>>>         document conforms when scripting is disabled, and
>>>         should also check that the input document conforms
>>>         when scripting is enabled. (This is only a "SHOULD"
>>>         and not a "MUST" requirement because it has been
>>>         proven to be impossible. [HALTINGPROBLEM])
>>>
>>> Is the intended purpose of this to define two levels of
>>> conformance?
>>
>> What would the other level of conformance be? If it involves
>> executing scripts, would it be OK for conformance to the other
>> level to be undecidable by machine in the general case?
>
> # MUST and SHOULD
> * Conformance checkers must check all the MUSTs and SHOULDs.
> * Conformance checkers must check only the MUSTs.

IIRC, there were one or two SHOULDs applying to documents that
could use guidance on what conformance checkers are to do. I can't  
remember what those were, but I'm pretty sure I have notified Hixie.

> If the script is not executed, and because it is a SHOULD,

That's a SHOULD applying to the checker itself, not to the document.

> the conformance checker silently ignores it and says it conforms?

Yes.

> Then, if *another* (more capable) conformance checker succeeds in
> running the scripts and sees that the document does not conform, what
> does it say?

It has been proven (in 1937) that such a conformance checker cannot  
exist for the general case.

A checker that runs scripts with some implementation-specific  
limitations but that doesn't cover the general case could be useful  
though.
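Here is a minimal Python sketch of such a limited checker. It rests on a toy model that is entirely hypothetical. The document is a flat list of elements. A script is a generator that mutates the document and yields after each step. The conformance criterion is a stand-in: q/@cite must not contain whitespace. The checker checks every snapshot and gives up after a fixed step budget, so it can answer "unknown" instead of running forever.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List


@dataclass
class Element:
    tag: str
    attrs: Dict[str, str] = field(default_factory=dict)


Document = List[Element]                       # toy DOM: a flat list of elements
Script = Callable[[Document], Iterator[None]]  # yields after each mutation step


def conforms(doc: Document) -> bool:
    """Toy criterion: no q element may have whitespace in its cite attribute."""
    for el in doc:
        if el.tag == "q" and any(ch.isspace() for ch in el.attrs.get("cite", "")):
            return False
    return True


def check_with_scripts(doc: Document, script: Script, max_steps: int = 1000) -> str:
    """Check every snapshot while running `script`, up to `max_steps` steps."""
    if not conforms(doc):
        return "does not conform at step 0"
    steps = script(doc)
    for n in range(1, max_steps + 1):
        try:
            next(steps)
        except StopIteration:
            return "conforms (script finished)"
        if not conforms(doc):
            return f"does not conform at step {n}"
    return "unknown (step budget exhausted)"


# Example scripts (hypothetical).
def adds_valid_quote(doc: Document) -> Iterator[None]:
    doc.append(Element("q", {"cite": "http://example.org/"}))
    yield


def breaks_late(doc: Document) -> Iterator[None]:
    for _ in range(5):
        yield
    doc.append(Element("q", {"cite": "not a uri"}))
    yield


def loops_forever(doc: Document) -> Iterator[None]:
    while True:
        yield
```

With the default budget, breaks_late yields "does not conform at step 6". With max_steps=3 it yields "unknown (step budget exhausted)". So a limited checker's verdict depends on its limits. No finite budget turns "unknown" into a general decision procedure.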

> So one time the document is conformant and another time it is not.

Yes. With scripts, document conformance is possible to check only for  
a given snapshot in time. Obviously, the result can be different with  
different snapshots.

It is impossible to check (in the general case) that a document with  
a script executing on it keeps conforming from t_0 to t_infinity.

>> A snapshot of the DOM in a browser at a user-chosen point in time  
>> could be checked for conformance, though. This would, again, not  
>> involve executing scripts during the conformance checking process.
>
> A user-chosen point in time? We said above: no human interaction.

Well, someone or something has to choose the input document for the
conformance checking process. That is, a user invokes a conformance
checker: "check the document (bytes) at this URI" or "check the  
document (DOM) currently loaded in this browser window".

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Thursday, 14 June 2007 07:39:52 UTC