Re: HTML 5 and conformance checkers from Karl Dubost on 2007-06-14 (public-html@w3.org from June 2007)

From: Karl Dubost <karl@w3.org>
Date: Thu, 14 Jun 2007 15:22:39 +0900
To: Henri Sivonen <hsivonen@iki.fi>
Cc: HTML WG <public-html@w3.org>
Message-Id: <ED89EF45-8006-47A9-AE71-4870DAC8FA57@w3.org>
On 14 June 2007, at 14:16, Henri Sivonen wrote:
> The applicable conformance criteria are [machine-checkable]  
> criteria for document conformance both in the spec itself and in  
> normatively referenced other specs.

Yes, we are on the same line here. My trouble is with what counts as
[machine-checkable]. I can see people arguing about what is or is not
automatically checkable. That is why I said an objective list of criteria
to include or exclude would make it easier.


>> "A conformance checker must check for the first two criteria.
>> 1. Criteria that can be expressed in a DTD.
>> 2. Criteria that cannot be expressed by a DTD, but
>> can still be checked by a machine.
>> 3. Criteria that can only be checked by a human."
>
> I think it is a bad idea to formulate by mentioning "DTD", because  
> it wrongly implies an implementation where non-DTD checks augment  
> DTD-based validation.

Good point. The same goes for DTD, XML Schema, RELAX NG, or whatever
schema language. Leaving them out will make the conformance criteria
easier to read. What about:

     "An HTML 5 conformance checker must implement all the machine
     checkable criteria of this specification.

     Note: There are criteria that can only be checked by a human
     and therefore do not affect an HTML5 conformance checker. Some
     of the machine-checkable criteria can't be expressed by the
     current schema languages. You should not rely only on schema
     languages to create an HTML5 conformance checker."

>> Then there is a work to know what we consider being checkable by  
>> machine or human.
>
> If something is checkable algorithmically without a probabilistic  
> heuristic (i.e. without guess about the author's intent or about  
> the meaning of natural-language text), it is machine checkable. In  
> my experience, at least with a computer science background, it is  
> obvious whether a given conformance criterion is machine-checkable  
> when reading the spec.

Let's try with a concrete example: q element for quotes.

     The q element represents a part of a paragraph quoted
     from another source.

Does that mean q is contained in a paragraph (address, aside, nav,
footer, li, dd, figure, and p)?
http://dev.w3.org/cvsweb/~checkout~/html5/spec/Overview.html#paragraph
Does that mean that the content of q element is a part of a paragraph?

If it's the former, it means div > q fails.
If it's the latter, we can't check that it is indeed part of a paragraph
from another source, except in a closed system. Let's call it a human
criterion.
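Under the first reading, the check is purely structural. A minimal sketch in Python, assuming the eight element names listed above are the paragraph containers and that the input is reasonably well-formed (the standard-library parser used here does not apply HTML's implied end tag rules). The names QAncestorChecker and q_outside_paragraph are made up for illustration:

```python
from html.parser import HTMLParser

# Element names taken from the list in this message; treating them as
# "paragraph" containers is an assumption of the first reading.
PARAGRAPH_CONTAINERS = {"address", "aside", "nav", "footer",
                        "li", "dd", "figure", "p"}

# Void elements never get an end tag, so they must not stay on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "param", "source", "track", "wbr"}


class QAncestorChecker(HTMLParser):
    """Record every q element that has no paragraph container ancestor."""

    def __init__(self):
        super().__init__()
        self.stack = []       # names of the currently open elements
        self.violations = []  # parser position of each offending q start tag

    def handle_starttag(self, tag, attrs):
        if tag == "q" and not PARAGRAPH_CONTAINERS.intersection(self.stack):
            self.violations.append(self.getpos())
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open element; ignore stray end tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def q_outside_paragraph(html):
    """Return the positions of q elements that sit outside any paragraph."""
    checker = QAncestorChecker()
    checker.feed(html)
    checker.close()
    return checker.violations
```

With this sketch, `<div><q>x</q></div>` yields one violation and `<p><q>x</q></p>` yields none. Nothing comparable can be written for the second reading.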


     Content inside a q element must be quoted from
     another source, whose URI, if it has one, should be
     cited in the cite attribute.

Not verifiable. A human criterion, except in a closed system, though it
might just repeat the first sentence, depending on how we interpret it.

      If the cite attribute is present, it must be a URI
     (or IRI).

Checkable. It means that the HTML5 conformance checker has to check that
it is a valid URI or IRI:
     http://www.ietf.org/rfc/rfc3986.txt
     http://www.ietf.org/rfc/rfc3987.txt
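To show that this one really is mechanical, here is a rough sketch in Python. It is an approximation, not a full RFC 3986 or RFC 3987 validator: it only checks the generic shape from RFC 3986 Appendix B, requires a scheme (so relative references are rejected, which is an assumption about what "a URI" means here), and rejects characters that are never allowed unescaped. The function name looks_like_uri_or_iri is hypothetical:

```python
import re

# Generic URI shape adapted from RFC 3986, Appendix B, with the scheme
# made mandatory (assumption: a relative reference is not "a URI").
URI_SHAPE = re.compile(
    r"^([A-Za-z][A-Za-z0-9+.-]*):(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?$"
)
# Control characters, space and delimiters that may not appear unescaped.
FORBIDDEN = re.compile(r'[\x00-\x20\x7f<>"{}|\\^`]')
# A "%" that is not followed by two hexadecimal digits.
BAD_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def looks_like_uri_or_iri(value, allow_iri=True):
    """Approximate syntax check for the value of a cite attribute."""
    if not URI_SHAPE.match(value):
        return False
    if FORBIDDEN.search(value) or BAD_PERCENT.search(value):
        return False
    # A URI is ASCII only; an IRI may carry other Unicode characters.
    if not allow_iri and not value.isascii():
        return False
    return True
```

The value urn:isbn:2-07010-579-2 passes, while a value containing a space or a stray "%" fails. A real checker would need the full grammars from the two RFCs.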

      User agents should allow users to follow
     such citation links.

N/A. Conformance checkers are not user agents.

      If a q element is contained (directly or indirectly)
     in a paragraph that contains a single cite element
     and has no other q element descendants, then, the
     citation given by that cite element gives the source
     of the quotation contained in the q element.

Here it is tricky. Consider the association between q and cite in

     <p><q cite="urn:isbn:2-07010-579-2">Plus vague et
        plus soluble dans l'air,</q>
        est un vers de l'<cite>Art Poétique,
        Œuvres poétiques complètes, Paul
        Verlaine.</cite></p>

Here a tool can extract
     "Plus vague et plus soluble dans l'air,"
     Art poétique, Œuvres poétiques complètes, Paul Verlaine
     urn:isbn:2-07010-579-2

I can have a fully automated process which tries to match the ISBN
against the title and/or the author, for example using a service like
http://worldcatlibraries.org/wcpa/isbn/2-07010-579-2
It is even easier in the case of HTTP URIs. All of that can be done by
machine only.
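The extraction step could look like the following Python sketch. It assumes a single paragraph as input and applies the rule quoted above only in its simplest form (exactly one q and exactly one cite element); QuoteExtractor and extract_quotation are made-up names, and the lookup against an external service is left out:

```python
from html.parser import HTMLParser


class QuoteExtractor(HTMLParser):
    """Collect q text, q cite attributes and cite element text."""

    def __init__(self):
        super().__init__()
        self.open = []       # currently open q / cite elements
        self.quotes = []     # one {"text": ..., "cite": ...} per q element
        self.citations = []  # text content of each cite element

    def handle_starttag(self, tag, attrs):
        if tag == "q":
            self.quotes.append({"text": "", "cite": dict(attrs).get("cite")})
            self.open.append(tag)
        elif tag == "cite":
            self.citations.append("")
            self.open.append(tag)

    def handle_endtag(self, tag):
        if self.open and self.open[-1] == tag:
            self.open.pop()

    def handle_data(self, data):
        if not self.open:
            return
        if self.open[-1] == "q":
            self.quotes[-1]["text"] += data
        else:
            self.citations[-1] += data


def extract_quotation(paragraph_html):
    """Return quote, URI and source, or None if the rule does not apply."""
    parser = QuoteExtractor()
    parser.feed(paragraph_html)
    parser.close()
    if len(parser.quotes) != 1 or len(parser.citations) != 1:
        return None
    quote = parser.quotes[0]
    return {
        "quote": " ".join(quote["text"].split()),  # collapse line breaks
        "uri": quote["cite"],
        "source": " ".join(parser.citations[0].split()),
    }
```

Fed the Verlaine paragraph above, it returns the three strings listed. Whether the ISBN and the title really designate the same work is then a matter of querying a catalogue, which is a separate step.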

>>          Conformance checkers must check that the input
>>         document conforms when scripting is disabled, and
>>         should also check that the input document conforms
>>         when scripting is enabled. (This is only a "SHOULD"
>>         and not a "MUST" requirement because it has been
>>         proven to be impossible. [HALTINGPROBLEM])
>>
>> Is the intended purpose of this to define two levels of
>> conformance?
>
> What would the other level of conformance be? If it involves  
> executing scripts, would it be OK for conformance to the other  
> level to be undecidable by machine in a general case?

# must and should
Two possible levels:
* Conformance checkers must check all the musts and shoulds.
* Conformance checkers must check the musts only.

If the script is not executed then, because it is only a should, does
the conformance checker silently ignore it and say the document conforms?
And if *another*, more capable conformance checker succeeds in running
the scripts and sees that the document does not conform, what does it
say? The same document is conformant according to one checker and not
conformant according to the other.

>
> A snapshot of the DOM in a browser at a user-chosen point in time  
> could be checked for conformance, though. This would, again, not  
> involve executing scripts during the conformance checking process.

A user-chosen point in time? We said above: no human interaction.

>>          The term "HTML5 validator" can be used to refer to a
>>         conformance checker that itself conforms to the
>>         applicable requirements of this specification.
>>
>> The way it is written here would mean that the piece of software  
>> has to be written in HTML 5, which doesn't make sense in many cases.
>
> No, the *applicable* conformance criteria for whether a conformance  
> checker itself is a conforming conformance checker aren't the  
> conformance criteria for documents.
>
>> Suggestion: "The term HTML5 validator can be used to refer to a  
>> software that meets the Conformance Checker requirements of this  
>> specification."
>
> Yeah, it is better to say what "applicable" means.

Indeed.


-- 
Karl Dubost - http://www.w3.org/People/karl/
W3C Conformance Manager, QA Activity Lead
   QA Weblog - http://www.w3.org/QA/
      *** Be Strict To Be Cool ***
Received on Thursday, 14 June 2007 06:22:52 UTC