Re: FW: Request that "conforming document" be better defined and more carefully referenced from Aryeh Gregor on 2010-02-09 (public-html@w3.org from February 2010)

From: Aryeh Gregor <Simetrical+w3c@gmail.com>
Date: Tue, 9 Feb 2010 14:35:14 -0500
To: Paul Cotton <Paul.Cotton@microsoft.com>
Cc: "public-html@w3.org" <public-html@w3.org>, "noah_mendelsohn@us.ibm.com" <noah_mendelsohn@us.ibm.com>
Message-ID: <7c2a12e21002091135q64f5d528q6516e43f640dd10c@mail.gmail.com>

On Mon, Feb 8, 2010 at 3:56 PM, Paul Cotton <Paul.Cotton@microsoft.com> wrote:
> This comment is regarding the term "conforming document".  As you know,
> the HTML 5 draft explicitly discusses [1] the conformance of Web browsers,
> noninteractive agents, conformance checkers, etc.  I have found no similar
> explicit definition of "conforming documents" or some similar term.

I believe "conforming" is used in its normal English sense.  A
document is conforming if it obeys all "must" requirements in the
HTML5 spec that logically apply to documents (as opposed to
requirements that only make sense for user agents).

> * Define one or more terms such as "conforming documents".  For each such
> term, provide a definition sufficiently rigorous that one can determine
> for any given string of characters (octet stream?) whether it is or is not
> conforming.

The spec defines conformance criteria for conformance checkers:

[[
Conformance checkers must verify that a document conforms to the
applicable conformance criteria described in this specification.
Automated conformance checkers are exempt from detecting errors that
require interpretation of the author's intent (for example, while a
document is non-conforming if the content of a blockquote element is
not a quote, conformance checkers running without the input of human
judgement do not have to check that blockquote elements only contain
quoted material).

Conformance checkers must check that the input document conforms when
parsed without a browsing context (meaning that no scripts are run,
and that the parser's scripting flag is disabled), and should also
check that the input document conforms when parsed with a browsing
context in which scripts execute, and that the scripts never cause
non-conforming states to occur other than transiently during script
execution itself. (This is only a "SHOULD" and not a "MUST"
requirement because it has been proven to be impossible. [COMPUTABLE])

The term "HTML validator" can be used to refer to a conformance
checker that itself conforms to the applicable requirements of this
specification.
]]
http://www.whatwg.org/specs/web-apps/current-work/multipage/infrastructure.html#conformance-requirements

The spec provides specific algorithms to check conformance where
possible.  I think the spec makes it very clear what a conforming
document is: it must obey all the conformance requirements given in
the spec (that apply to documents).  "Conforming" is used in its
regular English sense.
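
To make that reading concrete, here is a rough sketch of it in Python.
Everything in it is an assumption made for illustration: the
Requirement type, the three toy requirements, and the function names
are invented here and do not come from the spec.  It only shows the
shape of the idea: keep the requirements that apply to documents, skip
the ones an automated checker is exempt from, and call the document
conforming if none of the rest are violated.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Requirement:
    name: str
    applies_to: str               # "document" or "user-agent"
    machine_checkable: bool       # False if it needs the author's intent
    holds: Callable[[str], bool]  # predicate over the document text


def violations(doc: str, requirements: List[Requirement]) -> List[str]:
    """Names of document-level, machine-checkable requirements doc breaks."""
    return [
        r.name
        for r in requirements
        if r.applies_to == "document"
        and r.machine_checkable
        and not r.holds(doc)
    ]


# Toy requirements, invented for illustration only.
REQUIREMENTS = [
    Requirement("has-doctype", "document", True,
                lambda d: d.lstrip().lower().startswith("<!doctype html>")),
    # Needs human judgement, so an automated checker never evaluates it.
    Requirement("blockquote-is-a-quote", "document", False,
                lambda d: True),
    # A user-agent requirement, not a document one, so it is skipped.
    Requirement("ua-renders-title", "user-agent", True,
                lambda d: False),
]


def is_conforming(doc: str, requirements: List[Requirement] = REQUIREMENTS) -> bool:
    return not violations(doc, requirements)
```

With these toy requirements, a document that starts with a doctype
passes and one that doesn't is reported as breaking "has-doctype"; the
blockquote and user-agent entries never affect the result.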

> Assuming I've got that right, it might be worth asking whether there
> should be separate terminology for conformance of documents that use only
> the features explicitly documented in HTML 5 (e.g. <p>, <table>, etc.) vs.
> documents that also use extensions from some applicable specification
> (<NoahsNewTag>).

This is what the current spec has to say:

[[
When vendor-neutral extensions to this specification are needed,
either this specification can be updated accordingly, or an extension
specification can be written that overrides the requirements in this
specification. When someone applying this specification to their
activities decides that they will recognise the requirements of such
an extension specification, it becomes an applicable specification for
the purposes of conformance requirements in this specification.
]]
http://www.whatwg.org/specs/web-apps/current-work/multipage/infrastructure.html#extensibility

The idea (as far as I can tell) is that HTML5 defines a specific set
of conformance requirements, and any document satisfying those is a
conforming HTML5 document.  If another spec extends HTML5, like the
HTML+RDFa spec, then documents that conform to it are conforming
HTML5+RDFa documents (or whatever the combination ends up being
called).  Validators/authors/implementers/etc. can consider them
conforming or not as they choose, depending on whether they accept
that extension specification as applicable.

> II. Same as above, but apply the term "conforming document" to any syntax
> that >could have been< defined in an applicable specification.  (I suspect
> that there is some syntax, such as improperly nested tags, that you would
> prohibit even applicable specifications from specifying -- you should make
> clear what syntax and processing can and cannot be defined in such
> extension specs I think).

If an external specification is accepted as applicable, it can
override any requirements it sees fit.  There's really no way for one
spec to say other specs can't supersede it.

> IV. Encourage usage like: "conforming" for documents that use >only<
> features explicitly documented in HTML 5 and "conforming to HTML 5 as
> augmented by the XXXX and YYYY specifications" for documents that conform
> to identified extension specs.

I like this proposal best, personally.  For instance, a document using
RDFa would be conforming HTML5+RDFa, but (as long as RDFa is not part
of the main spec) not conforming HTML5.  The spec doesn't define this
terminology right now -- I'm not sure whether it should.
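
A sketch of how a tool might compute such a label, assuming it knows
which features each spec defines.  The feature sets below are tiny
made-up subsets and conformance_label is a hypothetical helper, not
anything the spec or an existing validator defines.

```python
from typing import Dict, FrozenSet, Optional, Set

# Toy subsets, for illustration only.
HTML5_FEATURES: FrozenSet[str] = frozenset({"p", "table", "lang"})
EXTENSIONS: Dict[str, FrozenSet[str]] = {
    "RDFa": frozenset({"property", "typeof"}),
}


def conformance_label(used: Set[str], accepted: Set[str]) -> Optional[str]:
    """Return e.g. "HTML5" or "HTML5+RDFa", or None if some feature in
    `used` is not covered by HTML5 plus the accepted extension specs."""
    remaining = set(used) - HTML5_FEATURES
    needed = []
    for name in sorted(accepted):
        features = EXTENSIONS.get(name, frozenset())
        if remaining & features:
            # Only name an extension the document actually relies on.
            needed.append(name)
            remaining -= features
    if remaining:
        return None
    return "+".join(["HTML5"] + needed)
```

Here a document using only p is labelled "HTML5" whether or not RDFa
is accepted, one using property is "HTML5+RDFa" if RDFa is accepted,
and the same document gets no label at all if it isn't.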

> For what it's worth, I think I like II. or II+IV best:  that is, when no
> additional specifications are explicitly called out, all the syntax that
> >could have< been defined by such an extension should be considered
> conforming.  That way you don't consider a document broken just because
> you can't name the spec that gave meaning to the new constructs.

IMO, it's very important that validators raise errors if they hit
unrecognized constructs.  If your page passes validation, it should
mean (ideally) that the page can be processed by a purely
standards-based user agent.  If there are unknown extensions present,
they're likely either errors or non-standard, unless the validator is
out of date.  Of course, authors can ignore "unrecognized element foo"
errors if they know that the element actually is part of a standard
that the validator doesn't recognize (but hopefully the validator
would be updated in that case).

A validator that didn't catch the typo in <html lagn="en"> because
lagn could possibly be an extension attribute would be kind of silly,
I think.  :)
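
For what it's worth, here is a minimal sketch of the behaviour I mean,
using Python's standard html.parser module.  The KNOWN_ATTRS table is
a toy subset I made up, and the extra_attrs parameter is a
hypothetical stand-in for "attributes from extension specs this
validator has been told to accept"; no real validator is claimed to
work this way.

```python
from html.parser import HTMLParser

# Toy table of recognized elements and attributes, for illustration only.
KNOWN_ATTRS = {
    "html": {"lang", "manifest"},
    "p": {"lang", "class", "id"},
}


class AttrChecker(HTMLParser):
    def __init__(self, extra_attrs=()):
        super().__init__()
        self.extra = set(extra_attrs)  # attributes from accepted extensions
        self.errors = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        allowed = KNOWN_ATTRS.get(tag)
        if allowed is None:
            self.errors.append(f"unrecognized element {tag}")
            return
        for name, _value in attrs:
            if name not in allowed and name not in self.extra:
                self.errors.append(f"unrecognized attribute {name} on <{tag}>")


def check(source, extra_attrs=()):
    checker = AttrChecker(extra_attrs)
    checker.feed(source)
    checker.close()
    return checker.errors
```

By default check('<html lagn="en">') reports the unrecognized
attribute, which is what catches the typo.  It only goes quiet if
someone explicitly passes extra_attrs={"lagn"}, i.e. if the person
running the validator has decided some spec defining lagn is
applicable.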
Received on Tuesday, 9 February 2010 19:35:48 UTC