Request that "conforming document" be better defined and more carefully referenced

I hope you will consider this comment on the HTML 5 draft Recommendation.  
This is a personal comment from me, and does not necessarily represent the 
opinion of other TAG members or the TAG as a whole.

Background
==========

This comment is regarding the term "conforming document".  As you know, 
the HTML 5 draft explicitly discusses [1] the conformance of Web browsers, 
noninteractive agents, conformance checkers, etc.  I have found no similar 
explicit definition of "conforming documents" or some similar term.

At first I thought: can this be right? Is the idea that only conformance 
of code is specified?  Then I looked further and noticed statements like:

        "Authoring tools and markup generators must generate >conforming 
documents<."

However, the term "conforming documents" seems not to be explicitly 
defined, and is in any case not hyperlinked from references like this.  I 
think that's a problem that should be fixed.

It seems to me that defining conformance for documents is one of the most 
important purposes of this specification, and doing so is probably 
essential if the spec is to be used as the basis of a media type 
registration.  Quotes like the one above suggest that, in fact, it is the 
intention that the term "conforming documents" be defined.

Suggestions
===========

I can think of a number of approaches, any of which would probably be 
acceptable from my point of view.

* Define one or more terms such as "conforming documents".  For each such 
term, provide a definition sufficiently rigorous that one can determine 
for any given string of characters (octet stream?) whether it is or is not 
conforming.  If additional information is required to make that 
determination, e.g. if conformance turns out to depend on something like 
an externally specified character encoding, then say so, and indicate that 
conformance for a document is a relation defined on the combination 
document_string+additional_info.

* Hyperlink to each such definition from all suitable references elsewhere 
in the document.
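
To make the first suggestion concrete, here is a minimal Python sketch of 
conformance as a relation on document_string+additional_info.  It is only 
an illustration:  the names `AdditionalInfo` and `is_conforming` are 
hypothetical, and the rule it checks (the octets must decode under the 
declared encoding and begin with a doctype) is a placeholder assumption, 
not the HTML 5 draft's actual requirement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AdditionalInfo:
    """Hypothetical out-of-band data, e.g. an externally specified encoding."""
    external_encoding: Optional[str] = None


def is_conforming(octets: bytes, info: AdditionalInfo) -> bool:
    """Toy conformance relation over (octet stream, additional info).

    Placeholder rule, NOT the real HTML 5 rule: the octets must decode
    under the given encoding (UTF-8 if none is given) and the resulting
    text must start with a doctype.
    """
    encoding = info.external_encoding or "utf-8"
    try:
        text = octets.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        # Undecodable octets or an unknown encoding name: not conforming.
        return False
    return text.lstrip().lower().startswith("<!doctype html>")


# The same octets can give different answers under different
# additional info, which is why the relation takes both arguments.
octets = "<!DOCTYPE html><p>caf\u00e9</p>".encode("latin-1")
as_latin1 = is_conforming(octets, AdditionalInfo("latin-1"))
as_default = is_conforming(octets, AdditionalInfo())
```

The point of the sketch is the signature rather than the rule:  a yes/no 
answer for every input, with any external dependency named explicitly.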

Conformance terminology when "applicable specifications" define HTML 5 extensions
=================================================================================

I am probably going to make some comments on extensibility in general in 
another note, but here it's worth briefly discussing the impact on 
document conformance terminology.  I'll assume for the moment that, 
informally, the intention of the current text is that additional 
specifications can be written to augment HTML 5.  Let's take that as a 
given.  Presumably, such "applicable specifications" can provide specific 
meanings for additional markup;  they may or may not also be able to 
define things like nonstandard DOM mappings for such additional constructs.

Assuming I've got that right, it might be worth asking whether there 
should be separate terminology for conformance of documents that use only 
the features explicitly documented in HTML 5 (e.g. <p>, <table>, etc.) vs. 
documents that also use extensions from some applicable specification 
(<NoahsNewTag>).

I don't actually have a strong opinion on which way you go with this, but 
as things stand the spec is mushy in this area, and I think your choice 
should be unambiguous.  Some options appear to be:

I. A single term, "conforming document", that includes documents using 
extensions that are explicitly defined in some applicable specification. I 
think this is closest to what you currently intend, but I confess I find 
it a bit too tricky.

II. Same as above, but apply the term "conforming document" to any syntax 
that >could have been< defined in an applicable specification.  (I suspect 
that there is some syntax, such as improperly nested tags, that you would 
prohibit even applicable specifications from specifying.  I think you 
should make clear what syntax and processing can and cannot be defined in 
such extension specs.)

III. Two terms:  perhaps "conforming: html5-only" would apply to 
documents that use >only< features explicitly documented in HTML 5, vs. 
something like "conforming: html5-extended" for your choice of the first 
two options.  Then you'd know that "html5-only" documents would be 
universally interoperable, and "html5-extended" documents would depend on 
extension support.

IV. Encourage usage like: "conforming" for documents that use >only< 
features explicitly documented in HTML 5 and "conforming to HTML 5 as 
augmented by the XXXX and YYYY specifications" for documents that conform 
to identified extension specs.

For what it's worth, I think I like II. or II+IV best:  that is, when no 
additional specifications are explicitly called out, all the syntax that 
>could have< been defined by such an extension should be considered 
conforming.  That way you don't consider a document broken just because 
you can't name the spec that gave meaning to the new constructs.  Then you 
can also adopt IV to allow people to explicitly call out conformance to 
the combination of specs that have been used;  in this case, the semantics 
of the extensions (and perhaps their specialized processing) are part of 
the conformance.  Calling out html5-only may have merit too.
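
As a rough sketch of how II+IV (plus an html5-only label) might fit 
together, the Python below labels a document from the set of element 
names it uses.  Everything here is an assumption made for illustration:  
`CORE_ELEMENTS` is a tiny stand-in for the real HTML 5 vocabulary, 
`conformance_label` is a hypothetical helper, and the question of which 
syntax an extension spec could never define is ignored entirely.

```python
from typing import Dict, FrozenSet, Iterable

# Tiny illustrative subset; NOT the real HTML 5 element list.
CORE_ELEMENTS: FrozenSet[str] = frozenset(
    {"html", "head", "title", "body", "p", "table"}
)


def conformance_label(
    used: Iterable[str],
    extension_specs: Dict[str, FrozenSet[str]],
) -> str:
    """Label a document by the element names it uses.

    extension_specs maps a (hypothetical) spec name to the element
    names that spec defines.
    """
    extra = set(used) - CORE_ELEMENTS
    if not extra:
        # Only features documented in HTML 5 itself.
        return "conforming: html5-only"

    # Option IV: name the extension specs that account for the extras.
    needed = sorted(
        name for name, elems in extension_specs.items() if extra & elems
    )
    covered = set()
    for name in needed:
        covered |= extension_specs[name]
    if extra <= covered:
        return "conforming to HTML 5 as augmented by " + " and ".join(needed)

    # Option II: extras that no named spec covers are still treated as
    # conforming, since some spec could have defined them.
    return "conforming: html5-extended"


plain = conformance_label(["p", "table"], {})
named = conformance_label(
    ["p", "NoahsNewTag"], {"XXXX": frozenset({"NoahsNewTag"})}
)
unnamed = conformance_label(["p", "NoahsNewTag"], {})
```

Under option I, by contrast, the last case would be reported as 
non-conforming, because no applicable specification can be named.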

So, in a nutshell, I'm suggesting that all the terminology regarding 
conformance of documents be made more explicit, and that the key terms be 
hyperlinked.  Thank you!

Noah

[1] http://dev.w3.org/html5/spec/Overview.html#conformance-requirements

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

Received on Friday, 5 February 2010 23:25:27 UTC