Re: An HTML language specification

Mark Baker wrote:
> It would need to include the definition of the attributes, but most
> importantly a definition of what "select" means, like HTML 4 provides;
> "The SELECT element creates a menu".  But sure, something like that.

Note that the text there was a direct quote from the section about the 
<select> element in the current HTML5 draft.  Also note that I snipped 
exactly the things you're talking about here, to keep length down.  The 
section goes on like so:

   The select element represents a control for selecting amongst a set
   of options.

   The multiple attribute is a boolean attribute. If the attribute is
   present, then the select element represents a control for selecting
   zero or more options from the list of options. If the attribute is
   absent, then the select element represents a control for selecting a
   single option from the list of options.

etc.  This seems to me like exactly what you want, no?
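
As a rough illustration of the quoted text (not anything from the draft itself), the semantics of the multiple attribute can be sketched in a few lines of Python. The function name select_semantics and the plain dict of attributes are assumptions made for this example; the point is only that the definition is stated in terms of attribute presence, with no DOM involved.

```python
def select_semantics(attributes):
    """Describe what a select element represents, per the quoted prose.

    `attributes` is a hypothetical dict of attribute name -> value.
    """
    # "multiple" is a boolean attribute: only its presence matters,
    # not its value.
    if "multiple" in attributes:
        return "zero or more options"
    return "a single option"


print(select_semantics({"multiple": ""}))
print(select_semantics({"name": "colour"}))
```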

>> Or is the objection just to the way the parsing algorithm is specified and
>> not to the descriptions of individual elements?
> 
> It's both, to an extent.  The parser and much of the language is
> defined in DOM terms.

Maybe I'm looking at a bad example here, but <select> certainly doesn't 
seem to be defined in DOM terms (apart from its DOM interface, of 
course).  Should I be looking at some other element?  I just picked 
<select> because it's a sufficiently complex and interesting one that 
there _might_ have been DOM involved.  Something like <div> is even more 
clear-cut.  Here's the full section on <div>:

   4.12.2 The div element

   Categories
     Flow content.
   Contexts in which this element may be used:
     Where flow content is expected.
   Content model:
     Flow content.
   Element-specific attributes:
     None.
   DOM interface:
     Uses HTMLElement.

   The div element represents nothing at all. It can be used with the
   class, lang/xml:lang, and title attributes to mark up semantics
   common to a group of consecutive elements.

   Allowing div elements to contain phrasing content makes it easy for
   authors to abuse div, using it with the class="" attribute to the
   point of not having any other elements in the markup. This is a
   disaster from an accessibility point of view, and it would be nice if
   we could somehow make such pages non-compliant without preventing
   people from using divs as the extension mechanism that they are, to
   handle things the spec can't otherwise do (like making new widgets).

> I haven't had a detailed enough look at the
> parser to know if the DOM gets in the way though, or if it can simply
> be used as an abstract model as the spec says ("Implementations that
> do not support scripting do not have to actually create a DOM Document
> object, but the DOM tree in such cases is still used as the model for
> the rest of the specification.").

There are at least two HTML5 parser implementations that do not use a 
DOM: html5lib and Henri's validator.

> But I'm still wary of using an implemented model as an abstract
> one, lest nuances of the various implementations result in differing
> interpretations of the specification.

The DOM _is_ an abstract (language-agnostic, etc, etc) model, though. 
It's not like we're defining things in terms of a particular DOM 
implementation here, but in terms of fairly abstract DOM concepts like 
"parent", "first child", "list of children", "localName", "namespace", 
and so forth...
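
To show how little those concepts require, here is a minimal Python sketch of a tree that provides only "parent", "first child", "list of children", "localName" and "namespace". The Node class and its method names are hypothetical, made up for this example; this is not the W3C DOM API, nor how html5lib or the validator actually represent their trees.

```python
HTML_NS = "http://www.w3.org/1999/xhtml"


class Node:
    """A bare tree node exposing only the abstract concepts named above."""

    def __init__(self, local_name, namespace=HTML_NS):
        self.local_name = local_name
        self.namespace = namespace
        self.parent = None
        self.children = []

    def append_child(self, child):
        # Keep the parent link and the list of children consistent.
        child.parent = self
        self.children.append(child)
        return child

    @property
    def first_child(self):
        # None when the node has no children.
        return self.children[0] if self.children else None


select = Node("select")
option = select.append_child(Node("option"))
print(select.first_child.local_name, option.parent.local_name)
```

Any data structure that can answer these few questions could stand in for the tree the spec text talks about; nothing here depends on a particular DOM implementation.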

I guess the upshot is that it seems to me the definitions of the elements 
are already what you want: prose that describes what the element and its 
various attributes mean, together with information, by reference, about 
what other elements it's allowed to contain in valid documents and where 
it's allowed to be placed in valid documents.  These last two are much 
the same information as that contained in the HTML4 spec in much less 
human-readable DTD form.  If there are exceptions to this setup, I'd 
love to know what they are.

-Boris

Received on Thursday, 20 November 2008 22:05:52 UTC