Suggestions regarding creation of an "Authoring Specification" for HTML 5

Ian:  at breakfast this morning we discussed some ideas regarding the 
creation of an authoring specification for HTML 5, and you encouraged me 
to remind you of some of my suggestions with a note to the comments list. 
Here 'tis.

My overall comment is that I think the specification for the authoring of 
correct HTML 5 documents is of great importance.  I understand that you 
are hoping that the need can be met in part by, eventually, using scripts 
to produce a stripped down version of the existing draft, leaving out much 
of the parsing and error recovery detail.  Perhaps this will lead to a 
first-class result, but I have some nervousness that the result might not 
be as effective as one might like.  Accordingly, I'd like to suggest that 
the following be considered:

* Simple success criteria should be explicitly set down for the authoring 
specification.  What information must it convey, to which audiences should 
it be comprehensible, etc.?  I'm not suggesting any big, elaborate, or 
time-consuming requirements effort, just a very brief set of criteria that 
could be agreed by the community as a yardstick for judging any particular 
draft.  Perhaps this already exists. 

* I think it would be a good idea to generate representative drafts sooner 
rather than later.  If practical, this could be done by marking up the 
existing draft and running the full automated process.  If that's 
impractical soon, as I suspect may be the case, I would think that one or 
two members of the HTML working group could be tasked with manually 
producing a partial skeleton for evaluation, including at least some of 
the key sections such as 8.1, and representative slices of some of the 
others.  For example,  if one or two microsyntaxes and the definitions of 
a few representative elements were converted, it would probably give a 
very good idea as to whether the presentation of all of them would 
eventually be effective.  I think the resulting draft should be circulated 
for comment, and should be used to inform planning for how the final HTML 
5 authoring draft will eventually be prepared.

* I think there are good reasons why most of the semantics of HTML 5 are 
explained in terms of the DOM, but it's worth keeping in mind that for 
authors (except when scripting), it's the serialized document that's of 
primary concern.  So, it's worth explaining clearly and early the key 
invariants of what a legal HTML 5 document looks like.  For example: 
"Start tags look like <this>, end tags look like </this>; elements are 
properly nested and thus encode a tree, which by the way is isomorphic to 
the corresponding DOM tree; etc."  Determining things like this from the 
existing specification is a bit of a theorem-proving exercise:  you have 
to notice that the DOM is always a tree, even though browsers accept input 
that's poorly nested; you have to notice that there are serialization 
rules that invariably result in properly nested tags; and you have to 
realize that those in turn define what is intended as legal HTML 5.  There's a 
risk that, if all one does is to strip the existing spec. to produce the 
authoring spec, these key aspects of correct HTML 5 will be unduly hard to 
discover.
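
To make the nesting invariant above concrete, here is a rough sketch of 
the kind of check I have in mind.  It is not taken from the HTML 5 draft.  
It uses Python's standard html.parser module; the names NestingChecker, 
is_properly_nested and VOID_ELEMENTS are my own, the list of void elements 
is an assumption, and the check is deliberately stricter than HTML 5 in 
that it requires every non-void element to be closed explicitly.

```python
from html.parser import HTMLParser

# Assumption: elements that never take an end tag (illustrative list).
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class NestingChecker(HTMLParser):
    """Tracks open elements on a stack and flags any mismatched end tag."""

    def __init__(self):
        super().__init__()
        self.open_elements = []   # stack of currently open tag names
        self.well_nested = True

    def handle_starttag(self, tag, attrs):
        # Void elements are leaves of the tree, so they are never pushed.
        if tag not in VOID_ELEMENTS:
            self.open_elements.append(tag)

    def handle_startendtag(self, tag, attrs):
        # A self-closing tag such as <br/> is treated as a leaf.
        pass

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        # The end tag must match the most recently opened element.
        if not self.open_elements or self.open_elements.pop() != tag:
            self.well_nested = False


def is_properly_nested(markup):
    """Return True if start and end tags in markup encode a tree."""
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    return checker.well_nested and not checker.open_elements
```

Under these assumptions, is_properly_nested("<p><em>hi</em></p>") holds, 
while "<p><em>hi</p></em>" is rejected even though browsers will accept it.  
An authoring specification would also need to say which end tags may be 
omitted, which this sketch ignores.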

* I think the authoring specification is important enough that attention 
should be given to introductory material, organization of the table of 
contents, etc.  Perhaps this comment is obvious, in which case I apologize 
for mentioning it.  Right now, I understand that the most critical section 
for authors is in section 8.1, so it's not immediately obvious that a 
simple stripping of the existing draft will result in a document that 
flows in sensible order, with key concepts suitably highlighted.  For 
example, I could imagine introductory material setting out some of the 
information mentioned in the bullet above.  You could also state any general 
syntactic rules, such as whether tags need to be explicitly closed or can 
be implicitly closed by the end tag for a parent, and if it's not obvious 
from the table of contents, provide simple guidance as to which sections 
are good starting points for learning key concepts.

I hope these suggestions are helpful.   I should point out that they 
represent my personal suggestions, and not necessarily those of other W3C 
TAG members. Thank you.

Noah



--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

Received on Monday, 20 October 2008 17:06:27 UTC