Re: Formal definition of HTML5 (was Re: Version information)

On Apr 17, 2007, at 03:47, Ian Hickson wrote:

> Well, James (parser tool implementer), and Henri (conformance checker
> implementor) have both commented on this thread saying that they find
> English prose easier to use than formal grammars (assuming I'm not
> misrepresenting their opinions).

In the *simple* cases, RELAX NG *Compact* Syntax with the *fantasai  
indent style* or simple XPath is competitive with English, in my  
opinion. The problem is that there's a whole lot more than just  
simple cases, so as a uniform solution, English scales better with  
complexity.

And you don't need to look far to find less simple cases. For example,
Henrik's example about the <table> content model already lists tbody
and tr twice in the RELAX NG production as an artifact of the way
grammars work. As another example, here's what the content model of
the <p> element looks like:

	p.inner =
		(	common.inner.strict-inline
		|	(	common.inner.struct-inline
			&	nonRoundtrippable & nonHTMLizable
			)
		)

To know what is going on, the reader needs to know about RELAX NG
parametrization design patterns and to know that the set of hedges
that can be derived from common.inner.strict-inline is a subset of
the set of hedges that can be derived from
common.inner.struct-inline. Moreover, if stuff like this went into
the spec, the editor would need to agree with me and fantasai about
the level of parametrizability (perhaps at the expense of
readability) or else the real schema implementation would diverge
from the spec illustrations anyway.

The last point is an important one. As an implementor, I have the
following potential places to implement a given conformance criterion:
1) The parser. (Example: Charmod stuff.)
2) A filter between the parser and the validation layer. (Example:  
xml:id.)
3) A RELAX NG schema. (Example: <blockquote> has an optional  
attribute called cite.)
4) A Schematron schema. (Example: No sectioning elements as  
descendants of <header>.)
5) A RELAX NG datatype written in Java. (Example: Web Forms 2.0 weeks.)
6) A SAX2 ContentHandler written in Java. (Example: Table integrity.)
I don't want the spec to lock these decisions down for me. When a  
spec locks these down, like HTML 4.01 did with DTDs, better  
implementation judgment calls need to be explained to people over and  
over again (like the legitimacy of Relaxed).
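
To illustrate why the choice of layer is an implementation judgment
call, here is a minimal sketch of the <header> criterion from item 4
done as a SAX2 ContentHandler (item 6) instead of Schematron. This is
not how any real checker is written. The class name HeaderCheck, the
check() helper and the exact set of sectioning element names are
assumptions made up for the example, and it uses the JDK's default
non-namespace-aware XML parser rather than an HTML parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical checker: reports sectioning elements nested inside <header>.
class HeaderCheck extends DefaultHandler {
    // Assumption: an illustrative list, not the spec's definition.
    private static final Set<String> SECTIONING =
            Set.of("section", "article", "nav", "aside");

    private int headerDepth = 0; // number of currently open <header> ancestors
    private final List<String> errors = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        // The default parser is not namespace-aware, so qName is the tag name.
        if (headerDepth > 0 && SECTIONING.contains(qName)) {
            errors.add("<" + qName + "> must not be a descendant of <header>");
        }
        if ("header".equals(qName)) {
            headerDepth++;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("header".equals(qName)) {
            headerDepth--;
        }
    }

    // Parses a well-formed XML string and returns the collected messages.
    static List<String> check(String xml) throws Exception {
        HeaderCheck handler = new HeaderCheck();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.errors;
    }
}
```

The same rule is a one-line XPath assertion in Schematron, which is
the kind of trade-off a spec written in English leaves open to the
implementor.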

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Tuesday, 17 April 2007 09:28:27 UTC