
Re: Formal definition of HTML5 (was Re: Version information)

From: Ian Hickson <ian@hixie.ch>
Date: Tue, 17 Apr 2007 00:47:53 +0000 (UTC)
To: Henrik Dvergsdal <henrik.dvergsdal@hibo.no>
Cc: public-html@w3.org
Message-ID: <Pine.LNX.4.62.0704170034240.17772@dhalsim.dreamhost.com>

On Tue, 17 Apr 2007, Henrik Dvergsdal wrote:
> > 
> > Ah, ok. I don't want to do that because I have received feedback from 
> > a number of implementors and authors that they find English prose 
> > easier to understand than formal grammars.
> I don't have enough data on this to argue with you, but I guess this 
> depends on training as well as the complexity of what is described and 
> precision requirements.

Well, James (a parser tool implementor) and Henri (a conformance checker 
implementor) have both commented on this thread saying that they find 
English prose easier to use than formal grammars (assuming I'm not 
misrepresenting their opinions). Since they're both actively implementing 
the spec, their opinions carry great weight.

> > > 1. It will facilitate a more tidy and efficient way of managing the 
> > > standard.
> > 
> > I don't think so. My understanding is that it is inordinately 
> > complicated to express some of the spec's current restrictions in a 
> > formal grammar.
> Which is exactly the point. This is a lot of very complicated work. It 
> should only be done once.

I'm confused. Would it make the spec simpler or more complicated to read?

> > > To have competing schemas (or other specification techniques) 
> > > reflect a spec like this, will lead to a chaotic situation in which 
> > > a lot of people will waste a lot of time.
> > 
> > How is this different to the situation with the browsers?
> I'm not sure what you mean here. If the situation with the browsers is a 
> chaotic one where people are wasting time, we don't want to repeat that 
> in web applications.

I don't think what you describe is a bad thing. A "chaotic situation 
in which a lot of people will waste a lot of time" is how software 
development in a healthy industry works. You get a lot of implementations, 
which act both to incent competitive behaviour, and thus higher-quality 
implementations, and to repeatedly test that the specification is

Having a single implementation -- worse, one run by committee -- 
encourages stagnation. We have evidence of this already: HTML4 basically 
never saw any serious work on conformance checking; most conformance 
checkers did little more than DTD validation, which (as I pointed out) is 
ridiculously inadequate. With HTML5, we don't have a formal schema and the 
spec isn't even remotely complete, yet we already have someone writing a 
conformance checker that's more detailed than anything HTML4 ever had.

> OK, maybe it could be useful to maintain different versions for 
> different schema languages.
> It would be interesting to have an overview of what schemas are actually 
> being developed. Who knows, maybe there's even a DTD here somewhere.

I agree, both of the above sound like very useful projects.

> That's good, but I think decisions like this should be recorded 
> somewhere, just so that we can rule them out as bugs.

I agree. If you (or anyone reading this) would like to work on capturing 
the reasoning behind changes in the specification, I'm happy to help in that 

> > No, but that says more about my ability to understand an arbitrary 
> > grammar without knowing its language than it does about the 
> > suitability of English prose. (What is the difference? Oh, is it that 
> > it doesn't allow the lack of TBODYs and TRs?)
> My point is that the prose will eventually have to be translated to a 
> schema - even if it's just in order to define an implementation detail.

Sure. In fact, *everything in the spec* will eventually have to be 
translated to machine-readable code. Probably multiple times, by multiple 
vendors. That's what a specification is: a set of rules for 
implementations to be written against.

> > > "When used as the child of a figure element, or, when used as a 
> > > figure fallback object: Zero or more param elements, followed by 
> > > either zero or more block-level elements or a single object element, 
> > > which is then considered to be a figure fallback object.
> > >
> > > Otherwise: Zero or more param elements, followed by inline-level 
> > > content."
> >
> > [...] you couldn't express this in non-prose at all, let alone more 
> > clearly [...]
> I'm not sure that I agree with all of this, but let's leave it for now.

In case the part you didn't agree with was the part I left in the above 
quoting, it's the "which is then considered to be" part that I don't think 
you can express in any schema language.
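
To illustrate how the quoted prose rule might end up as a checker's code, here is a minimal hypothetical sketch in Python. It is not taken from any actual conformance checker, and the BLOCK_LEVEL and INLINE_LEVEL sets are small illustrative assumptions, not the draft's real element categories. The "which is then considered to be a figure fallback object" part appears here as a boolean flag passed down when the nested object is checked.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Illustrative subsets only (assumption); the draft defines these categories in prose.
BLOCK_LEVEL = {"p", "div", "ul", "ol", "table", "blockquote"}
INLINE_LEVEL = {"em", "strong", "a", "span", "img", "object"}


@dataclass
class Element:
    name: str
    children: List["Node"] = field(default_factory=list)


Node = Union[Element, str]  # a str stands for a text node


def is_blank(node: Node) -> bool:
    """Whitespace-only text nodes are ignored by this sketch."""
    return isinstance(node, str) and not node.strip()


def describe(node: Node) -> str:
    return repr(node) if isinstance(node, str) else f"<{node.name}>"


def check_object(obj: Element, figure_fallback: bool) -> List[str]:
    """Check an object element's direct content; an empty list means conforming.

    figure_fallback is True when the object is a child of figure or is
    itself a figure fallback object.
    """
    errors: List[str] = []
    kids = [n for n in obj.children if not is_blank(n)]
    i = 0
    # Zero or more leading param elements.
    while i < len(kids) and isinstance(kids[i], Element) and kids[i].name == "param":
        i += 1
    rest = kids[i:]
    if figure_fallback:
        if len(rest) == 1 and isinstance(rest[0], Element) and rest[0].name == "object":
            # A single nested object "is then considered to be a figure
            # fallback object", so it is checked under the same rule.
            errors += check_object(rest[0], figure_fallback=True)
        else:
            # Otherwise: zero or more block-level elements.
            for node in rest:
                if not (isinstance(node, Element) and node.name in BLOCK_LEVEL):
                    errors.append(f"object (figure fallback): expected block-level, got {describe(node)}")
    else:
        # Otherwise: inline-level content (text is always inline-level).
        for node in rest:
            if isinstance(node, str):
                continue
            if node.name not in INLINE_LEVEL:
                errors.append(f"object: expected inline-level content, got {describe(node)}")
            elif node.name == "object":
                errors += check_object(node, figure_fallback=False)
    return errors
```

The point of the sketch is only that a procedural checker can carry the "figure fallback" status as ordinary state while walking the tree, which is one way an implementor might translate this part of the prose.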

> > > 6. It will make the text of the standard more accessible, at least 
> > > for "competent" developers. When you get used to the formal syntax 
> > > it is much easier to read than the prose.
> > 
> > Sadly, most developers and implementors (and spec writers!) do not 
> > fall under the label "competent" by that definition.
> I guess I should have written "developers with basic training in 
> writing/using formal grammars"

Indeed. Most developers and implementors don't have such training. For 
what it's worth, neither do I. (I'm a physicist by training.)

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Tuesday, 17 April 2007 00:48:07 UTC