
Re: An HTML language specification

From: Boris Zbarsky <bzbarsky@MIT.EDU>
Date: Fri, 21 Nov 2008 14:52:02 -0500
Message-ID: <49271162.6070702@mit.edu>
To: "Philip TAYLOR (Ret'd)" <P.Taylor@Rhul.Ac.Uk>
CC: "public-html@w3.org" <public-html@w3.org>

Philip TAYLOR (Ret'd) wrote:
> Maciej Stachowiak wrote:
> 
>> HTML5 is indeed defined in prose, unlike many other languages where 
>> important normative requirements are delegated to a DTD or schema.
> 
> Hardly "delegated", Maciej : a DTD or schema is a formal
> language, ideal for (and indeed, intended for) such applications.
> Prose, being natural language, would be considered by
> many to lack the rigour necessary for such a task.

Odd.  I could have sworn that there were people (NOT Maciej, mind you, 
so I'm not sure why you're attacking him on this issue) clamoring for a 
normative natural-language definition of HTML on this list.  They don't 
seem to be happy with the idea of an informative natural-language guide.

That said,

> It may well be beneficial to /supplement/ the DTD or
> schema by prose, to aid its comprehension by those
> unfamiliar with the formal language

In practice, specifications that use a DTD or schema (or any other 
machine-readable syntax description) then use natural language for three 
things:

1)  Informative content, as you describe
2)  Additional normative restrictions that are too complex to express
     in the DTD or schema language involved.
3)  (rarely) Restrictions that need to be subtracted from the ones
     defined in the DTD or schema, but only in conditions
     too complex to describe in the DTD or schema.

Clearly option 2 is preferable to option 3.  ;)  Option 2 is also 
preferable to not having the normative descriptions at all.

And in practice, as an implementor, I've found that one has to read all 
the normative prose very carefully due to item (2) above, so the 
benefits of a machine-readable syntax description are largely lost: just 
because something satisfies the machine-readable description doesn't 
mean it's valid.  At the same time, the readability of prose is also 
lost, of course.  In some ways, a syntax that has a machine-readable 
part but also a natural-language part is harder to understand and 
implement than one that's defined entirely one way or entirely the 
other.  Of course, one does have to be very careful when writing a 
natural-language definition.
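
To make item (2) concrete, here is a minimal Python sketch. Everything in 
it is an assumption made for illustration: the toy CONTENT_MODEL, the 
three element names, and the "no <a> inside another <a> at any depth" 
rule are not taken from any particular specification. The rule is merely 
the kind of nesting prohibition that a per-element content model cannot 
express and that therefore ends up in normative prose.

```python
import xml.etree.ElementTree as ET

# Hypothetical machine-readable part: element name -> allowed child elements.
# It stands in for a DTD or schema; it is not a real one.
CONTENT_MODEL = {
    "p": {"a", "em"},
    "a": {"em"},
    "em": {"a", "em"},
}


def satisfies_content_model(elem):
    """Check every parent/child pair against the toy content model."""
    allowed = CONTENT_MODEL.get(elem.tag)
    if allowed is None:
        return False  # element is not declared at all
    return all(
        child.tag in allowed and satisfies_content_model(child)
        for child in elem
    )


def satisfies_prose_rule(elem, inside_a=False):
    """Hypothetical prose-only rule: no <a> inside another <a>, at any depth."""
    if elem.tag == "a" and inside_a:
        return False
    inside_a = inside_a or elem.tag == "a"
    return all(satisfies_prose_rule(child, inside_a) for child in elem)


def is_valid(elem):
    """A document is valid only if it passes both checks."""
    return satisfies_content_model(elem) and satisfies_prose_rule(elem)


# Each parent/child pair is allowed, yet an <a> ends up nested in an <a>.
doc = ET.fromstring("<p><a><em><a>nested</a></em></a></p>")
print(satisfies_content_model(doc))  # machine-readable check alone
print(is_valid(doc))                 # both checks together
```

The nested document passes the content-model check but fails the 
combined check, so an implementor who stops at the machine-readable 
description would accept something that is not valid.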

-Boris
Received on Friday, 21 November 2008 19:52:49 UTC
