Re: An HTML language specification from Maciej Stachowiak on 2008-11-23 (public-html@w3.org from November 2008)

From: Maciej Stachowiak <mjs@apple.com>
Date: Sun, 23 Nov 2008 15:17:28 -0800
To: Philip TAYLOR <P.Taylor@Rhul.Ac.Uk> (Ret'd)
Cc: Boris Zbarsky <bzbarsky@MIT.EDU>, "public-html@w3.org" <public-html@w3.org>
Message-id: <672950F5-7750-4C25-89A8-85489607D232@apple.com>

By the way, just to avoid being vague as to my personal view:

In general I like it when specifications can express certain  
conformance requirements using a formal, structured syntax. One  
example of where this really works is the use of IDL for defining  
scripting interfaces. HTML5 already does that. I also like it when  
syntax is expressed with EBNF or a regular expression; HTML5 does not  
do that, but I think some of the microsyntaxes at least could benefit  
from it.
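
A minimal sketch of what a regular-expression definition of one
microsyntax could look like, assuming the "valid non-negative integer"
rule is "one or more characters in the range 0-9". The names
NON_NEGATIVE_INTEGER and is_valid_non_negative_integer are made up for
illustration and do not come from the spec.

```python
import re

# Assumed rule: one or more ASCII digits. [0-9] is used instead of \d
# because \d also matches non-ASCII digits for Python str patterns.
NON_NEGATIVE_INTEGER = re.compile(r"[0-9]+")


def is_valid_non_negative_integer(value: str) -> bool:
    """Return True if the whole string matches the assumed microsyntax."""
    return NON_NEGATIVE_INTEGER.fullmatch(value) is not None
```

The same pattern serves as both the definition and the checker, which
is the property prose requirements lack.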

One reason such formalisms are helpful is that they can often be  
machine-checked for certain properties, for example syntactic grammars  
can be checked for ambiguity. Another reason is that they can more  
often be converted directly into implementations, thus leaving less  
room for implementation mistakes and further validating the soundness  
of the spec.
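
To illustrate the kind of machine check meant here, the following is a
rough sketch of a bounded ambiguity search over a toy grammar. It is
not how real grammar tools work (ambiguity of context-free grammars is
undecidable in general, so this only searches strings up to a fixed
length), and the grammar, count_parses and find_ambiguous are all
hypothetical names chosen for the example.

```python
from functools import lru_cache
from itertools import product

# Hypothetical toy grammar: E -> E "+" E | "a". Deliberately ambiguous.
GRAMMAR = {"E": [("E", "+", "E"), ("a",)]}


def count_parses(grammar, start, text):
    """Count the distinct parse trees of text derived from start.

    Assumes no empty productions and no cycles of unit productions,
    so every recursive call works on a strictly smaller problem.
    """

    @lru_cache(maxsize=None)
    def sym(symbol, i, j):
        if symbol not in grammar:  # terminal: must equal text[i:j]
            return 1 if text[i:j] == symbol else 0
        return sum(seq(rhs, i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def seq(rhs, i, j):
        if not rhs:
            return 1 if i == j else 0
        head, rest = rhs[0], rhs[1:]
        total = 0
        # Leave at least one character for each remaining symbol.
        for k in range(i + 1, j - len(rest) + 1):
            left = sym(head, i, k)
            if left:
                total += left * seq(rest, k, j)
        return total

    return sym(start, 0, len(text))


def find_ambiguous(grammar, start, alphabet, max_len):
    """Return the first string up to max_len with more than one parse."""
    for length in range(1, max_len + 1):
        for chars in product(alphabet, repeat=length):
            text = "".join(chars)
            if count_parses(grammar, start, text) > 1:
                return text
    return None
```

For the toy grammar, find_ambiguous(GRAMMAR, "E", "a+", 5) reports
"a+a+a", which can be grouped as (a+a)+a or a+(a+a). Finding nothing
within the bound does not prove a grammar unambiguous.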

However, DTDs and schema languages in particular do not appear to do a  
very good job of delivering these kinds of benefits. Because their  
expressiveness is limited (much more so in the case of DTDs), they are  
often augmented by additional normative requirements in prose.  
Overall, that makes things more difficult, not less. You can't
machine-check the formal syntax by itself, nor can you use it directly in an  
implementation. Furthermore, some implementors end up looking at just  
the schema and ignoring the other conformance requirements (or  
loosening of requirements). So I tentatively agree with the decision  
not to have a schema or DTD for HTML5.

HTML5 does sometimes have very useful formalisms that are expressed in  
prose. For example, the state machine model of the parsing algorithm  
lends itself to fairly direct translation into a machine model, making  
it easy to check such properties as whether it always parses a  
document unambiguously. Regrettably, now that I look at it, the  
section "Writing HTML documents" does not apply formalisms as  
usefully. It would be a useful exercise, for example, to verify that  
any document produced according to the "Writing HTML documents" syntax  
would be parsed without errors by the parsing algorithm. But the  
current text is not very amenable to conversion into a suitable  
machine-processable form. On the other hand, it seems to get needlessly  
verbose and formal when describing literal sequences of characters,  
such as the required doctype. While such text may be unambiguous, it  
is also quite hard to read compared to something like EBNF.
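
For comparison, here is a sketch of the doctype as a single pattern.
It assumes the rule is: the string "<!DOCTYPE", one or more space
characters, the string "html", optional space characters, then ">",
all matched ASCII case-insensitively, with "space characters" taken to
be space, tab, LF, FF and CR. The exact requirements in the draft may
differ, and DOCTYPE and is_doctype are names invented for this example.

```python
import re

# Assumed set of HTML space characters: space, tab, LF, FF, CR.
_SPACE = r"[ \t\n\f\r]"

# Assumed doctype syntax; re.ASCII keeps case-insensitivity ASCII-only.
DOCTYPE = re.compile(
    rf"<!DOCTYPE{_SPACE}+html{_SPACE}*>",
    re.IGNORECASE | re.ASCII,
)


def is_doctype(text: str) -> bool:
    """Return True if text is exactly a doctype under the assumed rule."""
    return DOCTYPE.fullmatch(text) is not None
```

One line of pattern replaces a character-by-character list, and it can
be tested directly against sample input.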

Regards,
Maciej

On Nov 22, 2008, at 1:22 AM, Philip TAYLOR (Ret'd) wrote:

>
> Fair enough : if you were using "delegate" in
> a neutral sense, then I have no problem with that.
>
> Philip TAYLOR
> --------
> Maciej Stachowiak wrote:
>
>> To the best of my understanding, "delegate" carries no negative
>> connotation. I meant it roughly in sense 5 from
>> <http://dictionary.reference.com/browse/delegate>: "5. to commit
>> (powers, functions, etc.) to another as agent or
>> deputy". It is generally considered a virtue to delegate  
>> appropriately, and a vice to delegate inappropriately.
>> I think you may have read a negative connotation into my statement  
>> because of the similarity to the word "relegate", which means  
>> things such as "to send into exile" or "to assign to a place of  
>> insignificance or of oblivion". But "delegate" does not mean  
>> anything like that.
>> Regards,
>> Maciej
>

Received on Sunday, 23 November 2008 23:18:10 UTC