Re: ABNF or code fragments? from Jack Jansen on 2010-02-23 (public-media-fragment@w3.org from February 2010)

From: Jack Jansen <Jack.Jansen@cwi.nl>
Date: Tue, 23 Feb 2010 11:25:05 +0100
To: Philip Jägenstedt <philipj@opera.com>
Cc: "Media Fragment" <public-media-fragment@w3.org>
Message-Id: <4ADF5B67-B364-49C0-B06F-38E3136BD5AA@cwi.nl>
On 23 feb 2010, at 10:56, Philip Jägenstedt wrote:
>> But now I have a more serious question: it seems that the current draft has gotten all ABNF removed, and replaced by code fragments??!?
>> 
>> I don't remember that such a change has come up during a teleconf. Moreover, it is something that I have serious misgivings about: in a standards document we should use formal declarative languages such as ABNF as much as possible, and not vague English-based procedural pseudo-code...
> 
> The syntax is defined by ABNF and is still there, just split across sections and using the W3C XML spec constructs instead of a big blob.

Well... The ABNF that we used to have seems to have been replaced by some form of EBNF. As far as I know (syntax gurus, please correct me if I'm wrong), EBNF has the serious problem that there is no single definition of it, so its exact meaning has to be guessed at again. If I remember correctly, this is exactly why ABNF was created: to supersede EBNF.

> 
> Processing however, can't be defined in terms of ABNF as it includes things like percent decoding, UTF-8 decoding and ignoring name-value pairs that aren't valid syntax (necessary to not break existing parsers by introducing new names in future versions of the spec).

Why can't you define this in ABNF? Obviously you can only define the syntax in ABNF, not the semantics. And it is open to discussion whether a statement such as "you can't specify the same name twice" is syntax or semantics. If you decide for the first, your ABNF becomes pretty hairy, which is why I would opt for the second.
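
To make the processing steps under discussion concrete (splitting into name-value pairs, percent decoding, UTF-8 decoding, ignoring pairs that cannot be used), here is a rough Python sketch. It is not the draft's algorithm: the function name parse_name_value_pairs is hypothetical, and the rule that a pair is dropped when its bytes are not valid UTF-8 is an assumption made only for illustration.

```python
from urllib.parse import unquote_to_bytes


def parse_name_value_pairs(fragment):
    """Return decoded (name, value) pairs from a fragment string."""
    pairs = []
    for name_value in fragment.split("&"):
        # Split at the first "=" only; the value is "" when "=" is absent.
        pct_name, _, pct_value = name_value.partition("=")
        try:
            # Percent-decode to bytes, then decode those bytes as strict UTF-8.
            name = unquote_to_bytes(pct_name).decode("utf-8")
            value = unquote_to_bytes(pct_value).decode("utf-8")
        except UnicodeDecodeError:
            # Assumption: a pair that does not decode cleanly is ignored.
            continue
        pairs.append((name, value))
    return pairs
```

None of this is syntax in the ABNF sense; it only shows how small the procedural part is once the syntax itself is defined elsewhere.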

> If there is anything vague about the processing requirements, please point out what is ambiguous so we can fix it.


It's English! There are no parsing rules for English. For example, take section 5.1.2, step 3c:

	Let pct-value be the substring from after the first "=" in name-value to the end of name-value, or the empty string if name-value does not include "=".

The first problem is that this sentence is unreadable: I have to stare at it for at least a minute before I understand what it is trying to say. But this understanding is based on all sorts of implicit assumptions. Let's play devil's advocate and put some grouping brackets in this sentence. I assume the intention of the original author was:

	Let pct-value be [[the substring from after [the first "=" in name-value] to [the end of name-value]], or [the empty string] if [name-value does not include "=".]]

or, in Python:
	if '=' in name_value:
		# everything after the first "=", up to the end of name_value
		pct_value = name_value[name_value.find("=") + 1:]
	else:
		pct_value = ""

But the following is just as valid an English breakdown of the sentence:

	Let pct-value be [the substring from after [the first "=" in name-value] to [[the end of name-value], or [the empty string] if [name-value does not include "="]]].

which, in Python, would be:
	if '=' in name_value:
		pct_value = name_value[name_value.find("=") + 1:]
	else:
		# the end index is now "the empty string", i.e. wherever "" is found
		pct_value = name_value[name_value.find("=") + 1:name_value.find("")]

This is nonsense, but there is absolutely no guarantee that there aren't other places where a misreading would be far less obviously nonsensical.

And note that grouping is only part of the problem: this one sentence uses the concepts "substring", "from after", and "include". None of these concepts has a rigorous definition; they are open to misinterpretation.
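
One way to pin such terms down is to pair the sentence with executable edge cases. The following is a minimal sketch, assuming the first reading above is the intended one; pct_value_of is a hypothetical name, not something taken from the draft.

```python
def pct_value_of(name_value):
    # str.partition splits at the first "=" and returns a 3-tuple
    # (before, separator, after); when "=" is absent, both the
    # separator and the part after it are empty strings.
    _, _, pct_value = name_value.partition("=")
    return pct_value


# "From after the first '='": later "=" characters stay in the value.
assert pct_value_of("a=b=c") == "b=c"
# "To the end of name-value": nothing is trimmed.
assert pct_value_of("t=10,20") == "10,20"
# "=" present but nothing after it.
assert pct_value_of("t=") == ""
# name-value "does not include '='".
assert pct_value_of("t") == ""
```

Each assertion settles one of the questions that the English sentence leaves to the reader.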

--
Jack Jansen, <Jack.Jansen@cwi.nl>, http://www.cwi.nl/~jack
If I can't dance I don't want to be part of your revolution -- Emma Goldman
Received on Tuesday, 23 February 2010 10:26:06 UTC