Re: ABNF or code fragments?

On Tue, 23 Feb 2010 18:25:05 +0800, Jack Jansen <Jack.Jansen@cwi.nl> wrote:

>
> On 23 feb 2010, at 10:56, Philip Jägenstedt wrote:
>>> But now I have a more serious question: it seems that the current  
>>> draft has gotten all ABNF removed, and replaced by code fragments??!?
>>>
>>> I don't remember that such a change has come up during a teleconf.  
>>> Moreover, it is something that I have serious misgivings about: in a  
>>> standards document we should use formal declarative languages such as  
>>> ABNF as much as possible, and not vague english-based procedural  
>>> pseudo-code...
>>
>> The syntax is defined by ABNF and is still there, just split across  
>> sections and using the W3C XML spec constructs instead of a big blob.
>
> Well... The ABNF that we used to have seems to be replaced by some form  
> of EBNF. As far as I know (but: syntax gurus, please correct me if I'm  
> wrong) EBNF has the serious problem that there is no single definition  
> of it, so the exact meaning has again to be guessed at. If I remember  
> correctly this is exactly the reason ABNF was created, to supersede EBNF.

OK, so we should revert to using ABNF. We need to replace '/' with '|' or  
vice versa; I can't remember which is ABNF. I asked on multiple occasions  
if someone could check if the EBNF was OK, but no one did (until now).

>> Processing however, can't be defined in terms of ABNF as it includes  
>> things like percent decoding, UTF-8 decoding and ignoring name-value  
>> pairs that aren't valid syntax (necessary to not break existing parsers  
>> by introducing new names in future versions of the spec).
>
> Why can't you define this in ABNF? Obviously you can only define the  
> syntax in ABNF, not the semantics. And it is open to discussion whether  
> a statement such as "you can't specify the same name twice" is syntax or  
> semantics. If you decide for the first your ABNF becomes pretty hairy,  
> so that's why I would opt for the second.

Is it possible to use ABNF to get the below behavior?

#t=1&t=2 => time offset 2 (last valid name-value wins)

#t=1&t=bla => time offset 1 (invalid syntax ignored)

#%74=%31 => time offset 1 (percent decoding after splitting)

How about UTF-8 decoding after percent decoding?
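
To make the question concrete, here is a rough Python sketch of the behavior
I mean. It is only an illustration, not the spec's algorithm: the function
name parse_time_offset and the "plain seconds" validity check are
assumptions made up for this example, and only the "t" name is handled.

```python
import re
from urllib.parse import unquote_to_bytes

# Assumption for this sketch: a valid "t" value is a plain number of seconds.
# The actual syntax allows more than this.
_SECONDS = re.compile(r"[0-9]+(\.[0-9]*)?")

def parse_time_offset(fragment):
    """Return the time offset chosen by a fragment like '#t=1&t=2', or None."""
    if fragment.startswith("#"):
        fragment = fragment[1:]
    offset = None
    for name_value in fragment.split("&"):
        # Split at the first "=" before any decoding is done.
        pct_name, _, pct_value = name_value.partition("=")
        try:
            # Percent decoding first, then UTF-8 decoding of the bytes.
            name = unquote_to_bytes(pct_name).decode("utf-8")
            value = unquote_to_bytes(pct_value).decode("utf-8")
        except UnicodeDecodeError:
            continue  # not valid UTF-8: ignore this name-value pair
        if name == "t" and _SECONDS.fullmatch(value):
            offset = float(value)  # a later valid pair replaces an earlier one
        # Unknown names and invalid values are silently ignored.
    return offset
```

With this sketch, parse_time_offset("#t=1&t=bla") ignores the second pair
and keeps the first, while "#%74=%31" is only recognized as t=1 because
decoding happens after splitting.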

>> If there is anything vague about the processing requirements, please  
>> point out what is ambiguous so we can fix it.
>
>
> It's English! There are no parsing rules for English. For example, take  
> section 5.1.2, step 3c:
>
> 	Let pct-value be the substring from after the first "=" in name-value  
> to the end of name-value, or the empty string if name-value does not  
> include "=".
>
> The first problem is that this sentence is unreadable: I have to stare  
> at it for at least a minute before I understand what it tries to say.  
> But this understanding is based on all sorts of implicit assumptions.  
> Let's play devil's advocate, and put some grouping parentheses in this  
> sentence. I assume the intention of the original author was:
>
> 	Let pct-value be [[the substring from after [the first "=" in  
> name-value] to [the end of name-value]], or [the empty string] if  
> [name-value does not include "=".]]
>
> or, in Python:
>     if '=' in name_value:
>         # everything after the first "="
>         pct_value = name_value[name_value.find("=")+1:]
>     else:
>         pct_value = ""
>
> But the following is just as valid an English breakdown of the sentence:
>
> 	Let pct-value be [the substring from after [the first "=" in name-value  
> to [[the end of name-value], or [the empty string] if [name-value does  
> not include "=".]]
>
> which, in Python, would be:
>     if '=' in name_value:
>         pct_value = name_value[name_value.find("=")+1:]
>     else:
>         # slice from after the (missing) "=" to the empty string
>         pct_value = name_value[name_value.find("=")+1:name_value.find("")]
>
> This is nonsense, but there is absolutely no guarantee that there  
> aren't other places where the result wouldn't be obvious nonsense.
>
> And note that grouping is only part of the problem: in this one  
> sentence there is use of the concepts "substring", "from after", and  
> "include". None of these concepts have a rigorous definition, they are  
> open to mis-interpretation.

I started out with an explicit algorithm for splitting a string, but  
optimized it away as per above. Should we introduce a string-splitting  
operation in the spec to fix the above step?
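
As a sketch of what such an operation could look like, assuming it splits a
string at the first occurrence of a separator and yields an empty second
part when the separator is absent (split_once is a hypothetical name, not
something in the draft):

```python
def split_once(string, separator):
    """Split string at the first occurrence of separator.

    Returns (before, after). If separator does not occur, before is the
    whole string and after is the empty string.
    """
    before, _, after = string.partition(separator)
    return before, after

# Step 3c would then amount to taking the second part:
pct_name, pct_value = split_once("t=1=2", "=")
```

Defined this way, the "or the empty string" case falls out of the operation
itself instead of needing a separate clause in the prose.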

I don't care a great deal how the spec expresses the processing rules, as  
long as the rules are what we want them to be (i.e. we shouldn't change  
the rules to make it easier to express in any particular way).

These issues were discussed before:

http://lists.w3.org/Archives/Public/public-media-fragment/2009Dec/0015.html
http://lists.w3.org/Archives/Public/public-media-fragment/2009Nov/0023.html

I announced my changes here:

http://lists.w3.org/Archives/Public/public-media-fragment/2010Jan/0034.html

-- 
Philip Jägenstedt
Core Developer
Opera Software

Received on Tuesday, 23 February 2010 12:49:13 UTC