Re: Media Fragments URI parsing: pseudo algorithm code

On Wed, 30 Jun 2010 16:22:15 +0200, Yves Lafon <ylafon@w3.org> wrote:

> On Wed, 30 Jun 2010, Philip Jägenstedt wrote:
>
>> You cannot write a robust MF parser based on this grammar, because  
>> t=1&foo=bar is not a valid production, meaning that any future  
>> extension foo of MF will cause that parser to fail completely. Either  
>> the grammar itself must be relaxed, or the parsing must be defined  
>> normatively and handle some things which are not valid productions of  
>> the grammar.
>
> What do you mean by "robust" ?

I mean that it doesn't stop working completely when the syntax is extended,
but degrades gracefully instead. If browsers shipped with a parser based on
the ABNF of MF 1.0, then #t=1 would work and authors would use it. If MF 2.0
then added a foo dimension, authors could not use #t=1&foo=bar, because the
t=1 part would also be ignored until all browsers had upgraded to an MF 2.0
parser. That's the opposite of graceful degradation.
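Here is a minimal Python sketch of the difference. Both function names and the set of MF 1.0 dimension names (assumed here to be t, xywh, track and id) are illustrative assumptions, and value syntax is not checked at all. parse_strict approximates a parser that rejects any fragment the ABNF does not produce. parse_lenient keeps the pairs it understands and skips the rest:

```python
# Hypothetical sketch: names and behaviour are illustrative, not from the MF spec.
KNOWN_DIMENSIONS = {"t", "xywh", "track", "id"}  # assumed MF 1.0 dimension names


def split_pairs(fragment):
    """Split 'a=1&b=2' into [('a', '1'), ('b', '2')]; None marks a pair without '='."""
    pairs = []
    for part in fragment.split("&"):
        name, sep, value = part.partition("=")
        pairs.append((name, value) if sep else None)
    return pairs


def parse_strict(fragment):
    """All-or-nothing: any unknown or malformed pair discards the whole fragment."""
    result = {}
    for pair in split_pairs(fragment):
        if pair is None or pair[0] not in KNOWN_DIMENSIONS:
            return {}
        result[pair[0]] = pair[1]
    return result


def parse_lenient(fragment):
    """Per-pair: unknown or malformed pairs are skipped, known ones are kept."""
    result = {}
    for pair in split_pairs(fragment):
        if pair is not None and pair[0] in KNOWN_DIMENSIONS:
            result[pair[0]] = pair[1]
    return result


print(parse_strict("t=1&foo=bar"))   # {}
print(parse_lenient("t=1&foo=bar"))  # {'t': '1'}
```

With parse_strict, adding foo in a later revision silently disables t in every deployed implementation. parse_lenient still honours t=1.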

>>> With the current grammar, it is allowed only in track and id  
>>> productions.
>>> So it is perfectly compatible with the processing defined in rfc3986  
>>> and perfectly allows #track=A%20%26%20B&t=10
>>
>> No disagreement that we need to define it, thankfully. The disagreement  
>> is only where to decode percent-encoding.
>
> RFC3986 gives the answer, after the URI components are parsed (and we  
> define here how to split out in components).

The disagreement is only about which components to percent-decode, and
RFC3986 will not help us with that.
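The #track=A%20%26%20B&t=10 example quoted above shows why the order of splitting and decoding matters. The following minimal Python sketch assumes that the fragment is split on raw '&' and '=' before any decoding is done. The function names are hypothetical, and the sketch does not settle which components should be decoded:

```python
from urllib.parse import unquote

FRAGMENT = "track=A%20%26%20B&t=10"  # example from the thread


def split_then_decode(fragment):
    """Split on raw '&' and '=' first, then percent-decode each name and value."""
    pairs = []
    for part in fragment.split("&"):
        name, _, value = part.partition("=")
        pairs.append((unquote(name), unquote(value)))
    return pairs


def decode_then_split(fragment):
    """Percent-decode the whole fragment first; an encoded '&' becomes a separator."""
    pairs = []
    for part in unquote(fragment).split("&"):
        name, _, value = part.partition("=")
        pairs.append((name, value))
    return pairs


print(split_then_decode(FRAGMENT))  # [('track', 'A & B'), ('t', '10')]
print(decode_then_split(FRAGMENT))  # [('track', 'A '), (' B', ''), ('t', '10')]
```

In this sketch, decoding first turns the escaped %26 into a separator and corrupts the track name. Whichever components end up being decoded, the decoding has to happen after splitting. Note that unquote() leaves '+' alone. Treating '+' as a space, as form-encoded query strings do, would be a separate decision.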

>> <issue>
>>
>> MF parsing must be defined normatively in the MF spec itself, meeting  
>> these conditions:
>>
>> 1. should handle all valid productions of the ABNF syntax correctly  
>> and, where necessary, input which is not valid per the syntax.
>>
>> 2. must be forward-compatible, so that future extensions to MF do not  
>> break existing MF parsers. (Compare to how new HTML elements and  
>> attributes or CSS properties degrade in implementations that don't  
>> understand them.)
>
> I completely disagree with this, as it may preclude other uses than  
> mediafragment to use an "a=b" syntax.

If you expect not only future revisions of MF but also completely  
unrelated uses to co-exist in the same fragment component, then that's all  
the more reason for parsers to not fail when encountering unknown  
name-value pairs. That aside, since it is only the MIME registrations that  
have the authority to say that MF applies, one can safely assume they  
won't do something crazy like allowing conflicting syntaxes in the same  
component. Can you please give a concrete example of a problem that might  
arise?

>> 3. should match as closely as possible how query components of the form  
>> a=1&b=2 are parsed by existing server-side software (e.g. ASP, PHP,  
>> JSP, Perl CGI)
>>
>> </issue>
>>
>> An implementation that conforms exactly to the (current, non-normative)  
>> ABNF fails condition 2 (e.g. t=1&foo=bar) and is not an option.
>
> As I completely disagree with 2, strict ABNF makes perfect sense.
>

Are there any existing experimental implementations that actually take  
this approach? I'd love to experiment with them to show how fragile it is.
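For comparison, here is a minimal Python sketch of the kind of tolerant name=value parsing that conditions 1 and 3 of the issue point towards. The policies for empty segments, a missing '=' and repeated names are assumptions chosen for illustration. Server-side frameworks do not all agree on them, particularly on duplicates. parse_pairs is a hypothetical name:

```python
from urllib.parse import unquote


def parse_pairs(fragment):
    """Tolerant name=value parsing in the spirit of query-string parsers.

    Assumed policies (illustrative only):
    - empty segments such as the middle of 'a=1&&b=2' are skipped;
    - a segment without '=' yields the name with an empty value;
    - a segment with an empty name, such as '=x', is skipped;
    - when a name repeats, the last value wins.
    """
    result = {}
    for part in fragment.split("&"):
        if not part:
            continue
        name, _, value = part.partition("=")
        name, value = unquote(name), unquote(value)
        if not name:
            continue
        result[name] = value  # later duplicates overwrite earlier ones
    return result


print(parse_pairs("t=1&&t=2&=x&flag"))  # {'t': '2', 'flag': ''}
```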

-- 
Philip Jägenstedt
Core Developer
Opera Software

Received on Wednesday, 30 June 2010 15:46:09 UTC