Re: ACTION-187: extensibility and parsing from Philip Jägenstedt on 2010-09-24 (public-media-fragment@w3.org from September 2010)

From: Philip Jägenstedt <philipj@opera.com>
Date: Fri, 24 Sep 2010 10:09:28 +0200
To: public-media-fragment@w3.org
Message-ID: <op.vji192f0sr6mfa@kirk>

On Fri, 24 Sep 2010 06:56:33 +0200, Davy Van Deursen
<davy.vandeursen@ugent.be> wrote:

> Quoting Silvia Pfeiffer <silviapfeiffer1@gmail.com>:
>> On Wed, Sep 22, 2010 at 9:12 PM, Philip Jägenstedt
>> <philipj@opera.com> wrote:
>>
>>> As requested, a short summary of the long-standing issue of syntax,
>>> parsing and how that relates to extensibility.
>>>
>>> By extensibility I am not primarily talking about 3rd parties extending
>>> MF, but about our own possibilities of updating the spec after MF 1.0.
>>> For the purpose of discussion, assume that we want to add a dimension for
>>> filtering the audio, e.g., freq=300,3000 to keep only the part of the
>>> audio that corresponds (approximately) to human voice (300Hz-3000Hz).
>>>
>>> How will implementations of MF 1.0 handle t=10,500&freq=300,3000 ? This
>>> is the core point of disagreement, and the question is really about how
>>> MF 1.0 parsers should work. Leaving it undefined is not a good option, as
>>> the history clearly shows. Two other options have been on the table:
>>>
>>> 1. Require that parsing follow a strict ABNF syntax like the one we have.
>>> Since freq is not part of the MF 1.0 syntax, parsing
>>> t=10,500&freq=300,3000 will fail and the whole fragment will be ignored,
>>> including t=10,500.
>>>
>>> 2. Require that parsing follow an algorithm or a more forgiving ABNF
>>> syntax. The concrete suggestion I've made is that the algorithm or syntax
>>> should match how query strings work. That is, a list of key-value pairs
>>> is formed by splitting the string on & and =. As a second step, that list
>>> is traversed to match the keys against the dimensions, and each value is
>>> parsed according to the ABNF syntax of its dimension. Crucially,
>>> unrecognized/invalid keys or values are ignored. That means that in the
>>> above example, the time dimension will keep working even if an
>>> unrecognized (to an MF 1.0 implementation) freq dimension is used.
>>>
>>> Note: Neither 1 nor 2 is a requirement to use any specific implementation
>>> technique, only to behave *as if* you do, which still leaves plenty of
>>> room for different approaches.
>>>
>>> I strongly favor option number 2, and see these benefits:
>>>
>>> * It works like query strings, just like one would expect from looking at
>>> the syntax. The algorithm I've suggested is actually from testing query
>>> string parsing in PHP, ASP, ASP.NET, CGI.pl and JSP, as reported earlier
>>> on this list.
>>>
>>> * It's simpler for implementors, as we won't have to implement everything
>>> at once. This is likely what's going to happen, as the time dimension is
>>> ready to implement, while it is still not clear how to apply the named
>>> dimension to e.g. a WebM or Ogg resource.
>>>
>>> * It's better for extensibility, as adding new dimensions doesn't break
>>> all existing implementations. Imagine if adding a new element to HTML
>>> would cause pages to render completely blank in all existing browsers.
>>> Not even XHTML is that strict.
>>>
>>> Please comment; we need to reach some kind of consensus on this soon and
>>> move on. If we can agree on what we want, we can then discuss how to
>>> change the spec accordingly (algorithm or ABNF, etc.).
>>
>> I also strongly favor option number 2. I don't think anything else makes
>> sense, actually, because we would fail to interoperate with other schemes
>> that use fragments and queries on media resources. Only name-value pairs
>> that do not parse according to our ABNF will be ignored from the viewpoint
>> of media fragments. They can be used by the browser or server for other
>> purposes.
>
> Same opinion here, option 1 doesn't seem to make sense. However, should we
> allow any unknown constructions in the URI fragment or just key-value pairs
> with an unknown key? For example:
> - t=10,500&freq=300,3000: should be a valid fragment IMO, as indicated by
>   Philip's arguments;
> - t=10,500&foo: is this a valid media fragment? According to Philip's
>   parsing algorithm, I think it is not. From an extension point of view,
>   disallowing such a construction should be fine since we can rewrite this
>   as t=10,500&foo=true if we want to obtain key-value pairs. Note that I'm
>   not in favor of allowing anything other than key-value pairs; I just
>   wanted to point out this case.

The ABNF I suggested in
<http://lists.w3.org/Archives/Public/public-media-fragment/2010Aug/0005.html>
isn't complete; it's just the first level, defining name-value pairs. I
think that we should define validity in a way that makes validators warn
about things that aren't part of MF 1.0, to help authors find typos, etc.
There are many ways we could achieve that spec-wise, if we agree on what
we want. Validity and parsing can and should be separate, so we don't need
to agree on exact details for the purposes of this discussion.
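
To make the parsing/validity split concrete, here is a minimal sketch in
Python of option 2 plus a separate validator. It is an illustration, not
spec text. The names parse_lenient, validate, split_pairs, parse_time and
KNOWN are made up for this example. It assumes that only the time dimension
is known, that a time value is plain seconds in "start,end" form, that a
piece without "=" does not form a pair, and that a later duplicate key wins;
percent-decoding is left out. None of these details are settled.

```python
import re

# Assumption: plain seconds only ("10", "10,500", ",500"); the real time
# syntax is richer than this.
TIME_RE = re.compile(r"^(\d+(?:\.\d+)?)?(?:,(\d+(?:\.\d+)?))?$")


def parse_time(value):
    """Return (start, end) in seconds, or None if the value does not match."""
    m = TIME_RE.match(value)
    if not m or (m.group(1) is None and m.group(2) is None):
        return None
    start = float(m.group(1)) if m.group(1) else 0.0
    end = float(m.group(2)) if m.group(2) else None
    return (start, end)


# Hypothetical table of dimensions known to this implementation.
KNOWN = {"t": parse_time}


def split_pairs(fragment):
    """Step 1: split on & and = like a query string."""
    pairs = []
    for part in fragment.split("&"):
        name, sep, value = part.partition("=")
        if sep and name:  # assumption: "foo" without "=" is not a pair
            pairs.append((name, value))
    return pairs


def parse_lenient(fragment):
    """Step 2: keep known dimensions, silently ignore everything else."""
    result = {}
    for name, value in split_pairs(fragment):
        parser = KNOWN.get(name)
        if parser is None:
            continue  # unrecognized key: ignored, not fatal
        parsed = parser(value)
        if parsed is not None:
            result[name] = parsed  # assumption: last occurrence wins
    return result


def validate(fragment):
    """Separate from parsing: report what a validator might warn about."""
    warnings = []
    for part in fragment.split("&"):
        name, sep, value = part.partition("=")
        if not sep:
            warnings.append(f"not a name=value pair: {part!r}")
        elif name not in KNOWN:
            warnings.append(f"unknown dimension: {name!r}")
        elif KNOWN[name](value) is None:
            warnings.append(f"invalid value for {name!r}: {value!r}")
    return warnings
```

With this sketch, parse_lenient("t=10,500&freq=300,3000") keeps the time
range and drops freq, while validate on the same string reports freq as
unknown. The parser stays forgiving and the validator can still help
authors find typos.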

-- 
Philip Jägenstedt
Core Developer
Opera Software
Received on Friday, 24 September 2010 08:10:11 UTC