
Re: ACTION-187: extensibility and parsing

From: Philip Jägenstedt <philipj@opera.com>
Date: Fri, 24 Sep 2010 10:09:28 +0200
To: public-media-fragment@w3.org
Message-ID: <op.vji192f0sr6mfa@kirk>
On Fri, 24 Sep 2010 06:56:33 +0200, Davy Van Deursen  
<davy.vandeursen@ugent.be> wrote:

> Quoting Silvia Pfeiffer <silviapfeiffer1@gmail.com>:
>> On Wed, Sep 22, 2010 at 9:12 PM, Philip Jägenstedt
>> <philipj@opera.com> wrote:
>>> As requested, a short summary of the long-standing issue of syntax,
>>> parsing and how that relates to extensibility.
>>>
>>> By extensibility I am not primarily talking about 3rd parties
>>> extending MF, but about our own ability to update the spec after
>>> MF 1.0. For the purpose of discussion, assume that we want to add a
>>> dimension for filtering the audio, e.g., freq=300,3000 to keep only
>>> the part of the audio that corresponds (approximately) to human voice
>>> (300 Hz to 3000 Hz).
>>>
>>> How will implementations of MF 1.0 handle t=10,500&freq=300,3000?
>>> This is the core point of disagreement, and the question is really
>>> about how MF 1.0 parsers should work. Leaving it undefined is not a
>>> good option, as history clearly shows. Two other options have been on
>>> the table:
>>>
>>> 1. Require that parsing follow a strict ABNF syntax like the one we
>>> have. Since freq is not part of the MF 1.0 syntax, parsing
>>> t=10,500&freq=300,3000 will fail and the whole fragment will be
>>> ignored, including t=10,500.
>>>
>>> 2. Require that parsing follow an algorithm or a more forgiving ABNF
>>> syntax. The concrete suggestion I've made is that the algorithm or
>>> syntax should match how query strings work. That is, a list of
>>> key-value pairs is formed by splitting the string on & and =. As a
>>> second step, that list is traversed to match the keys against the
>>> dimensions, and each value is parsed according to the ABNF syntax of
>>> its dimension. Crucially, unrecognized/invalid keys or values are
>>> ignored. That means that in the above example, the time dimension
>>> will keep working even if a freq dimension (unrecognized by an MF 1.0
>>> implementation) is used.
>>>
>>> Note: Neither 1 nor 2 requires using any specific implementation
>>> technique, only behaving *as if* you do, which still leaves plenty of
>>> room for different approaches.
>>>
>>> I strongly favor option number 2, and see these benefits:
>>> * It works like query strings, just as one would expect from looking
>>> at the syntax. The algorithm I've suggested actually comes from
>>> testing query string parsing in PHP, ASP, ASP.NET, CGI.pl and JSP, as
>>> reported earlier on this list.
>>> * It's simpler for implementors, as we won't have to implement
>>> everything at once. This is likely what's going to happen, as the time
>>> dimension is ready to implement, while it is still not clear how to
>>> apply the named dimension to e.g. a WebM or Ogg resource.
>>> * It's better for extensibility, as adding new dimensions doesn't
>>> break all existing implementations. Imagine if adding a new element to
>>> HTML caused pages to render completely blank in all existing browsers.
>>> Not even XHTML is that strict.
>>>
>>> Please comment; we need to reach some kind of consensus on this soon
>>> and move on. If we can agree on what we want, we can then discuss how
>>> to change the spec accordingly (algorithm or ABNF, etc.).
>> I also strongly favor option number 2. I don't think anything else
>> makes sense, actually, because we would fail to interoperate with other
>> schemes that use fragments and queries on media resources. Only
>> name-value pairs that do not parse according to our ABNF will be
>> ignored from the viewpoint of media fragments. They can be used by the
>> browser or server for other purposes.
> Same opinion here; option 1 doesn't seem to make sense. However, should
> we allow any unknown constructions in the URI fragment, or just
> key-value pairs with an unknown key? For example:
> - t=10,500&freq=300,3000: should be a valid fragment IMO, as indicated
>   by Philip's arguments;
> - t=10,500&foo: is this a valid media fragment? According to Philip's
>   parsing algorithm, I think it is not. From an extension point of
>   view, disallowing such a construction should be fine, since we can
>   rewrite it as t=10,500&foo=true if we want to obtain key-value pairs.
> Note that I'm not in favor of allowing anything other than key-value
> pairs; I just wanted to point out this case.
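
To make the two parsing options in the quoted discussion concrete, here is a minimal Python sketch. It assumes a simplified time dimension (plain seconds as "start", "start,end" or ",end", not the full MF 1.0 ABNF), and the names parse_time, parse_strict, parse_lenient and DIMENSIONS are hypothetical. parse_strict behaves like option 1; parse_lenient behaves like option 2, splitting on "&" and the first "=" and then ignoring unknown keys, invalid values and parts without "=" (the "foo" case above). How repeated dimensions are resolved is also an assumption here.

```python
import re

# Simplified time syntax (an assumption for this sketch): "start",
# "start,end" or ",end" in plain seconds. The real MF 1.0 ABNF also
# covers npt:, smpte and clock formats.
TIME_RE = re.compile(r"(\d+(?:\.\d+)?)?(?:,(\d+(?:\.\d+)?))?")


def parse_time(value):
    """Return (start, end) in seconds, or None if the value is invalid."""
    m = TIME_RE.fullmatch(value)
    if m is None or (m.group(1) is None and m.group(2) is None):
        return None
    start = float(m.group(1)) if m.group(1) is not None else 0.0
    end = float(m.group(2)) if m.group(2) is not None else None
    if end is not None and start >= end:
        return None
    return (start, end)


# Dimensions known to this hypothetical "MF 1.0" implementation.
# A later spec might add e.g. "freq"; older parsers would not know it.
DIMENSIONS = {"t": parse_time}


def parse_strict(fragment):
    """Option 1: any unknown or invalid pair makes the whole fragment fail."""
    result = {}
    for part in fragment.split("&"):
        name, sep, value = part.partition("=")  # split on the first "="
        parser = DIMENSIONS.get(name)
        parsed = parser(value) if sep and parser else None
        if parsed is None:
            return None  # e.g. "freq=300,3000" discards "t=10,500" too
        result[name] = parsed
    return result


def parse_lenient(fragment):
    """Option 2: split like a query string, then ignore what isn't understood."""
    result = {}
    for part in fragment.split("&"):
        name, sep, value = part.partition("=")  # split on the first "="
        if not sep:
            continue  # no "=", e.g. "foo": skipped in this sketch
        parser = DIMENSIONS.get(name)
        if parser is None:
            continue  # unknown dimension, e.g. "freq": ignored
        parsed = parser(value)
        if parsed is not None:
            result[name] = parsed  # assumption: last valid occurrence wins
    return result


print(parse_strict("t=10,500&freq=300,3000"))   # None
print(parse_lenient("t=10,500&freq=300,3000"))  # {'t': (10.0, 500.0)}
print(parse_lenient("t=10,500&foo"))            # {'t': (10.0, 500.0)}
```

With the strict parser, a freq pair added by some later spec makes an MF 1.0 parser drop the time range as well; with the lenient parser, t=10,500 survives. Whether "foo" should count as *valid* is a separate question from how it is parsed.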

The ABNF I suggested in  
isn't complete; it's just the first level, defining name-value pairs. I
think that we should define validity in a way that makes validators warn
about things that aren't part of MF 1.0, to help authors find typos, etc.
There are many ways we could achieve that spec-wise, if we agree on what
we want. Validity and parsing can and should be separate, so we don't need
to agree on exact details for the purposes of this discussion.
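
Keeping validity separate from parsing can be sketched as a checker that reports everything a lenient parser would silently skip. This is a minimal, hypothetical helper (validate_fragment, time_is_valid and KNOWN are illustrative names, not from any spec), assuming the same simplified seconds-only time syntax and a table of known dimensions that contains only "t"; the real MF 1.0 set of dimensions is larger.

```python
import re
from typing import List

# Simplified time syntax (an assumption): "start", "start,end" or ",end"
# in plain seconds; the full MF 1.0 grammar is richer.
TIME_RE = re.compile(r"(\d+(?:\.\d+)?)?(?:,(\d+(?:\.\d+)?))?")


def time_is_valid(value: str) -> bool:
    """True if value is a well-formed range with start < end."""
    m = TIME_RE.fullmatch(value)
    if m is None or (m.group(1) is None and m.group(2) is None):
        return False
    start = float(m.group(1) or 0)
    return m.group(2) is None or start < float(m.group(2))


# Hypothetical table of dimensions this validator knows about.
KNOWN = {"t": time_is_valid}


def validate_fragment(fragment: str) -> List[str]:
    """Return warnings for anything outside MF 1.0; [] means valid."""
    warnings = []
    for part in fragment.split("&"):
        name, sep, value = part.partition("=")
        if not sep:
            warnings.append(f"{part!r} is not a name=value pair")
        elif name not in KNOWN:
            warnings.append(f"unknown dimension {name!r} (typo?)")
        elif not KNOWN[name](value):
            warnings.append(f"invalid value {value!r} for {name!r}")
    return warnings


print(validate_fragment("t=10,500&freq=300,3000"))  # ["unknown dimension 'freq' (typo?)"]
print(validate_fragment("t=10,500&foo"))            # ["'foo' is not a name=value pair"]
```

A parser following option 2 would still apply t=10,500 in both cases; only the validator complains, which gives authors help with typos without making existing implementations fail on unknown dimensions.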

Philip Jägenstedt
Core Developer
Opera Software
Received on Friday, 24 September 2010 08:10:11 UTC
