Re: ACTION-187: extensibility and parsing

On Fri, 24 Sep 2010 10:43:32 +0200, Silvia Pfeiffer  
<silviapfeiffer1@gmail.com> wrote:

> On Fri, Sep 24, 2010 at 6:09 PM, Philip Jägenstedt  
> <philipj@opera.com> wrote:
>
>> On Fri, 24 Sep 2010 06:56:33 +0200, Davy Van Deursen <
>> davy.vandeursen@ugent.be> wrote:
>>
>>> Quoting Silvia Pfeiffer <silviapfeiffer1@gmail.com>:
>>>
>>>> On Wed, Sep 22, 2010 at 9:12 PM, Philip Jägenstedt <philipj@opera.com>
>>>> wrote:
>>>>
>>>>> As requested, a short summary of the long-standing issue of syntax,
>>>>> parsing and how that relates to extensibility.
>>>>>
>>>>> By extensibility I am not primarily talking about 3rd parties  
>>>>> extending
>>>>> MF,
>>>>> but about our own possibilities of updating the spec after MF 1.0.  
>>>>> For
>>>>> the
>>>>> purpose of discussion, assume that we want to add a dimension for
>>>>> filtering
>>>>> the audio, e.g., freq=300,3000 to keep only the part of the audio  
>>>>> that
>>>>> corresponds (approximately) to human voice (300Hz-3000Hz).
>>>>>
>>>>> How will implementations of MF 1.0 handle t=10,500&freq=300,3000 ?  
>>>>> This
>>>>> is
>>>>> the core point of disagreement, and the question is really about how  
>>>>> MF
>>>>> 1.0
>>>>> parsers should work. Leaving it undefined is not a good option, as  
>>>>> the
>>>>> history clearly shows. Two other options have been on the table:
>>>>>
>>>>> 1. Require that parsing follow a strict ABNF syntax like the one we
>>>>> have.
>>>>> Since freq is not part of the MF 1.0 syntax, parsing
>>>>> t=10,500&freq=300,3000
>>>>> will fail and the whole fragment will be ignored, including t=10,500.
>>>>>
>>>>> 2. Require that parsing follow an algorithm or a more forgiving ABNF
>>>>> syntax. The concrete suggestion I've made is that the algorithm or
>>>>> syntax
>>>>> should match how query strings work. That is, a list of key-value  
>>>>> pairs
>>>>> is
>>>>> formed by splitting the string on & and =. As a second step, that  
>>>>> list
>>>>> is
>>>>> traversed to match the keys against the dimensions and parsed  
>>>>> according
>>>>> to
>>>>> the ABNF syntax of each dimension. Crucially, unrecognized/invalid  
>>>>> keys
>>>>> or
>>>>> values are ignored. That means that in the above example, the time
>>>>> dimension
>>>>> will keep working even if an unrecognized (to a MF 1.0  
>>>>> implementation)
>>>>> freq
>>>>> dimension is used.
>>>>>
>>>>> Note: Neither 1 nor 2 is a requirement to use any specific
>>>>> implementation
>>>>> technique, only to behave *as if* you are, which still leaves plenty  
>>>>> of
>>>>> room
>>>>> for different approaches.
>>>>>
>>>>> I strongly favor option number 2, and see these benefits:
>>>>>
>>>>> * It works like query strings, just like one would expect from  
>>>>> looking
>>>>> at
>>>>> the syntax. The algorithm I've suggested is actually from testing  
>>>>> query
>>>>> string parsing in PHP, ASP, ASP.NET, CGI.pl and JSP, as reported
>>>>> earlier
>>>>> on this list.
>>>>>
>>>>> * It's simpler for implementors, as we won't have to implement
>>>>> everything
>>>>> at once. This is likely what's going to happen, as the time  
>>>>> dimension is
>>>>> ready to implement, while the named dimension is still not clear how  
>>>>> to
>>>>> apply to e.g. a WebM or Ogg resource.
>>>>>
>>>>> * It's better for extensibility, as adding new dimensions doesn't  
>>>>> break
>>>>> all
>>>>> existing implementations. Imagine if adding a new element to HTML  
>>>>> would
>>>>> cause pages to render completely blank in all existing browsers. Not
>>>>> even
>>>>> XHTML is that strict.
>>>>>
>>>>> Please comment, we need to reach some kind of consensus on this soon  
>>>>> and
>>>>> move on. If we can agree on what we want, we can then discuss how to
>>>>> change
>>>>> the spec accordingly (algorithm or ABNF, etc...)
>>>>>
>>>>
>>>>
>>>>
>>>> I also strongly favor option number 2. I don't think anything else  
>>>> makes
>>>> sense, actually, because we would fail to interoperate with  other
>>>> schemes
>>>> that use fragments and queries on media resources. Only name-value  
>>>> pairs
>>>> that do not parse according to our ABNF will be ignored from the
>>>> viewpoint
>>>> of media fragments. They can be used by the browser or server for  
>>>> other
>>>> purposes.
>>>>
>>>
>>> Same opinion here, option 1 doesn't seem to make sense. However,  
>>> should we
>>> allow any unknown constructions in the URI fragment or
>>> just key-value pairs with an unknown key? For example:
>>> - t=10,500&freq=300,3000: should be a valid fragment IMO, as indicated  
>>> by
>>> Philip's arguments;
>>> - t=10,500&foo: is this a valid media fragment? According to Philip's
>>> parsing algorithm, I think it is not. From an extension point
>>> of view, disallowing such a construction should be fine since we can
>>> rewrite this as t=10,500&foo=true if we want to obtain
>>> key-value pairs. Note that I'm not in favor of allowing other things  
>>> than
>>> key-value pairs, I just wanted to point out this case.
>>>
>>
>> The ABNF I suggested in <
>> http://lists.w3.org/Archives/Public/public-media-fragment/2010Aug/0005.html>
>> isn't complete, it's just the first level defining name-value pairs. I  
>> think
>> that we should define validity in a way that makes validators warn about
>> things that aren't part of MF 1.0, to help authors find typos, etc.  
>> There
>> are many ways we could achieve that spec-wise, if we agree on what we  
>> want.
>> Validity and parsing can and should be separate, so we don't need to  
>> agree
>> on exact details for the purposes of this discussion.
>
>
> Assuming everyone is on board with that (which, of course, isn't clear  
> yet)
> - would you be able to come up with spec text for this? You seem to have  
> an
> idea in your head already what it should look like, so it would be good  
> to
> build on that.

Sure, I could write some spec text. As an FYI, these are the options I see:

1. Just use the ABNF we have now and let parsing be completely separate  
 from it.

2. Define a name-value syntax and say that parsers should use that to get  
name-value pairs (simple because it's equivalent to splitting on & and =).  
Then say that a valid Media Fragment is one where all the names and values  
match the dimensions and their corresponding syntax.

I'll not go further into discussion about these spec-writing details, as  
the purpose of this thread is to reach consensus on how parsing should  
work, and thus what kind of extensibility we get.
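
For concreteness, here is a rough sketch of the behaviour under discussion:  
split on & and =, keep the pairs whose name and value are recognized, and  
ignore everything else, with validity checked separately and more strictly.  
It is only an illustration, not proposed spec text. The function names, the  
DIMENSIONS table and the simplified pattern for t (plain seconds only) are  
assumptions made for the example. Percent-decoding is left out, and letting  
the last duplicate name win is an arbitrary choice here.

```python
import re

# Simplified, hypothetical value syntax for "t": plain seconds only.
# The actual MF 1.0 ABNF for the time dimension is richer than this.
_SECONDS = r"\d+(?:\.\d+)?"
DIMENSIONS = {
    "t": re.compile(rf"(?:{_SECONDS}(?:,{_SECONDS})?|,{_SECONDS})"),
}


def split_pairs(fragment):
    """Split on & and on the first = of each piece; skip pieces without =."""
    pairs = []
    for piece in fragment.split("&"):
        name, sep, value = piece.partition("=")
        if sep and name:
            pairs.append((name, value))
    return pairs


def parse(fragment):
    """Keep recognized dimensions with valid values; ignore the rest."""
    result = {}
    for name, value in split_pairs(fragment):
        pattern = DIMENSIONS.get(name)
        if pattern is not None and pattern.fullmatch(value):
            result[name] = value  # assumption: the last duplicate wins
    return result


def is_valid(fragment):
    """Stricter than parse(): every piece must be a known, well-formed pair."""
    for piece in fragment.split("&"):
        name, sep, value = piece.partition("=")
        pattern = DIMENSIONS.get(name)
        if not sep or pattern is None or not pattern.fullmatch(value):
            return False
    return True


if __name__ == "__main__":
    print(parse("t=10,500&freq=300,3000"))
    print(is_valid("t=10,500&freq=300,3000"))
```

With this sketch, t=10,500&freq=300,3000 and t=10,500&foo both still yield  
the time dimension when parsed, while a validator built on is_valid() would  
flag both as not being valid MF 1.0.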

-- 
Philip Jägenstedt
Core Developer
Opera Software

Received on Friday, 24 September 2010 09:09:47 UTC