Re: ACTION-187: extensibility and parsing from Philip Jägenstedt on 2010-10-20 (public-media-fragment@w3.org from October 2010)

From: Philip Jägenstedt <philipj@opera.com>
Date: Wed, 20 Oct 2010 16:32:01 +0200
To: public-media-fragment@w3.org
Message-ID: <op.vkvpbnkosr6mfa@kirk>
On Tue, 19 Oct 2010 23:13:22 +0200, Silvia Pfeiffer  
<silviapfeiffer1@gmail.com> wrote:

> On Mon, Oct 18, 2010 at 7:31 PM, Philip Jägenstedt <philipj@opera.com>  
> wrote:
>> On Wed, 29 Sep 2010 15:50:35 +0200, Raphaël Troncy
>> <raphael.troncy@eurecom.fr> wrote:
>>
>>> Dear Philip,
>>>
>>>> As requested, a short summary of the long-standing issue of syntax,
>>>> parsing and how they relate to extensibility.
>>>
>>> Thanks for starting this thread. Today we closed ACTION-187 (and
>>> ACTION-186), as you may have seen in the minutes. We also resolved to
>>> adopt option 2 as you described it below (consensus among everyone who
>>> expressed an opinion, with one neutral).
>>>
>>> We have further specified how the specification will handle
>>> extensibility:
>>>
>>> 1/ A media fragment URI is indeed a set of key/value pairs for which at
>>> least one key is recognized by our grammar
>>> 2/ The ABNF grammar that describes the media fragment syntax will be
>>> edited (see ACTION-189) so that:
>>>   . The production rule of 'mediasegment' is now:
>>> mediasegment = namesegment / axissegment / extensionsegment
>>> extensionsegment = extensionprefix '=' extensionparam
>>>   . Additional prose states that 'extensionsegment' cannot redefine one
>>> of the current axes, so, e.g., extensionprefix cannot be 't', 'track',
>>> 'id' or 'xywh'
>>> 3/ We could add an additional paragraph stating how the parsing of the
>>> media fragment URI should be done
>>>
>>>> How will implementations of MF 1.0 handle t=10,500&freq=300,3000? This
>>>> is the core point of disagreement, and the question is really about how
>>>> MF 1.0 parsers should work. Leaving it undefined is not a good option,
>>>> as the history clearly shows.
>>>
>>> Indeed. With the current decision:
>>>  . <uri>#t=10,500&freq=300,3000 will be a valid MF 1.0 URI
>>>  . <uri>#freq=300,3000 will *NOT* be a valid MF 1.0 URI
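The two examples above can be checked against a minimal sketch of the rule in 1/. It assumes that the recognized MF 1.0 keys are exactly the four named in this thread (t, track, id, xywh) and that the fragment is split on & and = as in a query string. Values are deliberately not validated, and the function name is hypothetical.

```python
# Keys recognized by MF 1.0, as listed in this thread.
MF1_KEYS = {"t", "track", "id", "xywh"}


def has_recognized_key(fragment: str) -> bool:
    """Rule 1/: at least one key/value pair has a recognized key.

    Values are not checked here; only the keys matter for this rule.
    """
    for pair in fragment.split("&"):
        name, sep, _value = pair.partition("=")
        if sep and name in MF1_KEYS:
            return True
    return False


print(has_recognized_key("t=10,500&freq=300,3000"))  # True
print(has_recognized_key("freq=300,3000"))           # False
```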
>>>
>>>> 1. Require that parsing follow a strict ABNF syntax like the one we
>>>> have. Since freq is not part of the MF 1.0 syntax, parsing
>>>> t=10,500&freq=300,3000 will fail and the whole fragment will be
>>>> ignored, including t=10,500.
>>>>
>>>> 2. Require that parsing follow an algorithm or a more forgiving ABNF
>>>> syntax. The concrete suggestion I've made is that the algorithm or
>>>> syntax should match how query strings work. That is, a list of
>>>> key-value pairs is formed by splitting the string on & and =. As a
>>>> second step, that list is traversed to match the keys against the
>>>> dimensions, and each value is parsed according to the ABNF syntax of
>>>> its dimension. Crucially, unrecognized/invalid keys or values are
>>>> ignored. That means that in the above example, the time dimension will
>>>> keep working even if a freq dimension (unrecognized by an MF 1.0
>>>> implementation) is used.
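The difference between the two options can be sketched in Python. This is an illustration only: the time syntax below is a hypothetical, simplified subset (seconds only, optional npt: prefix) rather than the real MF 1.0 ABNF, the strict grammar models only the t dimension, and both function names are made up.

```python
import re

# Hypothetical, simplified time syntax: "start", "start,end" or ",end"
# in seconds, with an optional "npt:" prefix.
NUM = r"\d+(?:\.\d+)?"
TIME_VALUE = rf"(?:npt:)?(?:{NUM}(?:,{NUM})?|,{NUM})"

# Option 1: the whole fragment must match one strict grammar. Only the t
# dimension is modelled here; the other MF 1.0 dimensions are omitted.
STRICT = re.compile(rf"t={TIME_VALUE}(?:&t={TIME_VALUE})*")


def parse_strict(fragment):
    """Return the last t value, or None if the fragment fails to parse."""
    if STRICT.fullmatch(fragment) is None:
        return None  # parse failure: the whole fragment is ignored
    return fragment.split("&")[-1].partition("=")[2]


# Option 2: split into name-value pairs first, then check each pair on its
# own; unrecognized names and invalid values are skipped.
def parse_forgiving(fragment):
    """Return the last valid t value, or None if there is none."""
    result = None
    for pair in fragment.split("&"):
        name, sep, value = pair.partition("=")
        if sep and name == "t" and re.fullmatch(TIME_VALUE, value):
            result = value
    return result


print(parse_strict("t=10,500&freq=300,3000"))     # None
print(parse_forgiving("t=10,500&freq=300,3000"))  # 10,500
```

Under option 1 the unknown freq pair takes t=10,500 down with it; under option 2 it is simply skipped.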
>>>
>>> Please comment on this decision, stating whether you agree or
>>> disagree, so that I can carry out ACTION-189.
>>> Thanks.
>>> Best regards.
>>>
>>>   Raphaël
>>>
>>
>> Sorry for the delay; much traveling and an overflowing inbox take
>> their toll.
>>
>> I take it that the option 2 you refer to is this one:
>>
>> On Wed, 22 Sep 2010 13:12:14 +0200, Philip Jägenstedt  
>> <philipj@opera.com>
>> wrote:
>>
>>> 2. Require that parsing follow an algorithm or a more forgiving ABNF
>>> syntax. The concrete suggestion I've made is that the algorithm or
>>> syntax should match how query strings work. That is, a list of
>>> key-value pairs is formed by splitting the string on & and =. As a
>>> second step, that list is traversed to match the keys against the
>>> dimensions, and each value is parsed according to the ABNF syntax of
>>> its dimension. Crucially, unrecognized/invalid keys or values are
>>> ignored. That means that in the above example, the time dimension will
>>> keep working even if a freq dimension (unrecognized by an MF 1.0
>>> implementation) is used.
>>
>> I'm glad we've finally been able to agree on this; the only question
>> now is how to put it in the spec. For this to work like query strings,
>> the parsing must have the following properties:
>>
>> * Percent-decoding must be performed on both names and values. I don't
>> think this can be expressed in a single layer of ABNF, and at the very
>> least it isn't expressed in the ABNF we currently have.
>>
>> * When a name occurs twice, the last occurrence should be the one used.
>> (#t=0&t=1 means 1 second)
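Both properties can be illustrated with a small sketch built on Python's urllib.parse.unquote, which percent-decodes %XX sequences as UTF-8 by default. The helper names are hypothetical, and skipping parts that contain no = sign is an assumption that this thread does not settle.

```python
from urllib.parse import unquote


def name_value_pairs(fragment):
    """Split on & and =, then percent-decode both the name and the value."""
    pairs = []
    for part in fragment.split("&"):
        name, sep, value = part.partition("=")
        if sep:  # assumption: parts without '=' are skipped
            pairs.append((unquote(name), unquote(value)))
    return pairs


def last_occurrence(fragment):
    """Build a dict; later pairs overwrite earlier ones, so the last wins."""
    return dict(name_value_pairs(fragment))


print(name_value_pairs("%74=10%2C500"))  # [('t', '10,500')]
print(last_occurrence("t=0&t=1"))        # {'t': '1'}
```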
>>
>> About the suggested extension of mediasegment above: it's problematic
>> to require that extensionprefix not be 't', 'track', 'id' or 'xywh'.
>> That would mean that #t=foo:1&t=1 wouldn't parse to 1 second, making it
>> impossible to ever add additional time formats. Always ignoring invalid
>> things is simpler.
>>
>> How should we move forward with the practicalities of editing the spec?
>> IMO, validity and parsing are sufficiently different that they can't
>> simply be merged into a single ABNF. A validator should warn about all
>> unknown name-value pairs, while a user agent should ignore them.
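A minimal sketch of that split in responsibilities, assuming the name-value pairs have already been extracted and that the recognized names are the four MF 1.0 dimensions mentioned earlier in the thread; both function names are hypothetical.

```python
MF1_KEYS = {"t", "track", "id", "xywh"}


def validator_warnings(pairs):
    """A validator reports every name it does not recognize."""
    return [f"unknown name: {name}" for name, _ in pairs if name not in MF1_KEYS]


def user_agent_pairs(pairs):
    """A user agent silently drops unknown names and keeps the rest."""
    return [(name, value) for name, value in pairs if name in MF1_KEYS]


pairs = [("t", "10,500"), ("freq", "300,3000")]
print(validator_warnings(pairs))  # ['unknown name: freq']
print(user_agent_pairs(pairs))    # [('t', '10,500')]
```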
>>
>> My suggestion:
>>
>> Introduce the ABNF in
>> <http://lists.w3.org/Archives/Public/public-media-fragment/2010Aug/0005.html>
>>
>> By saying that the name and value are URI components, I believe it is
>> implied that percent-decoding should be performed. Either way, the spec
>> should say so explicitly for clarity.
>>
>> The name-value byte arrays should be decoded as UTF-8 to give Unicode
>> strings.
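In Python, the percent-decode-then-UTF-8 step can be sketched with urllib.parse.unquote_to_bytes. How invalid UTF-8 should be handled is not specified in this thread; replacing bad sequences with U+FFFD is an assumption of this sketch, as is the helper name.

```python
from urllib.parse import unquote_to_bytes


def decode_component(component: str) -> str:
    """Percent-decode a name or value to bytes, then decode as UTF-8."""
    raw = unquote_to_bytes(component)  # e.g. "%C3%A4" -> b"\xc3\xa4"
    # Assumption: invalid UTF-8 is replaced with U+FFFD rather than rejected.
    return raw.decode("utf-8", errors="replace")


print(decode_component("%C3%A4"))           # ä
print(decode_component("%FF") == "\ufffd")  # True
```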
>>
>> Have a pair of ABNF for each of our dimensions that operates on these
>> Unicode strings.
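A sketch of one such per-dimension parser, operating on an already decoded Unicode string. The grammar is a hypothetical, simplified stand-in for the temporal ABNF (seconds only, optional npt: prefix), and rejecting intervals whose start is not before their end is an assumption of this sketch.

```python
import re
from typing import Optional, Tuple

# Hypothetical, simplified stand-in for the temporal ABNF:
# "start", "start,end" or ",end" in seconds, optional "npt:" prefix.
NUM = r"\d+(?:\.\d+)?"
TIME_RE = re.compile(
    rf"(?:npt:)?(?:(?P<start>{NUM})(?:,(?P<end>{NUM}))?|,(?P<end_only>{NUM}))"
)


def parse_time(value: str) -> Optional[Tuple[float, Optional[float]]]:
    """Return (start, end) in seconds, or None if value is invalid."""
    m = TIME_RE.fullmatch(value)
    if m is None:
        return None
    # An omitted start means 0; an omitted end is represented as None.
    start = float(m.group("start")) if m.group("start") else 0.0
    end_text = m.group("end") or m.group("end_only")
    end = float(end_text) if end_text else None
    if end is not None and start >= end:
        return None  # assumption: empty or reversed intervals are invalid
    return (start, end)


print(parse_time("10,500"))  # (10.0, 500.0)
print(parse_time(",5"))      # (0.0, 5.0)
print(parse_time("foo:1"))   # None
```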
>>
>> A valid MF is one where all name-value pairs match one of the
>> predefined dimensions and no name occurs twice.
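The validity rule can be sketched as follows. The per-dimension validators passed in are hypothetical placeholders for each dimension's ABNF; only the two structural checks (every name is a predefined dimension, no name repeats) come from the rule itself.

```python
import re
from typing import Callable, Dict, List, Tuple

Validator = Callable[[str], bool]


def is_valid_fragment(pairs: List[Tuple[str, str]],
                      dimensions: Dict[str, Validator]) -> bool:
    """Valid if every pair matches a predefined dimension and no name repeats."""
    names = [name for name, _ in pairs]
    if len(names) != len(set(names)):
        return False  # a name occurs twice
    return all(name in dimensions and dimensions[name](value)
               for name, value in pairs)


# Hypothetical placeholder validators, not the real MF 1.0 ABNF.
DIMENSIONS: Dict[str, Validator] = {
    "t": lambda v: re.fullmatch(r"\d+(?:,\d+)?", v) is not None,
    "track": lambda v: v != "",
}

print(is_valid_fragment([("t", "10,500")], DIMENSIONS))                        # True
print(is_valid_fragment([("t", "10,500"), ("freq", "300,3000")], DIMENSIONS))  # False
print(is_valid_fragment([("t", "0"), ("t", "1")], DIMENSIONS))                 # False
```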
>>
>> For parsing, one iterates over the list of name-value pairs, parsing
>> any that are valid according to the ABNF. As a side effect of the loop,
>> the last valid pair of any dimension is the one that ends up being used.
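The loop described above can be sketched in a few lines. The regular expressions are hypothetical placeholders for the per-dimension ABNF (only t and track are modelled), and ignoring parts without an = sign is an assumption; the rest follows the description: percent-decode, check each pair, and let later valid pairs overwrite earlier ones.

```python
import re
from urllib.parse import unquote

# Hypothetical per-dimension checks standing in for each dimension's ABNF.
DIMENSIONS = {
    "t": re.compile(r"(?:npt:)?\d+(?:\.\d+)?(?:,\d+(?:\.\d+)?)?"),
    "track": re.compile(r".+"),
}


def parse_fragment(fragment: str) -> dict:
    result = {}
    for part in fragment.split("&"):
        name, sep, value = part.partition("=")
        if not sep:
            continue  # assumption: a part without '=' is ignored
        name, value = unquote(name), unquote(value)
        pattern = DIMENSIONS.get(name)
        if pattern is not None and pattern.fullmatch(value):
            result[name] = value  # later valid pairs overwrite earlier ones
        # unknown names and invalid values are silently skipped
    return result


print(parse_fragment("t=10,500&freq=300,3000"))  # {'t': '10,500'}
print(parse_fragment("t=foo:1&t=1"))             # {'t': '1'}
print(parse_fragment("t=1&t=foo:1"))             # {'t': '1'}
```

With this structure, #t=foo:1&t=1 still yields 1 second in an implementation that does not know the foo: format, because the unknown value is skipped rather than failing the whole fragment.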
>>
>
> Sounds like nobody objects. Is this being included in the spec now?

At today's teleconference I was under the impression that this is what
ACTION-189 [1] was about, so we didn't discuss it much. However, when
reading the minutes I noticed that it is actually an action to "put the
top-level production rules back into the document", which isn't quite the
same.

What are the concrete next steps toward resolving ISSUE-19 [2]?

[1] http://www.w3.org/2008/WebVideo/Fragments/tracker/actions/189
[2] http://www.w3.org/2008/WebVideo/Fragments/tracker/issues/19

-- 
Philip Jägenstedt
Core Developer
Opera Software
Received on Wednesday, 20 October 2010 14:32:40 UTC