Re: ACTION-187: extensibility and parsing from Silvia Pfeiffer on 2010-10-20 (public-media-fragment@w3.org from October 2010)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Thu, 21 Oct 2010 08:13:34 +1100
To: Philip Jägenstedt <philipj@opera.com>
Cc: public-media-fragment@w3.org
Message-ID: <AANLkTi=dgY2mOB2iGYTzLPXgo+EphqBCsdMDnWUc7Br9@mail.gmail.com>
On Thu, Oct 21, 2010 at 1:32 AM, Philip Jägenstedt <philipj@opera.com> wrote:
> On Tue, 19 Oct 2010 23:13:22 +0200, Silvia Pfeiffer
> <silviapfeiffer1@gmail.com> wrote:
>
>> On Mon, Oct 18, 2010 at 7:31 PM, Philip Jägenstedt <philipj@opera.com>
>> wrote:
>>>
>>> On Wed, 29 Sep 2010 15:50:35 +0200, Raphaël Troncy
>>> <raphael.troncy@eurecom.fr> wrote:
>>>
>>>> Dear Philip,
>>>>
>>>>> As requested, a short summary of the long-standing issue of syntax,
>>>>> parsing and how that relates to extensibility.
>>>>
>>>> Thanks for having started this thread. We have closed today the
>>>> action-187 (and 186) as you may have seen in the minutes. We have also
>>>> resolved that we should do the option 2 you described below (consensus
>>>> from
>>>> all minus one neutral among the people who have expressed an opinion).
>>>>
>>>> We have further specified how the specification will manage extensibility:
>>>>
>>>> 1/ A media fragment URI is indeed a set of key/value pairs for which at
>>>> least one key is recognized by our grammar
>>>> 2/ The ABNF grammar that describes the media fragment syntax will be
>>>> edited (see ACTION-189) so that:
>>>>  . The production rule of 'mediasegment' is now:
>>>> mediasegment = namesegment / axissegment / extensionsegment
>>>> extensionsegment = extensionprefix '=' extensionparam
>>>>  . Additional prose states that 'extensionsegment' cannot redefine one
>>>> of
>>>> the current axis, so e.g., extensionprefix cannot be 't' or 'track' or
>>>> 'id'
>>>> or 'xywh'
>>>> 3/ We could add an additional paragraph stating how the parsing of the
>>>> media fragment URI should be done
>>>>
>>>>> How will implementations of MF 1.0 handle t=10,500&freq=300,3000 ? This
>>>>> is the core point of disagreement, and the question is really about how
>>>>> MF 1.0 parsers should work. Leaving it undefined is not a good option,
>>>>> as the history clearly shows.
>>>>
>>>> Indeed. With the current decision:
>>>>  . <uri>#t=10,500&freq=300,3000 will be a valid MF 1.0 URI
>>>>  . <uri>#freq=300,3000 will *NOT* be a valid MF 1.0 URI
>>>>
>>>>> 1. Require that parsing follow a strict ABNF syntax like the one we
>>>>> have. Since freq is not part of the MF 1.0 syntax, parsing
>>>>> t=10,500&freq=300,3000 will fail and the whole fragment will be
>>>>> ignored,
>>>>> including t=10,500.
>>>>>
>>>>> 2. Require that parsing follow an algorithm or a more forgiving ABNF
>>>>> syntax. The concrete suggestion I've made is that the algorithm or
>>>>> syntax should match how query strings work. That is, a list of
>>>>> key-value
>>>>> pairs is formed by splitting the string on & and =. As a second step,
>>>>> that list is traversed to match the keys against the dimensions and
>>>>> parsed according to the ABNF syntax of each dimension. Crucially,
>>>>> unrecognized/invalid keys or values are ignored. That means that in the
>>>>> above example, the time dimension will keep working even if an
>>>>> unrecognized (to a MF 1.0 implementation) freq dimension is used.
>>>>
>>>> Please comment on this decision stating either that you agree or you
>>>> disagree so that I can implement the ACTION-189.
>>>> Thanks.
>>>> Best regards.
>>>>
>>>>  Raphaël
>>>>
>>>
>>> Sorry for the delay, much traveling and an overflowing inbox takes its
>>> toll.
>>>
>>> I take it that the option 2 you refer to is this one:
>>>
>>> On Wed, 22 Sep 2010 13:12:14 +0200, Philip Jägenstedt <philipj@opera.com>
>>> wrote:
>>>
>>>> 2. Require that parsing follow an algorithm or a more forgiving ABNF
>>>> syntax. The concrete suggestion I've made is that the algorithm or
>>>> syntax
>>>> should match how query strings work. That is, a list of key-value pairs
>>>> is
>>>> formed by splitting the string on & and =. As a second step, that list
>>>> is
>>>> traversed to match the keys against the dimensions and parsed according
>>>> to
>>>> the ABNF syntax of each dimension. Crucially, unrecognized/invalid keys
>>>> or
>>>> values are ignored. That means that in the above example, the time
>>>> dimension
>>>> will keep working even if an unrecognized (to a MF 1.0 implementation)
>>>> freq
>>>> dimension is used.
>>>
>>> I'm glad we've finally been able to agree on this; the question is now
>>> only
>>> how to put it in the spec. The parsing must have the following properties
>>> for this to be like query strings:
>>>
>>> * Percent-decoding must be performed on both names and values. I don't think
>>> this can be expressed in a single layer of ABNF, and at the very least it
>>> isn't expressed in the ABNF we currently have.
>>>
>>> * When a name occurs twice, the last occurrence should be the one used.
>>> (#t=0&t=1 means 1 second)
>>>
>>> About the suggested extension of mediasegment above, it's problematic to
>>> require that extensionprefix not be 't', 'track', 'id' or 'xywh'. That
>>> would
>>> mean that #t=foo:1&t=1 wouldn't parse to 1 second, making it impossible
>>> to
>>> ever add additional time formats. Always ignoring invalid things is
>>> simpler.
>>>
>>> How should we move forward with the spec editing practicalities? IMO,
>>> validity and parsing are sufficiently different that they can't simply be
>>> merged into a single ABNF. A validator should warn against all unknown
>>> name-values, while a user agent should ignore them.
>>>
>>> My suggestion:
>>>
>>> Introduce the ABNF in
>>>
>>> <http://lists.w3.org/Archives/Public/public-media-fragment/2010Aug/0005.html>
>>>
>>> By saying that the name and value are URI components, I believe it is
>>> implied that percent-decoding should be performed. In either case, the
>>> spec
>>> should say so for clarity.
>>>
>>> The name-value byte arrays should be decoded as UTF-8 to give unicode
>>> strings.
>>>
>>> Have a pair of ABNF for each of our dimensions that operate on these
>>> unicode
>>> strings.
>>>
>>> A valid MF is one where all name-value pairs match one of the predefined
>>> dimensions, but no name occurs twice.
>>>
>>> For parsing, one iterates over the list of name-value pairs, parsing any
>>> that are valid according to the ABNF. As a side-effect of the loop, the
>>> last
>>> valid pair of any dimension is the one that ends up being used.
>>>
>>
>> Sounds like nobody objects. Is this being included in the spec now?
>
> At today's teleconference I was under the impression that this is what
> ACTION-189 [1] was about, so we didn't discuss it a lot. However, when
> reading the minutes I noticed that's actually an action to "put the
> top-level production rules back into the document", which isn't quite the
> same.
>
> What are the concrete next steps to resolving ISSUE-19 [2]?
>
> [1] http://www.w3.org/2008/WebVideo/Fragments/tracker/actions/189
> [2] http://www.w3.org/2008/WebVideo/Fragments/tracker/issues/19


If I understood the call correctly, ACTION-189 was a start for this,
but not the full thing. I was also under the impression that Raphael
said he'd do that yesterday, but I didn't see any new commits to the
spec.

If you are keen to do it, I don't think you should hold back. We're
about getting things done here, and I'm sure nobody will object now
that we all agree.
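
For reference, the parsing steps Philip lists above could be sketched
roughly as follows in Python. This is only an illustration, not spec
text: the function names are made up, the time syntax is a simplified
stand-in for the real ABNF (plain seconds only), and skipping pieces
without '=' or with invalid UTF-8 is my assumption.

```python
import re
from urllib.parse import unquote_to_bytes

# ASSUMPTION: simplified value syntax for the time dimension, in plain
# seconds: "start", "start,end" or ",end". The real ABNF allows more.
_SECONDS = r"\d+(?:\.\d*)?"
_TIME = re.compile(rf"(?:({_SECONDS})(?:,({_SECONDS}))?|,({_SECONDS}))")


def parse_time(value):
    """Return (start, end) in seconds, or None if the value is invalid."""
    m = _TIME.fullmatch(value)
    if m is None:
        return None
    start = float(m.group(1)) if m.group(1) is not None else 0.0
    end_text = m.group(2) if m.group(2) is not None else m.group(3)
    end = float(end_text) if end_text is not None else None
    return (start, end)


# Hypothetical table of known dimensions; only 't' is sketched here.
DIMENSION_PARSERS = {"t": parse_time}


def split_pairs(fragment):
    """Split on '&' and '=', percent-decode, and decode as UTF-8."""
    pairs = []
    for piece in fragment.split("&"):
        name, sep, value = piece.partition("=")
        if not sep:
            continue  # ASSUMPTION: pieces without '=' are skipped
        try:
            name = unquote_to_bytes(name).decode("utf-8")
            value = unquote_to_bytes(value).decode("utf-8")
        except UnicodeDecodeError:
            continue  # ASSUMPTION: invalid UTF-8 is skipped
        pairs.append((name, value))
    return pairs


def parse_fragment(fragment):
    """User-agent parsing: ignore unknown or invalid pairs; last valid wins."""
    result = {}
    for name, value in split_pairs(fragment):
        parser = DIMENSION_PARSERS.get(name)
        if parser is None:
            continue  # unrecognized dimension, e.g. 'freq'
        parsed = parser(value)
        if parsed is not None:
            result[name] = parsed  # a later valid pair overwrites an earlier one
    return result


def is_valid(fragment):
    """Validator view: every pair matches a known dimension, no name twice."""
    pairs = split_pairs(fragment)
    names = [name for name, _ in pairs]
    if len(set(names)) != len(names):
        return False
    return all(
        name in DIMENSION_PARSERS and DIMENSION_PARSERS[name](value) is not None
        for name, value in pairs
    )


print(parse_fragment("t=10,500&freq=300,3000"))  # {'t': (10.0, 500.0)}
print(parse_fragment("t=foo:1&t=1"))             # {'t': (1.0, None)}
print(is_valid("t=10,500&freq=300,3000"))        # False
```

Keeping parse_fragment and is_valid separate mirrors the point that a
validator should warn about unknown name-values while a user agent
should just ignore them.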

Cheers,
Silvia.
Received on Wednesday, 20 October 2010 21:14:23 UTC