Re: Spec layering: name-value pairs and beyond from Philip Jägenstedt on 2010-03-10 (public-media-fragment@w3.org from March 2010)

From: Philip Jägenstedt <philipj@opera.com>
Date: Wed, 10 Mar 2010 10:33:24 +0800
To: "Silvia Pfeiffer" <silviapfeiffer1@gmail.com>
Cc: "Media Fragment" <public-media-fragment@w3.org>
Message-ID: <op.u9bypyh8atwj1d@philip-pc.oslo.opera.com>
On Wed, 10 Mar 2010 04:20:10 +0800, Silvia Pfeiffer  
<silviapfeiffer1@gmail.com> wrote:

> On Wed, Mar 10, 2010 at 1:57 AM, Philip Jägenstedt <philipj@opera.com>  
> wrote:
>> On Tue, 09 Mar 2010 18:27:27 +0800, Philip Jägenstedt  
>> <philipj@opera.com>
>> wrote:
>>
>>> On Tue, 09 Mar 2010 16:18:03 +0800, Philip Jägenstedt  
>>> <philipj@opera.com>
>>> wrote:
>>>
>>>> I will commit this to CVS for further editing unless
>>>> there are objections during the day.
>>>
>>> I have already committed this. Note that I did *not* update the section
>>> "Collected ABNF Syntax", that should really be automatically generated
>>> anyway.
>>
>> I changed the name of the production from mediafragment to namevalues  
>> for
>> clarity. My intention is to use the ABNF production rules to rewrite the
>> name-value list processing algorithm to something like this.
>>
>> 1. for each non-overlapping substring in input that is a valid  
>> production of
>> the namevalue syntax:
>> 1.1. let pct-name be the substring matching the name production.
>> 1.2. let pct-value be the substring matching the value production.
>>
>> The rest (percent-decoding and UTF-8 decoding) would still be the same,  
>> but
>> I'm happy to rely on ABNF to avoid having to define what string  
>> splitting
>> means, etc. If there is a declarative language (ABNF or otherwise) that  
>> can
>> express that percent-decoding and UTF-8 decoding be performed, I'd  
>> be
>> happy to use that instead.
>>
>> Note: By definition the input is a valid production of namevalues, if  
>> the
>> input is from the fragment or query component of a URI.
>
> I believe Yves looks at the production rules as generic rules how to
> create such a URI rather than how to parse such a URI.
>
> I think we may need to make two different sections for these two
> approaches, since when you are creating a URI, you have to UTF-8
> encode and percent-encode etc, while when you are parsing, you do the
> opposite (you decode and you would probably not decode UTF-8, and
> possibly not decode percent-encoding before doing something with it,
> such as sticking the values in an HTTP header).

The spec needs to be extremely clear about two things. For every possible  
input, it should answer two questions:

1. Is it valid syntax? This is needed by validators. Hopefully it can be  
defined in terms of ABNF, although the constraint that it be valid UTF-8  
isn't captured anywhere.

2. How to process it, i.e. what are the semantics? This is needed by  
implementors. In certain situations it's necessary to gracefully handle  
things which are syntactically invalid, in which case the ABNF  
unfortunately cannot be reused.

Furthermore, it needs to answer these questions on two levels: the URI  
level and the name-value list level.

URI level syntax: all we have is a fragment component or query component  
in which we have encoded a name-value list with arbitrary names and  
values. The spec already addresses validity [1]. Unfortunately *any* valid  
URI is valid per this definition and the resulting name-value lists don't  
have any interesting semantics, so obviously this is not the only syntax  
needed.

URI level processing: defined in [2].
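
To make the URI level processing concrete, here is a rough sketch of the
algorithm quoted earlier in this thread (it is not the text of [2]). The
NAMEVALUE regex is only an assumed approximation of the namevalue
production, the function name parse_name_values is made up for
illustration, and skipping pairs that are not valid UTF-8 is one possible
choice rather than something the spec says.

```python
import re
from urllib.parse import unquote_to_bytes

# Assumed approximation of the namevalue production: name "=" value,
# where the name contains no "&" or "=" and the value contains no "&".
NAMEVALUE = re.compile(r"([^&=]+)=([^&]*)")


def parse_name_values(component):
    """Return a list of (name, value) pairs from a fragment or query string."""
    pairs = []
    # finditer yields non-overlapping matches, left to right.
    for match in NAMEVALUE.finditer(component):
        pct_name, pct_value = match.group(1), match.group(2)
        try:
            # Percent-decode to bytes first, then decode the bytes as UTF-8.
            name = unquote_to_bytes(pct_name).decode("utf-8")
            value = unquote_to_bytes(pct_value).decode("utf-8")
        except UnicodeDecodeError:
            # Assumption: a pair that is not valid UTF-8 is ignored.
            continue
        pairs.append((name, value))
    return pairs
```

Called on "t=10,20&id=caf%C3%A9" this gives the pairs ("t", "10,20") and
("id", "café"). Substrings that do not match the pattern, such as a bare
"novalue" between ampersands, are simply skipped.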

Name-value list level syntax: defined by the pairs of fooprefix and  
fooparam production rules.

However, there's important glue missing here. There's nothing that  
actually links together fooprefix with fooparam or says how many instances  
of each syntax are allowed, etc. What we need to do is add validity  
constraints on the name-value list level.

For example:

  * If there is a name 'id' then it must be the only member of the list and  
the value must not be an empty string.

  * If there is a name 't' then the value must be a valid production of the  
timeparam syntax. Further, it would be acceptable for MF 1.0 to disallow  
more than one instance of 't', with the expectation that MF 2.0 will  
change this to allow backwards compatibility.

This is a bit verbose, but is there any other way?
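
As an illustration only, the two example constraints above could be
checked along these lines. The TIMEPARAM regex is a hypothetical stand-in
that accepts plain seconds only, not the real timeparam production, and
the function name validate and the error strings are made up for this
sketch.

```python
import re

# Hypothetical stand-in for the timeparam production: plain seconds only,
# written as "start", "start,end" or ",end". The real production is richer.
TIMEPARAM = re.compile(r"(\d+(\.\d+)?)?(,\d+(\.\d+)?)?")


def validate(pairs):
    """Return a list of validity errors for a decoded name-value list."""
    errors = []
    for name, value in pairs:
        if name == "id":
            if len(pairs) != 1:
                errors.append("'id' must be the only member of the list")
            if value == "":
                errors.append("'id' value must not be empty")
        elif name == "t":
            # The stand-in regex also matches "", so reject that explicitly.
            if value == "" or not TIMEPARAM.fullmatch(value):
                errors.append("'t' value is not a valid timeparam")
    # The stricter MF 1.0 option: at most one instance of 't'.
    if [name for name, _ in pairs].count("t") > 1:
        errors.append("more than one 't' is not allowed")
    return errors
```

An empty result means the list satisfies these particular constraints;
names other than 'id' and 't' are not checked at all here.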

Name-value list level processing: An attempt at this is already in [3].  
However, I'm not very happy with it and don't think it helps a lot. The  
semantics are almost completely missing, saying semi-nonsense things like  
"let the temporal dimension of dimensions be the time range represented by  
value" without it being defined anywhere what "represented" means (it  
should be defined for each syntax).
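
To show what defining "represented" for one syntax might look like, here
is a small sketch for a simplified time value in plain seconds. The
function name time_range and the defaults (a missing start means 0, a
missing end means the end of the media, shown as None) are assumptions
made for illustration, not something taken from the spec.

```python
def time_range(value):
    """Map "start", "start,end" or ",end" (plain seconds) to (start, end)."""
    start_text, _, end_text = value.partition(",")
    # Assumption: a missing start means the beginning of the media.
    start = float(start_text) if start_text else 0.0
    # Assumption: a missing end means "until the end", represented as None.
    end = float(end_text) if end_text else None
    return (start, end)
```

With this definition "10,20" represents the range (10.0, 20.0), "10"
represents (10.0, None) and ",20" represents (0.0, 20.0). The input is
assumed to have passed syntax validation already.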

I hope this makes it clearer what the layering I am talking about actually  
means.

[1]  
http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/#namevalues-prod
[2]  
http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/#processing-name-value-components
[3]  
http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/#processing-name-value-lists

-- 
Philip Jägenstedt
Core Developer
Opera Software
Received on Wednesday, 10 March 2010 02:34:10 UTC