Re: Processing requirements from Silvia Pfeiffer on 2009-12-23 (public-media-fragment@w3.org from December 2009)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Wed, 23 Dec 2009 23:43:02 +1100
To: Philip Jägenstedt <philipj@opera.com>
Cc: Jack Jansen <Jack.Jansen@cwi.nl>, Media Fragment <public-media-fragment@w3.org>
Message-ID: <2c0e02830912230443r211d64b6hd71cf6cfe1485c72@mail.gmail.com>
Hi Philip,

Thanks for continuing to give implementer/browser developer feedback -
it's really awesome to have this input! And such a shame you cannot
join Davy, Conrad and me at FOMS to discuss this further.


On Wed, Dec 23, 2009 at 11:28 PM, Philip Jägenstedt <philipj@opera.com> wrote:
> On Sat, 12 Dec 2009 05:46:53 +0100, Silvia Pfeiffer
> <silviapfeiffer1@gmail.com> wrote:
>> On Thu, Dec 3, 2009 at 10:01 AM, Philip Jägenstedt <philipj@opera.com>
>> wrote:
>>>
>>> On Wed, 02 Dec 2009 21:51:47 +0100, Jack Jansen <Jack.Jansen@cwi.nl>
>>> wrote:
>>>
>>>>
>>>> On 2 dec 2009, at 12:55, Philip Jägenstedt wrote:
>>>>
>>>>> Following up on my previous email and today's IRC-conference (for me).
>>>>>
>>>>> I won't get involved in the editors' stylistic choices between ABNF,
>>>>> equivalent parsing algorithms (only the side effects of which are
>>>>> normative)
>>>>> or any other spec technique, but would request that at least the
>>>>> following
>>>>> are defined:
>>>>>
>>>>> 1. Splitting of name-value pairs
>>>>>
>>>>> The current ABNF only allows joining timesegment / spacesegment /
>>>>> tracksegment by "&", which means that e.g. #t=5& is not allowed because
>>>>> it
>>>>> has a trailing &, which is very easy to get by accident if you write a
>>>>> script like this:
>>>>>
>>>>> urifrag = '#'
>>>>> for d in dimensions:
>>>>>  urifrag += d + '&'
>>>>
>>>> I'm not thrilled by this idea. The web has a long history of features
>>>> where an initial implementation was syntactically forgiving because it
>>>> was
>>>> deemed to be user-friendly at the time. Many of these have been causing
>>>> endless headaches until today. Think of the ability to use filenames
>>>> (especially Windows filenames) in the URL-bar, or in attributes in the
>>>> HTML
>>>> code. Think of global variables in JavaScript.
>>>
>>> Let's be clear that validity and processing requirements are separate
>>> things. That the processing for a certain input is well defined does not
>>> mean that said input is valid. The validity definition is useful for
>>> authors
>>> to check their syntax against (using a validator) to find some mistakes,
>>> etc. In my opinion, processing requirements should be as strict as
>>> possible
>>> (staying close to the valid syntax) while still being easy to understand
>>> (for test suite writers, implementors and actual authors) and degrading
>>> gracefully for forward-compatibility in the contexts where it is
>>> necessary.
>>>
>>> I am not suggesting relaxing e.g. any of the temporal syntaxes because
>>> there
>>> is no benefit in doing so -- they are fixed and will not be changed by
>>> future spec revisions.
>>>
>>> The Web platform is full of ugly and broken features, but that is not
>>> because specs had unambiguous but lax processing requirements, it is
>>> because
>>> they either did not exist or left processing ambiguous or undefined. This
>>> results in poor interoperability and an inevitable race towards the most
>>> forgiving parsing possible. We absolutely do not want this to happen yet
>>> again with media fragments.
>>
>> I have added two paragraphs to the ABNF specification section, see
>>
>> http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/#naming-syntax,
>> which specifies how we look at media fragment URIs. I think this is
>> necessary. I have kept it slightly more generic than just specifying
>> "&" as a separator and also allowed ";" as a separator, since that is
>> being used often by applications as a separator (see
>> http://en.wikipedia.org/wiki/Query_string). I think that's a good
>> compromise to take to address Philip's concern.
>
> I am a bit skeptical of allowing both & and ; as separators, as it adds a
> little bit of complexity without any obvious benefit. Wikipedia links back
> to
> <http://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.2.2>,
> but has that advice been followed? Personally I can't remember ever reading
> or writing a query string using ; as a separator. I'd like to see some
> research on actual deployed software to see if not allowing ; as a separator
> would cause problems. Also, if ; stays the ABNF needs to be updated.
>
> I appreciate this part: "A conformant server or user agent will need to be
> able to parse a random URI query or fragment string for a media resource and
> identify the relevant parts. E.g. the relevant field-value pair out of a
> media fragment URI like this
> http://www.example.com/video.ogv#&&=&=tom;jerry=&t=34&t=meow:0# is t=34."
>
> However, it's vague on what exactly the conformance requirements are. I'd
> like the spec to be explicit about how to split the fragment into segments,
> especially if & and ; are both allowed as separators. Having defined that,
> simply refer to the ABNF and say that any string which is not "a valid
> production of the mediasegment syntax" should be discarded. I suppose the
> error handling section is an appropriate place, although it is currently
> defined in terms of MF concepts and not strings. Perhaps the error handling
> section should be split into two parts, one which gets us from an arbitrary
> string to a list of dimensions, and then the existing section that defines
> which of those dimensions actually apply.

It is actually deliberately vague, because we are piggybacking onto a
mechanism that has been developed outside the media fragments working
group and is not part of what we should be specifying: how to compose
a query string. We are already leaning out of the window by also
applying it to URI fragments, but believe that is acceptable.

I do not think that the media fragment URI specification is the place
to define how a query string on a media resource has to be parsed.
There could be any number of other query parameters used in a query
string and they could be perfectly valid because the particular client
and the server both support them. So, we cannot actually write an
algorithm that expresses all possible query parameters in a media
fragment URI. We can only hint at it by saying that where "&" or ";"
are being used as separators on a media resource and the particular
parameters that we specify are in use, we can prescribe what they
mean.
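
To illustrate the idea of picking out only the parameters we specify
and leaving everything else alone, here is a minimal sketch in Python.
It is not part of the spec. The names relevant_pairs, KNOWN_DIMENSIONS
and SIMPLE_TIME are hypothetical, the list of dimension names is only
illustrative, and the time check is a deliberate simplification of the
real temporal syntax.

```python
import re

# Assumption: an illustrative set of dimension names, not a normative list.
KNOWN_DIMENSIONS = {"t", "xywh", "track", "id"}

# Assumption: a simplified time value (seconds, with an optional end time).
# The real temporal syntax is richer than this.
SIMPLE_TIME = re.compile(r"^\d+(\.\d+)?(,\d+(\.\d+)?)?$")


def relevant_pairs(fragment):
    """Return the (name, value) pairs we recognise, in order of appearance."""
    pairs = []
    # Treat both "&" and ";" as separators between name-value pairs.
    for piece in re.split(r"[&;]", fragment):
        name, sep, value = piece.partition("=")
        if not sep or name not in KNOWN_DIMENSIONS:
            # Empty piece, no "=", or a parameter we do not define:
            # skip it, since another client or server may understand it.
            continue
        if name == "t" and not SIMPLE_TIME.match(value):
            # A known name with a value that fails the (simplified) check.
            continue
        # Values of the other dimensions are not validated in this sketch.
        pairs.append((name, value))
    return pairs
```

Under these assumptions, the fragment of the example URI quoted above,
"&&=&=tom;jerry=&t=34&t=meow:0#", yields only the pair t=34, and a
fragment with a trailing separator such as "t=5&" yields t=5.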

BTW: for the same reasoning, we cannot exclude ";" as a separator - if
for years it has been proposed to be used as separator, then that's
what it should be. I believe, however, that ";" is not a separator
between parameters, but probably rather between parameter values and
we can totally make use of that.


<..>
>
>>> On #t=5&t=10, I'll note that the spec currently *allows*
>>> overspecification.
>>> However, I agree with you that it should be invalid, so that validators
>>> can
>>> warn authors about their mistake. The processing rules, however,
>>> should
>>> tolerate it because a parser which rejects it is much more complex for no
>>> real gain, resulting in more work and more bugs.
>>
>> Should it be invalid instead of using the last occurrence? I prefer to
>> do something that makes sense rather than putting the specification
>> screws on too tightly for programs and users.
>
> It should be invalid syntax (so that validators will warn authors), but the
> error handling section should tell implementations to use the last
> occurrence.

I'd be very happy to make it an invalid syntax. But if we allow
browsers/servers to deal with it, it becomes legal very quickly. So,
we should probably always return an error, such that authors will not
start creating faulty URLs and URL parsers.
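
The two behaviours under discussion for something like #t=5&t=10 can
be put side by side in a short Python sketch. This is only an
illustration, not spec text. The names resolve and
DuplicateDimensionError are hypothetical, and the input is assumed to
be a list of already-split (name, value) pairs.

```python
class DuplicateDimensionError(ValueError):
    """Raised in strict mode when a dimension appears more than once."""


def resolve(pairs, strict):
    """Collapse (name, value) pairs into one value per dimension.

    strict=True  -> a repeated dimension is an error.
    strict=False -> a repeated dimension is tolerated; the last one wins.
    """
    resolved = {}
    for name, value in pairs:
        if strict and name in resolved:
            raise DuplicateDimensionError(name)
        resolved[name] = value  # in lenient mode, later values overwrite
    return resolved
```

With pairs [("t", "5"), ("t", "10")], the lenient mode returns t=10,
while the strict mode raises an error instead of picking a value.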


Regards,
Silvia.
Received on Wednesday, 23 December 2009 12:43:56 UTC