Re: Processing requirements from Silvia Pfeiffer on 2009-12-12 (public-media-fragment@w3.org from December 2009)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Sat, 12 Dec 2009 15:46:53 +1100
To: Philip Jägenstedt <philipj@opera.com>
Cc: Jack Jansen <Jack.Jansen@cwi.nl>, Media Fragment <public-media-fragment@w3.org>
Message-ID: <2c0e02830912112046w1275e92eu4342d8101167e99c@mail.gmail.com>

Hi all,

While in the process of editing the specification, I have made some
adjustments that go a fair way towards solving these issues. So, let
me contribute these here.

On Thu, Dec 3, 2009 at 10:01 AM, Philip Jägenstedt <philipj@opera.com> wrote:
> On Wed, 02 Dec 2009 21:51:47 +0100, Jack Jansen <Jack.Jansen@cwi.nl> wrote:
>
>>
>> On 2 dec 2009, at 12:55, Philip Jägenstedt wrote:
>>
>>> Following up on my previous email and today's IRC-conference (for me).
>>>
>>> I won't get involved in the editors stylistic choices between ABNF,
>>> equivalent parsing algorithms (only the side effects of which are normative)
>>> or any other spec technique, but would request that at least the following
>>> are defined:
>>>
>>> 1. Splitting of name-value pairs
>>>
>>> The current ABNF only allows joining timesegment / spacesegment /
>>> tracksegment by "&", which means that e.g. #t=5& is not allowed because it
>>> has a trailing &, which is very easy to get by accident if you write a
>>> script like this:
>>>
>>> urifrag = '#'
>>> for d in dimensions:
>>>   urifrag += d + '&'
>>
>> I'm not thrilled by this idea. The web has a long history of features
>> where an initial implementation was syntactically forgiving because it was
>> deemed to be user-friendly at the time. Many of these have been causing
>> endless headaches until today. Think of the ability to use filenames
>> (especially Windows filenames) in the URL-bar, or in attributes in the HTML
>> code. Think of global variables in JavaScript.
>
> Let's be clear that validity and processing requirements are separate
> things. That the processing for a certain input is well defined does not
> mean that said input is valid. The validity definition is useful for authors
> to check their syntax against (using a validator) to find some mistakes,
> etc. In my opinion, processing requirements should be as strict as possible
> (staying close to the valid syntax) while still being easy to understand
> (for test suite writers, implementors and actual authors) and degrading
> gracefully for forward-compatibility in the contexts where it is necessary.
>
> I am not suggesting relaxing e.g. any of the temporal syntaxes because there
> is no benefit in doing so -- they are fixed and will not be changed by
> future spec revisions.
>
> The Web platform is full of ugly and broken features, but that is not
> because specs had unambiguous but lax processing requirements, it is because
> they either did not exist or left processing ambiguous or undefined. This
> results in poor interoperability and an inevitable race towards the most
> forgiving parsing possible. We absolutely do not want this to happen yet
> again with media fragments.

I have added two paragraphs to the ABNF specification section, see
http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/#naming-syntax,
which specify how we look at media fragment URIs. I think this is
necessary. I have kept it slightly more generic than just specifying
"&" as a separator and also allowed ";" as a separator, since that is
being used often by applications as a separator (see
http://en.wikipedia.org/wiki/Query_string). I think that's a good
compromise to take to address Philip's concern.
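
To illustrate the splitting described above, here is a minimal Python
sketch. It is not taken from the specification text: the function name
split_pairs is hypothetical, and skipping empty pieces and pieces without
a name is an assumption about how a tolerant processor might behave.

```python
import re

def split_pairs(fragment):
    """Split a fragment into (name, value) pairs.

    Assumptions for this sketch: both '&' and ';' separate pairs, and
    pieces that are empty or have no name before '=' are skipped.
    """
    # Drop a single leading '#', if present.
    body = fragment[1:] if fragment.startswith('#') else fragment
    pairs = []
    for piece in re.split(r'[&;]', body):
        name, sep, value = piece.partition('=')
        if not sep or not name:
            continue  # e.g. the empty piece after a trailing '&'
        pairs.append((name, value))
    return pairs
```

With this, a fragment with a trailing "&" such as "#t=5&" yields the
single pair ("t", "5") instead of being rejected.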


> By the way, is anyone developing a MF validator? One could surely be written
> in JavaScript quickly.

No, not yet, but please go ahead and do so! It would be awesome to
have that. I have, of course, as part of my demo at
http://www.annodex.net/~silvia/itext/mediafrag.html implemented a
quick and dirty parser, but it's in no way, shape or form complete.


>>> This specific case *can* be fixed in the ABNF, but leads into the next
>>> issue:
>>>
>>> 2. Handling of unrecognized syntax
>>>
>>> This means that #u=12&t=5 can still proceed to getting the time offset 5.
>>> Not allowing this makes it impossible to extend MF in the future as any new
>>> syntax is invalid per the current spec.
>>>
>>> As a necessary (but unsightly) side-effect, anything between & that isn't
>>> recognized should be ignored, including the empty string. Thus a conforming
>>> UA should be able to handle this extreme:
>>>
>>> #&&=&=tom&jerry=&t=34&t=meow:0# (time offset 34 seconds)
>>
>> This is a very difficult issue, we already touched on it in the last
>> teleconf. The problem is that there are two types of future extensions, and
>> they need opposite solutions. Some future attributes should preferably be
>> ignored by older implementations, think of a hypothetical
>> "preferred-languages=english-french-german" attribute. Other future
>> attributes should lead to an error if the older implementation doesn't
>> understand the attribute, think of "rating=pg" (which would return only
>> tracks with a rating of G or PG, supposedly).
>>
>> But: I have an idea that may be a solution to this, loosely based on the
>> SMIL skip-content attribute
>> (http://www.w3.org/TR/2008/REC-SMIL3-20081201/smil-content.html#adef-skip-content).
>> If we add an attribute that tells older implementations what to do (ignore
>> unknown attributes, or raise an error) we could have our cake and eat it.
>> The first example would then usually be coded as
>> "....&preferred-languages=english-french-german&unknown=ignore", the second
>> as "....&rating=pg&unknown=error". The only remaining question is now: what
>> is the default value for the unknown attribute.
>>
>> What do y'all think? Would this fly?
>
> Adding processing instructions on the same level as the actual syntax
> strikes me as very odd, but is technically possible.
>
> Defaulting to unknown=error would be a bad idea. When an author tests their
> syntax in a UA that does understand "rating=pg", unknown=ignore has no
> effect so they will not use it (and validators won't complain because the
> new syntax is valid per the new spec). A good portion of authors write by
> trial and error, so at this point they think they are done. However, all old
> UAs are now required to fail. They get angry bug reports from their users,
> while users of UAs which ignored the spec are still happy.
>
> Defaulting to unknown=ignore and honoring unknown=error would be possible,
> but is still a worse behavior than if the UA can use all of the components
> it *does* understand. The rare case of mandatory failure must, logically, be
> handled outside of MF because UAs which don't understand MF at all (e.g. all
> web browsers ever shipped to date) would otherwise bypass it.

I agree with Philip and would not really want to add processing
instructions into the URI fragment or query string.

I think what I described above already addresses the issues that
Philip brought up. But I may have missed something, so please check
and let me know.
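
As a rough sketch of ignoring unrecognized syntax rather than steering it
with processing instructions, the Python below keeps only the name-value
pairs whose name is known. The set KNOWN_DIMENSIONS and the function name
recognized_pairs are hypothetical and only illustrative; the names "t",
"xywh" and "track" are assumed stand-ins for the time, space and track
dimensions.

```python
import re

# Assumption: dimension names known to this sketch, not a list from the spec.
KNOWN_DIMENSIONS = {'t', 'xywh', 'track'}

def recognized_pairs(fragment):
    """Return (name, value) pairs whose name is a known dimension.

    Everything else between separators is ignored, including empty
    pieces and pieces without '='. Values are not validated here.
    """
    body = fragment[1:] if fragment.startswith('#') else fragment
    kept = []
    for piece in re.split(r'[&;]', body):
        name, sep, value = piece.partition('=')
        if sep and name in KNOWN_DIMENSIONS:
            kept.append((name, value))
    return kept
```

For "#u=12&t=5" this returns only the pair ("t", "5"), so an unknown
"u" does not prevent the time offset from being used.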


>>> 3. Processing order
>>>
>>> As an example, what is the result of processing #t=5&t=10 ? I think the
>>> result should be 10, because it is what you would usually implement by
>>> mistake if not making a conscious choice.
>>>
>>> The other option is that duplicating any dimension should cause the
>>> entire fragment to be ignored, which I do not support.
>>
>> This is somewhat similar to the first case, but much more serious.
>> Personally, I am heavily opposed to letting over-specified fragments do anything but
>> return a hard error. If the URL was generated by a program this means the
>> program is buggy, if it was done by a human, similarly, the person should be
>> taught to mend their ways. Guessing that "the last one is probably what was
>> meant" is a random choice. Actually, I would argue that if it was a human
>> who created this specific URL the "right thing" to do is probably to start
>> at second 15. (I send you a fragment starting at second 5. You don't like
>> the first 10 seconds of that clip, so before you forward it to another
>> friend you tack a "&t=10" to the end).
>
> The more important case to consider is #t=npt:5&t=foo:12

According to the syntax that we are standardising, the second
field-name parameter is invalid, so t=npt:5 dominates IMO.

> When new temporal syntax foo arrives in MF 2.0, there will be both UAs
> supporting MF 1.0 and those supporting MF 2.0 in existence for a very long
> time. In that very long time, it should be possible to use both syntaxes and
> have MF 1.0 UAs simply fall back to the one they understand which
> approximates the new foo. Degrade gracefully! This is best achieved by
> having the UA use the last fragment it recognizes, which is also very simple
> for authors to understand and work with.

Yes, I think this makes sense and that's also what I have added into
the specification. Check out the newly created section
http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/#processing-overview-errors
- it contains a start at the list of errors that a system may
encounter and proposes what to do in those cases. I have specified
that for over-specified dimensions the last occurrence is used.
This is indeed the opposite of what we previously proposed, but I do
agree with Philip here.
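
The "last recognized occurrence wins" rule can be sketched in Python as
below. This is an illustration under stated assumptions, not the
specification's algorithm: the function name time_offset is hypothetical,
and the pattern NPT_START only recognizes a plain number of seconds with
an optional "npt:" prefix, which is far less than the real temporal
syntax.

```python
import re

# Assumption: a deliberately tiny stand-in for the temporal syntax,
# accepting only seconds with an optional "npt:" prefix.
NPT_START = re.compile(r'(?:npt:)?(\d+(?:\.\d+)?)')

def time_offset(fragment):
    """Return the start time in seconds, or None if no 't' is recognized.

    Later recognized 't' values override earlier ones; values that are
    not recognized leave the earlier result in place.
    """
    body = fragment[1:] if fragment.startswith('#') else fragment
    start = None
    for piece in re.split(r'[&;]', body):
        name, sep, value = piece.partition('=')
        if name != 't' or not sep:
            continue  # not a time dimension
        match = NPT_START.fullmatch(value)
        if match:
            start = float(match.group(1))  # last recognized value wins
    return start
```

Under these assumptions "#t=5&t=10" gives 10.0, while
"#t=npt:5&t=foo:12" falls back to 5.0 because the second value is not
recognized.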


> On #t=5&t=10, I'll note that the spec currently *allows* overspecification.
> However, I agree with you that it should be invalid, so that validators can
> warn authors about their mistake. The processing rules should, however,
> tolerate it because a parser which rejects it is much more complex for no
> real gain, resulting in more work and more bugs.

Should it be invalid instead of using the last occurrence? I prefer to
do something that makes sense rather than putting the specification
screws on too tightly for programs and users.


Let me know if I have missed anything.

Regards,
Silvia.
Received on Saturday, 12 December 2009 04:47:46 UTC