Re: Processing requirements

On Wed, 02 Dec 2009 21:51:47 +0100, Jack Jansen <Jack.Jansen@cwi.nl> wrote:

>
> On 2 dec 2009, at 12:55, Philip Jägenstedt wrote:
>
>> Following up on my previous email and today's IRC-conference (for me).
>>
>> I won't get involved in the editors' stylistic choices between ABNF,  
>> equivalent parsing algorithms (only the side effects of which are  
>> normative) or any other spec technique, but would request that at least  
>> the following are defined:
>>
>> 1. Splitting of name-value pairs
>>
>> The current ABNF only allows joining timesegment / spacesegment /  
>> tracksegment by "&", which means that e.g. #t=5& is not allowed because  
>> it has a trailing &, which is very easy to get by accident if you write  
>> a script like this:
>>
>> dimensions = ['t=5']  # example input
>> urifrag = '#'
>> for d in dimensions:
>>    urifrag += d + '&'  # always appends '&', so the result is '#t=5&'
>
> I'm not thrilled by this idea. The web has a long history of features  
> where an initial implementation was syntactically forgiving because it  
> was deemed to be user-friendly at the time. Many of these have been  
> causing endless headaches until today. Think of the ability to use  
> filenames (especially Windows filenames) in the URL-bar, or in  
> attributes in the HTML code. Think of global variables in JavaScript.

Let's be clear that validity and processing requirements are separate  
things. That the processing for a certain input is well defined does not  
mean that said input is valid. The validity definition is useful for  
authors to check their syntax against (using a validator) to find some  
mistakes, etc. In my opinion, processing requirements should be as strict  
as possible (staying close to the valid syntax) while still being easy to  
understand (for test suite writers, implementors and actual authors) and  
degrading gracefully for forward-compatibility in the contexts where it is  
necessary.

I am not suggesting relaxing e.g. any of the temporal syntaxes because  
there is no benefit in doing so -- they are fixed and will not be changed  
by future spec revisions.

The Web platform is full of ugly and broken features, but that is not  
because specs had unambiguous but lax processing requirements. It is  
because specs either did not exist or left processing ambiguous or  
undefined. This results in poor interoperability and an inevitable race  
towards the most forgiving parsing possible. We absolutely do not want  
this to happen yet again with media fragments.

By the way, is anyone developing a MF validator? One could surely be  
written in JavaScript quickly.

>> This specific case *can* be fixed in the ABNF, but leads into the next  
>> issue:
>>
>> 2. Handling of unrecognized syntax
>>
>> This means that #u=12&t=5 can still proceed to getting the time offset  
>> 5. Not allowing this makes it impossible to extend MF in the future as  
>> any new syntax is invalid per the current spec.
>>
>> As a necessary (but unsightly) side-effect, anything between & that  
>> isn't recognized should be ignored, including the empty string. Thus a  
>> conforming UA should be able to handle this extreme:
>>
>> #&&=&=tom&jerry=&t=34&t=meow:0# (time offset 34 seconds)
>
> This is a very difficult issue, we already touched on it in the last  
> teleconf. The problem is that there are two types of future extensions,  
> and they need opposite solutions. Some future attributes should  
> preferably be ignored by older implementations, think of a hypothetical  
> "preferred-languages=english-french-german" attribute. Other future  
> attributes should lead to an error if the older implementation doesn't  
> understand the attribute, think of "rating=pg" (which would return only  
> tracks with a rating of G or PG, supposedly).
>
> But: I have an idea that may be a solution to this, loosely based on the  
> SMIL skip-content attribute  
> (http://www.w3.org/TR/2008/REC-SMIL3-20081201/smil-content.html#adef-skip-content).  
> If we add an attribute that tells older implementations what to do  
> (ignore unknown attributes, or raise an error) we could have our cake  
> and eat it. The first example would then usually be coded as  
> "....&preferred-languages=english-french-german&unknown=ignore", the  
> second as "....&rating=pg&unknown=error". The only remaining question is  
> now: what is the default value for the unknown attribute.
>
> What do y'all think? Would this fly?

Adding processing instructions on the same level as the actual syntax  
strikes me as very odd, but is technically possible.

Defaulting to unknown=error would be a bad idea. When an author tests  
their syntax in a UA that does understand "rating=pg", unknown=ignore has  
no effect so they will not use it (and validators won't complain because  
the new syntax is valid per the new spec). A good portion of authors write  
by trial and error, so at this point they think they are done. However,  
all old UAs are now required to fail. Their vendors get angry bug reports  
from users, while users of UAs which ignored the spec are still happy.

Defaulting to unknown=ignore and honoring unknown=error would be possible,  
but is still a worse behavior than if the UA can use all of the components  
it *does* understand. The rare case of mandatory failure must, logically,  
be handled outside of MF because UAs which don't understand MF at all  
(e.g. all web browsers ever shipped to date) would otherwise bypass it.
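
To make the "use what you understand, skip the rest" processing concrete,  
here is a rough sketch in Python. It is not spec text. The names  
process_fragment, parse_seconds and RECOGNIZED are made up for  
illustration, only plain seconds are accepted for t, and percent-decoding  
is left out entirely.

```python
import re

# Assumption: a plain non-negative decimal number of seconds, e.g. '34' or '1.5'.
NUMBER = re.compile(r'[0-9]+(?:\.[0-9]*)?')


def parse_seconds(value):
    """Return the value as float seconds, or None if it is not recognized."""
    return float(value) if NUMBER.fullmatch(value) else None


# Hypothetical table of the dimensions this UA understands.
RECOGNIZED = {'t': parse_seconds}


def process_fragment(fragment):
    """Collect every recognized dimension and skip everything else."""
    result = {}
    for piece in fragment.lstrip('#').split('&'):
        name, sep, value = piece.partition('=')
        parser = RECOGNIZED.get(name)
        if not sep or parser is None:
            continue  # empty piece, missing '=', or unknown name
        parsed = parser(value)
        if parsed is not None:
            result[name] = parsed  # a later recognized value replaces an earlier one
    return result
```

With these assumptions, the extreme example quoted above yields only a  
time offset of 34 seconds, and #u=12&t=5 yields a time offset of 5.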

>> 3. Processing order
>>
>> As an example, what is the result of processing #t=5&t=10 ? I think the  
>> result should be 10, because it is what you would usually implement by  
>> mistake if not making a conscious choice.
>>
>> The other option is that duplicating any dimension should cause the  
>> entire fragment to be ignored, which I do not support.
>
> This is somewhat similar to the first case, but much more serious.  
> Personally, I am heavily opposed to letting over-specified fragments do anything  
> but return a hard error. If the URL was generated by a program this  
> means the program is buggy, if it was done by a human, similarly, the  
> person should be taught to mend their ways. Guessing that "the last one  
> is probably what was meant" is a random choice. Actually, I would argue  
> that if it was a human who created this specific URL the "right thing"  
> to do is probably to start at second 15. (I send you a fragment starting  
> at second 5. You don't like the first 10 seconds of that clip, so before  
> you forward it to another friend you tack a "&t=10" to the end).

The more important case to consider is #t=npt:5&t=foo:12

When new temporal syntax foo arrives in MF 2.0, there will be both UAs  
supporting MF 1.0 and those supporting MF 2.0 in existence for a very long  
time. In that very long time, it should be possible to use both syntaxes  
and have MF 1.0 UAs simply fall back to the one they understand which  
approximates the new foo. Degrade gracefully! This is best achieved by  
having the UA use the last fragment it recognizes, which is also very  
simple for authors to understand and work with.
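
As an illustration of "last recognized wins", here is a sketch of what a  
hypothetical MF 1.0 UA might do with the time dimension. The function  
names are invented, and I am assuming that such a UA understands only a  
plain number of seconds with an optional npt: prefix.

```python
import re

# Assumption: an MF 1.0 UA understands '5' and 'npt:5', nothing else.
NPT = re.compile(r'(?:npt:)?([0-9]+(?:\.[0-9]*)?)')


def parse_time_v1(value):
    """Return float seconds for a recognized value, otherwise None."""
    match = NPT.fullmatch(value)
    return float(match.group(1)) if match else None


def time_offset(fragment):
    """Return the last t value this UA recognizes, or None if there is none."""
    offset = None
    for piece in fragment.lstrip('#').split('&'):
        name, _, value = piece.partition('=')
        if name != 't':
            continue
        parsed = parse_time_v1(value)
        if parsed is not None:
            offset = parsed  # overwrite: the last recognized value wins
    return offset
```

Under these assumptions #t=npt:5&t=foo:12 falls back to 5 seconds in the  
old UA, while a UA that also understood foo would pick that up simply by  
recognizing one more value.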

On #t=5&t=10, I'll note that the spec currently *allows* overspecification.  
However, I agree with you that it should be invalid, so that validators  
can warn authors about their mistake. The processing rules should,  
however, tolerate it because a parser which rejects it is much more complex  
for no real gain, resulting in more work and more bugs.
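
The validator side can be kept entirely separate from processing. A rough  
sketch, again in Python and again not spec text: validate and KNOWN_NAMES  
are made-up names, only the t dimension from this thread is listed, and  
values are not checked at all.

```python
from collections import Counter

# Assumption: only 't' is listed here; a real validator would list every dimension.
KNOWN_NAMES = {'t'}


def validate(fragment):
    """Return a list of problems for authors; an empty list means none were found."""
    problems = []
    names = []
    for piece in fragment.lstrip('#').split('&'):
        name, sep, _ = piece.partition('=')
        if not piece:
            problems.append('empty name-value pair (stray "&")')
        elif not sep or not name:
            problems.append('malformed pair: %r' % piece)
        elif name not in KNOWN_NAMES:
            problems.append('unknown dimension: %r' % name)
        else:
            names.append(name)
    # Duplicates are reported to the author, but a UA would still process them.
    for name, count in Counter(names).items():
        if count > 1:
            problems.append('dimension %r given %d times' % (name, count))
    return problems
```

Here #t=5&t=10 gets one warning about the duplicated dimension and #t=5&  
gets one about the stray "&", yet neither needs to stop a UA from  
applying a time offset.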

>> I trust none of this should be controversial. By having well-defined  
>> processing we can avoid some of the reverse-engineering and  
>> de-facto-but-not-specced behavior that inevitably happens otherwise.  
>> After these parts have been written, I will be happy to review the spec  
>> again.
>
>
> Except for the non-controversial part (should be clear from the above:-)  
> I fully agree.
> --
> Jack Jansen, <Jack.Jansen@cwi.nl>, http://www.cwi.nl/~jack
> If I can't dance I don't want to be part of your revolution -- Emma  
> Goldman

If the spec is written such that a conforming implementation must parse  
and validate all parts of the syntax --  even those which it will not use  
-- it is impossible to introduce support for MF gradually. This means that  
the first implementations will either have to ignore the spec or be  
delayed due to the unreasonable burden that is a validating parser. Either  
would be unfortunate.

-- 
Philip Jägenstedt
Core Developer
Opera Software

Received on Wednesday, 2 December 2009 23:00:59 UTC