- From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
- Date: Sat, 12 Dec 2009 15:46:53 +1100
- To: Philip Jägenstedt <philipj@opera.com>
- Cc: Jack Jansen <Jack.Jansen@cwi.nl>, Media Fragment <public-media-fragment@w3.org>
Hi all,

While editing the specification, I have made some adjustments that go a
fair way towards solving these issues, so let me contribute them here.

On Thu, Dec 3, 2009 at 10:01 AM, Philip Jägenstedt <philipj@opera.com> wrote:
> On Wed, 02 Dec 2009 21:51:47 +0100, Jack Jansen <Jack.Jansen@cwi.nl> wrote:
>
>> On 2 dec 2009, at 12:55, Philip Jägenstedt wrote:
>>
>>> Following up on my previous email and today's IRC-conference (for me).
>>>
>>> I won't get involved in the editors' stylistic choices between ABNF,
>>> equivalent parsing algorithms (only the side effects of which are normative)
>>> or any other spec technique, but would request that at least the following
>>> are defined:
>>>
>>> 1. Splitting of name-value pairs
>>>
>>> The current ABNF only allows joining timesegment / spacesegment /
>>> tracksegment by "&", which means that e.g. #t=5& is not allowed because it
>>> has a trailing &, which is very easy to get by accident if you write a
>>> script like this:
>>>
>>> urifrag = '#'
>>> for d in dimensions:
>>>     urifrag += d + '&'
>>
>> I'm not thrilled by this idea. The web has a long history of features
>> where an initial implementation was syntactically forgiving because it was
>> deemed to be user-friendly at the time. Many of these have been causing
>> endless headaches until today. Think of the ability to use filenames
>> (especially Windows filenames) in the URL-bar, or in attributes in the HTML
>> code. Think of global variables in JavaScript.
>
> Let's be clear that validity and processing requirements are separate
> things. That the processing for a certain input is well defined does not
> mean that said input is valid. The validity definition is useful for authors
> to check their syntax against (using a validator) to find some mistakes,
> etc.
> In my opinion, processing requirements should be as strict as possible
> (staying close to the valid syntax) while still being easy to understand
> (for test suite writers, implementors and actual authors) and degrading
> gracefully for forward-compatibility in the contexts where it is necessary.
>
> I am not suggesting relaxing e.g. any of the temporal syntaxes because there
> is no benefit in doing so -- they are fixed and will not be changed by
> future spec revisions.
>
> The Web platform is full of ugly and broken features, but that is not
> because specs had unambiguous but lax processing requirements, it is because
> they either did not exist or left processing ambiguous or undefined. This
> results in poor interoperability and an inevitable race towards the most
> forgiving parsing possible. We absolutely do not want this to happen yet
> again with media fragments.

I have added two paragraphs to the ABNF specification section, see
http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/#naming-syntax,
which specify how we look at media fragment URIs. I think this is
necessary. I have kept it slightly more generic than just specifying "&"
as a separator and have also allowed ";", since applications often use it
as a separator (see http://en.wikipedia.org/wiki/Query_string). I think
that's a good compromise to address Philip's concern.

> By the way, is anyone developing a MF validator? One could surely be written
> in JavaScript quickly.

No, not yet, but please go ahead and do so! It would be awesome to have
that. As part of my demo at
http://www.annodex.net/~silvia/itext/mediafrag.html I have, of course,
implemented a quick and dirty parser, but it is in no way, shape or form
complete.

>>> This specific case *can* be fixed in the ABNF, but leads into the next
>>> issue:
>>>
>>> 2. Handling of unrecognized syntax
>>>
>>> This means that #u=12&t=5 can still proceed to getting the time offset 5.
>>> Not allowing this makes it impossible to extend MF in the future as any new
>>> syntax is invalid per the current spec.
>>>
>>> As a necessary (but unsightly) side-effect, anything between & that isn't
>>> recognized should be ignored, including the empty string. Thus a conforming
>>> UA should be able to handle this extreme:
>>>
>>> #&&=&=tom&jerry=&t=34&t=meow:0# (time offset 34 seconds)
>>
>> This is a very difficult issue, we already touched on it in the last
>> teleconf. The problem is that there are two types of future extensions, and
>> they need opposite solutions. Some future attributes should preferably be
>> ignored by older implementations, think of a hypothetical
>> "preferred-languages=english-french-german" attribute. Other future
>> attributes should lead to an error if the older implementation doesn't
>> understand the attribute, think of "rating=pg" (which would return only
>> tracks with a rating of G or PG, supposedly).
>>
>> But: I have an idea that may be a solution to this, loosely based on the
>> SMIL skip-content attribute
>> (http://www.w3.org/TR/2008/REC-SMIL3-20081201/smil-content.html#adef-skip-content).
>> If we add an attribute that tells older implementations what to do (ignore
>> unknown attributes, or raise an error) we could have our cake and eat it.
>> The first example would then usually be coded as
>> "....&preferred-languages=english-french-german&unknown=ignore", the second
>> as "....&rating=pg&unknown=error". The only remaining question is now: what
>> is the default value for the unknown attribute.
>>
>> What do y'all think? Would this fly?
>
> Adding processing instructions on the same level as the actual syntax
> strikes me as very odd, but is technically possible.
>
> Defaulting to unknown=error would be a bad idea.
> When an author tests their
> syntax in a UA that does understand "rating=pg", unknown=ignore has no
> effect so they will not use it (and validators won't complain because the
> new syntax is valid per the new spec). A good portion of authors write by
> trial and error, so at this point they think they are done. However, all old
> UAs are now required to fail. They get angry bug reports from their users,
> while users of UAs which ignored the spec are still happy.
>
> Defaulting to unknown=ignore and honoring unknown=error would be possible,
> but is still a worse behavior than if the UA can use all of the components
> it *does* understand. The rare case of mandatory failure must, logically, be
> handled outside of MF because UAs which don't understand MF at all (e.g. all
> web browsers ever shipped to date) would otherwise bypass it.

I agree with Philip and would not really want to add processing
instructions into the URI fragment or query string. I think what I
described above already addresses the issues that Philip brought up. But
I may have missed something, so please check and let me know.

>>> 3. Processing order
>>>
>>> As an example, what is the result of processing #t=5&t=10 ? I think the
>>> result should be 10, because it is what you would usually implement by
>>> mistake if not making a conscious choice.
>>>
>>> The other option is that duplicating any dimension should cause the
>>> entire fragment to be ignored, which I do not support.
>>
>> This is somewhat similar to the first case, but much more serious.
>> Personally, I am heavily opposed to letting over-specified do anything but
>> return a hard error. If the URL was generated by a program this means the
>> program is buggy, if it was done by a human, similarly, the person should be
>> taught to mend their ways. Guessing that "the last one is probably what was
>> meant" is a random choice.
>> Actually, I would argue that if it was a human
>> who created this specific URL the "right thing" to do is probably to start
>> at second 15. (I send you a fragment starting at second 5. You don't like
>> the first 10 seconds of that clip, so before you forward it to another
>> friend you tack a "&t=10" to the end).
>
> The more important case to consider is #t=npt:5&t=foo:12

According to the syntax that we are standardising, the second field-name
parameter is invalid, so t=npt:5 dominates IMO.

> When new temporal syntax foo arrives in MF 2.0, there will be both UAs
> supporting MF 1.0 and those supporting MF 2.0 in existence for a very long
> time. In that very long time, it should be possible to use both syntaxes and
> have MF 1.0 UAs simply fall back to the one they understand which
> approximates the new foo. Degrade gracefully! This is best achieved by
> having the UA use the last fragment it recognizes, which is also very simple
> for authors to understand and work with.

Yes, I think this makes sense, and that is also what I have added to the
specification. Check out the newly created section
http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/#processing-overview-errors
- it contains a start at the list of errors that a system may encounter
and proposes what to do in those cases. I have specified that for
over-specified dimensions the last occurrence is used. This is indeed the
opposite of what we previously proposed, but I do agree with Philip here.

> On #t=5&t=10, I'll note that the spec currently *allows* overspecification.
> However, I agree with you that it should be invalid, so that validators can
> warn authors about their mistake. The processing rules should, however,
> tolerate it because a parser which rejects it is much more complex for no
> real gain, resulting in more work and more bugs.

Should it be invalid instead of using the last occurrence?
I prefer to do something that makes sense rather than putting the
specification screws on too tightly for programs and users.

Let me know if I have missed anything.

Regards,
Silvia.
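
The processing behaviour discussed in this thread (split on "&" or ";", ignore anything that is not recognized, and let the last recognized occurrence of a dimension win) can be sketched in a few lines of Python. This is a minimal sketch under assumptions, not the specification's algorithm: the function name parse_time_fragment is hypothetical, only the "t" dimension is handled, and the value syntax is reduced to an optional "npt:" prefix followed by seconds with an optional ",end". The real temporal syntax is richer.

```python
import re

# Assumed, simplified value syntax: optional "npt:" prefix, start seconds,
# optional ",end" seconds. The actual spec defines more forms than this.
_NPT_VALUE = re.compile(r"(?:npt:)?(\d+(?:\.\d+)?)(?:,(\d+(?:\.\d+)?))?")


def parse_time_fragment(fragment):
    """Return (start, end) from the last recognized t= pair, or None."""
    if fragment.startswith("#"):
        fragment = fragment[1:]
    result = None
    # Both "&" and ";" are accepted as separators between name-value pairs.
    for pair in re.split(r"[&;]", fragment):
        name, sep, value = pair.partition("=")
        if name != "t" or not sep:
            continue  # empty component or unknown dimension: ignore it
        match = _NPT_VALUE.fullmatch(value)
        if match is None:
            continue  # unrecognized value syntax (e.g. "foo:12"): ignore it
        start = float(match.group(1))
        end = float(match.group(2)) if match.group(2) else None
        result = (start, end)  # a later recognized occurrence overrides
    return result


if __name__ == "__main__":
    for frag in ("#t=5&", "#u=12&t=5", "#t=5&t=10", "#t=npt:5&t=foo:12",
                 "#&&=&=tom&jerry=&t=34&t=meow:0#"):
        print(frag, "->", parse_time_fragment(frag))
```

With these rules the extreme example from the thread yields a start of 34 seconds, #t=5&t=10 yields 10, and #t=npt:5&t=foo:12 falls back to 5 because "foo:12" is not recognized. Whether such input should also be reported as invalid by a validator is a separate question from how it is processed.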
Received on Saturday, 12 December 2009 04:47:46 UTC