Re: Processing requirements from Silvia Pfeiffer on 2010-01-05 (public-media-fragment@w3.org from January 2010)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Tue, 5 Jan 2010 13:16:21 +1100
To: Philip Jägenstedt <philipj@opera.com>
Cc: Jack Jansen <Jack.Jansen@cwi.nl>, Media Fragment <public-media-fragment@w3.org>
Message-ID: <2c0e02831001041816paf62edbuf22503876f0629d0@mail.gmail.com>

On Mon, Jan 4, 2010 at 11:05 PM, Philip Jägenstedt <philipj@opera.com> wrote:
> On Wed, 30 Dec 2009 04:33:36 +0100, Silvia Pfeiffer
> <silviapfeiffer1@gmail.com> wrote:
>
>> On Wed, Dec 30, 2009 at 3:20 AM, Philip Jägenstedt <philipj@opera.com>
>> wrote:
>>>
>>> On Tue, 29 Dec 2009 15:03:50 +0100, Silvia Pfeiffer
>>> <silviapfeiffer1@gmail.com> wrote:
>>>>
>>>>
>>>> Now, I'd say that we're probably safe using "&" as a separator for URI
>>>> queries, since that has been specified in the CGI "standard" and has
>>>> continuously been applied, even if never formally specified. It is a
>>>> de-facto standard.
>>>
>>> I agree that it's safe, but we must formally specify it, either by
>>> referencing an existing spec (which I have failed to find) or by
>>> specifying
>>> it ourselves.
>>
>> A proper spec doesn't exist. All we have is the CGI spec. It's been my
>> greatest problem with the temporal URI spec for years from a
>> "completeness" point of view, but actually has never been a practical
>> problem, since ppl have just assumed the de-facto standard.
>>
>>
>>>> As for URI fragments, the idea is to keep it in sync with URI queries
>>>> and thus we also used the "&".
>>>
>>> I certainly agree with keeping them in sync, but the fragment component
>>> syntax is the one we can specify ourselves and it will work on many
>>> existing
>>> server configurations as a bonus.
>>
>> Actually: no, we cannot define the fragment component syntax for any
>> video or audio mime type. In fact, the URI specification says that the
>> fragment syntax is specified by the owner of the mime type - i.e. the
>> owner of video/ogg or video/mpeg4 (and audio) in the HTML5 case. All
>> that we can realistically do is provide a recommendation for mime type
>> owners to adopt our specification. We cannot really make an
>> enforceable standard. OTOH, ppl have been waiting for such a spec, so
>> they will gladly adopt it rather than create their own.
>
> Thanks, I didn't know this. It seems then that we can't reasonably state any
> conformance requirements at all in terms of the syntax of the query or
> fragment and rather must do it in terms of abstract name/value. This is
> actually good news to me and I will write a concrete suggestion on how to
> handle it in my next mail.

I think that is the safest approach for now. I vaguely remember that
Yves had a chat with other W3C members - including TBL - who suggested
just doing a normative specification for media resources and ignoring
the fact that fragment or query syntax is not normally standardised.
Maybe Yves can clarify this. I don't think it has much of an effect on
our spec though.


>>>> Now, both approaches (URI fragment and query) may conflict with some
>>>> already created specifications (as analysed and listed in
>>>>
>>>>
>>>> http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-reqs/#ExistingSchemes).
>>>> This is unavoidable when standardising the use of something that has
>>>> been in the wild so far.
>>>>
>>>>
>>>> http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/#processing-overview-standardisation
>>>> talks about this problem and makes clear that harmonisation is
>>>> necessary and that it is not possible to "prescribe" this format.
>>>> Which probably means that media fragments will always be a
>>>> recommendation rather than a standard.
>>>
>>> Yes, we will conflict with e.g. your Temporal URI spec and MPEG-21, which
>>> is
>>> to be expected as MF is supposed to supersede both.
>>
>> Well, I'm not actually sure MPEG-21 will adopt it. But the thing is:
>> even if the mime type owners don't accept it, what actually counts is
>> what the browser vendors implement. :-)
>>
>>
>>> However, existing query component schemes aren't really specs as such,
>>> they
>>> are actually defined by their (usually single) implementation. However,
>>> if
>>> we agree that MF should only normatively define the syntax and processing
>>> rules for URI *fragments*, then we don't need to discuss the query
>>> component
>>> issue any further.
>>
>> Some past discussions have found that we need to do both. The URI
>> queries approach has its use cases where you want to create a shorter
>> form from a longer resource - e.g. a playlist mashed up from segments
>> from multiple videos. We have embraced such use cases in the
>> requirements specification and they would require the use of URI
>> queries.
>>
>> To be complete, it is also possible to not use URI queries, but to use
>> some kind of REST interface, as you have mentioned before, e.g.
>> http://www.example.com/video/track=video1/track=audio2/t=20,80 . But
>> this resource has nothing at all to do with the original resource,
>> which may be http://www.example.com/video/, so caching is impossible.
>> Using URI queries at least provides a means to enable caching and to
>> continue having the link back to the original resource.
>
> Using the path or the query looks equivalent to me; both depend on a
> specific server configuration. Do caches really treat
> http://www.example.com/video/track=video1/track=audio2/t=20,80 and
> http://www.example.com/video?track=video1&track=audio2&t=20,80 in any way
> differently with respect to the "original" resource
> http://www.example.com/video/?

I've checked back with caching of queries and it's actually worse for
URI queries than for the REST-style resources: URI queries are often
not cached at all, since it is assumed they come from forms, which are
highly volatile. So, you're probably right and they are fairly
similar.
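
To make the comparison concrete, here is a minimal sketch (assuming Python's standard urllib.parse; the helper name `describe` is made up for illustration) that splits the three URLs from the question above. It only shows what a generic URL parser sees: in the path style the parameters are opaque path text, while in the query style the path with the query stripped is "/video", which is still not byte-identical to the "original" "/video/".

```python
from urllib.parse import urlsplit, parse_qsl

# The three URLs quoted in the discussion above.
ORIGINAL = "http://www.example.com/video/"
PATH_STYLE = "http://www.example.com/video/track=video1/track=audio2/t=20,80"
QUERY_STYLE = "http://www.example.com/video?track=video1&track=audio2&t=20,80"


def describe(url):
    """Hypothetical helper: return the pieces a generic URL parser exposes."""
    parts = urlsplit(url)
    return {
        "path": parts.path,
        "query": parts.query,
        # parse_qsl applies HTML-form decoding rules, which is an assumption
        # here; the thread has not settled on any decoding rules.
        "pairs": parse_qsl(parts.query),
    }


if __name__ == "__main__":
    for url in (ORIGINAL, PATH_STYLE, QUERY_STYLE):
        print(url, describe(url))
```

Neither form tells a cache by itself that the URL is related to the original resource; any such link would have to come from the server or from additional protocol information.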

> Is the idea that caches should assume that
> URLs which happen to look like they use media fragments syntax in the query
> component are related to the URL with the query component stripped?

Not by default. We had an intelligent protocol that mapped queries to
HTTP byte range requests and thus made them cacheable (see also the
old temporal URI fragments spec at
http://annodex.net/TR/draft-pfeiffer-temporal-fragments-03.txt), going
back to existing mechanisms.

> This
> sounds very fragile to me, shouldn't this be done with new HTTP headers so
> that the caching proxy doesn't need to be concerned with parsing MF?
> Something like Original-Location? I haven't followed the server-part of MF
> very well, so perhaps I'm missing something.

Yup, there are some new HTTP headers involved, see
http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/#processing-protocol-proxy
as applied to URI queries. Something we haven't agreed on yet.


>> OTOH there are a lot of issues to deal with when using queries. We can
>> only address a small part of the URI query possibilities in the MF
>> spec, namely the one that overlaps with the spec we're creating for
>> URI fragments. That has been the basis of our decisions so far.
>>
>> Why do you think URI queries are so much more of a problem? I wasn't
>> able to read that out of the irc discussion either. Standardisation of
>> how to create URI queries is useful, since then there are compatible
>> naming conventions across servers and clients and applications can
>> rely on things working the way they'd expect to. From an HTML5 POV, URI
>> queries don't matter, since they don't concern the browser. But when
>> specifying URIs, one has to think far beyond just the browser, IMHO.
>
> I don't think that there's a problem with using the query component to
> "apply" the fragment server-side at all, that's very useful. I think this is
> a spec layering problem mostly. Certainly browsers don't need to care, but I
> still want the whole of the spec to be consistent and robust, not just "my"
> parts.

Ok, it seems we agree. And yet I seemed to read out of your previous
emails that you want to remove the URI query related sections
completely. Did I misunderstand?


>>>> We could do one thing though: maybe we should add the link to the CGI
>>>> specification to the spec to explain where the formatting comes from.
>>>
>>> The CGI documentation only provides a rough description and isn't
>>> suitable
>>> for a normative reference. For example, it says "you should URL decode
>>> the
>>> name" but not how to do that. It is quite important to know how to
>>> interpret
>>> #t=npt%3a10s (%3A is ':', but is %3a also tolerated?) and #id=100% ('%'
>>> should be encoded as %25, but what to do with a stray %?).
>>>
>>> Specifying this is very simple:
>>>
>>> 1. split the string on &
>>> 2. split the resulting string on the first occurrence of '=' and let name
>>> be
>>> the first part and value be the second part. if there is no = in the
>>> string
>>> let value be ''
>>> 3. decode name and value according to [some very fine spec we can reuse I
>>> hope]
>>>
>>> Simple but necessary as the spec can't make any normative requirements at
>>> all about fragment dimensions if it doesn't define how to get from a
>>> fragment component to a list of fragment dimensions.
>>
>> Agreed, that is somewhat implicit in the specification right now.
>>
>>
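
The three steps proposed above can be sketched as follows. This is only an illustration, not text from the spec: the function name `parse_name_value_pairs` is hypothetical, Python's urllib.parse.unquote stands in for the decoding spec that step 3 leaves open, and skipping empty pieces is an extra assumption that the steps do not state.

```python
from urllib.parse import unquote


def parse_name_value_pairs(component):
    """Turn a fragment or query component into decoded (name, value) pairs.

    Hypothetical helper following the three steps quoted above.
    """
    pairs = []
    # Step 1: split the string on '&'.
    for piece in component.split("&"):
        if not piece:
            # Assumption (not in the quoted steps): ignore empty pieces,
            # e.g. the one produced by "a=1&&b=2".
            continue
        # Step 2: split on the first '='; value is '' when there is no '='.
        name, _, value = piece.partition("=")
        # Step 3: percent-decode. unquote is a stand-in for the decoding
        # spec; it accepts %3a and %3A alike and leaves a stray '%' as is.
        pairs.append((unquote(name), unquote(value)))
    return pairs


if __name__ == "__main__":
    print(parse_name_value_pairs("t=npt%3a10s"))
    print(parse_name_value_pairs("id=100%"))
```

With this stand-in decoder, the two problem cases mentioned above come out as ("t", "npt:10s") and ("id", "100%"). Whether a stray '%' should be tolerated or treated as an error is exactly the kind of decision the spec would still have to make.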
>>>> Philip, note that the specification only defines a syntax for the URI
>>>> fragment case, but leaves out the URI query case and just alludes to
>>>> the fact that it is done in the same way. I think that is already what
>>>> you are suggesting, no?
>>>
>>> The spec treats the query and fragment component equally as far as I can
>>> see, so any normative requirements on URI fragments are also being made
>>> on
>>> URI queries. For example:
>>>
>>> "The syntax is based on the specification of particular field-value pairs
>>> that can be used in URI fragment and URI query requests to restrict a
>>> media
>>> resource to a certain fragment."
>>>
>>> "There are therefore two possibilities for representing the media
>>> fragment
>>> addressing in URIs: the URI query part or the URI fragment part."
>>>
>>> "The composition of a URI fragment or query string for a media resource
>>> relies on a series of field-value pairs to be added behind the URI
>>> fragment
>>> ('#') or query ('?') identifier."
>>>
>>> "In this section we present the ABNF syntax for the field-value pairs
>>> that
>>> relate to a media fragment URI. The names for the non-terminals
>>> more-or-less
>>> follow the names used in the previous subsections, with one clear
>>> difference: the start symbol is called mediasegment, because we want to
>>> allow application of it to both URI fragment and URI query strings."
>>
>> Yes, I think you're right. It does apply to both URI fragment and URI
>> query. But that was intentional, as discussed above.
>>
>>
>>> If the intention is that the ABNF syntax be normative only for URI
>>> fragments, this should be clarified by removing the 'segment' ABNF and
>>> instead require that mediasegment be a valid production of the ifragment
>>> syntax from the IRI spec. This might have implications for the use of '+'
>>> in
>>> datetime, I haven't checked.
>>
>> I do wonder about this last detail. Might be worth checking.
>
> If we agree on specifying processing for fragment syntax then I will
> certainly research this.

Cool.


>>> There are several places in the spec that talk about Media Fragments, URI
>>> fragments and URI queries as if URI fragments and URI queries are a
>>> subset
>>> of Media Fragments rather than Media Fragments being a subset of URI
>>> fragments. I'm quite confused by this terminology, could someone clarify?
>>> I
>>> would like to see Media Fragment added to the terminology section.
>>
>> So far, what we have specified is the following (see
>>
>> http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/#terminology):
>> In this document, when the term 'media fragment URIs' is used, it
>> actually means 'media fragment URI references'.
>>
>> This means that a media fragment URI is just generally a URI that
>> deals with a section of a media resource. It does not say how.
>
> As long as a "URI" which doesn't have a fragment component can also be called
> a "URI reference" I guess this part is fine.

RFC 3986 (http://www.ietf.org/rfc/rfc3986.txt) section 4.1 defines a
URI reference as follows:
 URI-reference = URI / relative-ref

A relative reference is a REST URI and can include a URI query or a
URI fragment. So, yes, a "URI reference" is everything. :-)
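
As a small illustration of that rule, the sketch below (assuming Python's urllib.parse; the helper `components` and the file name video.ogv are made up for the example) splits one absolute URI and one relative reference into the same five components. Both are URI references, and either may carry a query, a fragment, or both.

```python
from urllib.parse import urlsplit


def components(reference):
    """Hypothetical helper: split a URI reference into its five components."""
    parts = urlsplit(reference)
    return (parts.scheme, parts.netloc, parts.path, parts.query, parts.fragment)


if __name__ == "__main__":
    # An absolute URI with a fragment component.
    print(components("http://www.example.com/video.ogv#t=10,20"))
    # A relative reference with a query component and no fragment.
    print(components("video.ogv?t=10"))
```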

> The spec also talks a lot about
> "media fragments", what does that term mean?

Well, it's the actual part in the media resource. I simply assumed
that was obvious. :-)

> Especially since it uses the
> word "fragment" it's very easy to assume that it in fact has something to do
> with URI fragment components. Adding the definition (if we know it) to the
> terminology section would be very helpful.

Ok, feel free to add it when you do your edits.


>> URI fragment and URI query quite plainly specify how to deal with the
>> media fragment URI: namely either through use of a URI fragment or a
>> URI query.
>>
>> I thought we used these quite consistently and made sure they didn't
>> get mixed up. So, what, in your opinion, is missing?
>
> Just the definition of "media fragment" sans "URI".

OK.


>>> [pause]
>>>
>>> My primary concern is that the processing of fragment component is still
>>> undefined as it is my intention to support MF in Opera at some point. In
>>> the
>>> bad old days when a spec left something undefined one browser would just
>>> make something up and the others would reverse-engineer it, but I am
>>> still
>>> young and naive enough to think that things are different now. I am willing to
>>> edit
>>> the spec myself to show clearly what it is I'm suggesting.
>>
>> I'm more than happy for you to make such changes - in particular to
>> separate out the structure of parameters in a URI fragment and URI
>> query from the actual specification of the name-value pairs in use. As
>> mentioned in the email to Jack, I do think it makes sense to separate
>> that into a section that specifies the foundations that we build upon.
>> If you want to go ahead and do that, I wouldn't have a problem. But I
>> don't speak for the others, so maybe wait until we get their input.
>> :-)
>>
>>
>> Cheers,
>> Silvia.
>>
>
>
> --
> Philip Jägenstedt
> Core Developer
> Opera Software
>

Cheers,
Silvia.
Received on Tuesday, 5 January 2010 02:17:17 UTC