Re: Processing requirements from Philip Jägenstedt on 2010-01-05 (public-media-fragment@w3.org from January 2010)

From: Philip Jägenstedt <philipj@opera.com>
Date: Tue, 05 Jan 2010 22:10:50 +0100
To: "Silvia Pfeiffer" <silviapfeiffer1@gmail.com>
Cc: "Media Fragment" <public-media-fragment@w3.org>
Message-ID: <op.u52vr9fcsr6mfa@worf>
This is becoming unreadable, so I'll [snip] liberally.

On Tue, 05 Jan 2010 13:35:58 +0100, Silvia Pfeiffer  
<silviapfeiffer1@gmail.com> wrote:

> On Tue, Jan 5, 2010 at 10:57 PM, Philip Jägenstedt <philipj@opera.com>  
> wrote:
>> On Tue, 05 Jan 2010 03:16:21 +0100, Silvia Pfeiffer
>> <silviapfeiffer1@gmail.com> wrote:
>>
>>> On Mon, Jan 4, 2010 at 11:05 PM, Philip Jägenstedt <philipj@opera.com>
>>> wrote:
>>>>
>>>> Using the path or the query look equivalent to me, both are specific to a
>>>> specific server configuration. Do caches really treat
>>>> http://www.example.com/video/track=video1/track=audio2/t=20,80 and
>>>> http://www.example.com/video?track=video1&track=audio2&t=20,80 in any way
>>>> differently with respect to the "original" resource
>>>> http://www.example.com/video/?
>>>
>>> I've checked back with caching of queries and it's actually worse for
>>> URI queries than for the REST-style resources: URI queries are often
>>> not cached at all, since it is assumed they come from forms, which are
>>> highly volatile. So, you're probably right and they are fairly
>>> similar.
>>
>> OK, since it's not possible to differentiate media fragment URIs from other
>> URIs in a fool-proof manner, I guess proxies can't change that behavior
>> without breaking sites that accidentally use MF-like syntax.
>
> The distinction doesn't come through naming conventions, but only
> through the MIME type. Proxies can identify the MIME type and do
> something different when they realise they're dealing with media
> resources. It is always possible for a proxy to do more than what they
> currently do wrt caching and byte range requests etc.

Even if restricted to only certain MIME types, surely there must be lots  
and lots of resources out there that just happen to have a query string  
that looks like MF syntax? Proxies can try to be clever, but it sounds  
like they would fail in this case.
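
To illustrate the worry, here is a minimal sketch in Python of the kind of purely syntactic check a proxy could apply. The function name, the rough "t=start,end" pattern and the thumbnail URL are assumptions made up for this example; none of them come from the MF draft.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Rough temporal pattern assumed for illustration only: "t=start" or
# "t=start,end" with plain numbers. The real MF grammar is richer.
_TEMPORAL = re.compile(r"^\d+(\.\d+)?(,\d+(\.\d+)?)?$")


def looks_like_mf(url):
    """Return True if any query pair resembles a temporal MF dimension."""
    query = urlsplit(url).query
    return any(
        name == "t" and _TEMPORAL.match(value) is not None
        for name, value in parse_qsl(query)
    )


# A URL that really is meant as a media fragment query.
intended = "http://www.example.com/video?track=video1&track=audio2&t=20,80"
# A hypothetical pre-existing URL where "t" means something unrelated.
accidental = "http://example.com/thumbnail?t=5"
```

Both `looks_like_mf(intended)` and `looks_like_mf(accidental)` are true, so a check based on syntax alone cannot separate the two cases, whatever the MIME type of the response.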

>>>> Is the idea that caches should assume that
>>>> URLs which happen to look like they use media fragments syntax in the
>>>> query component are related to the URL with the query component stripped?
>>>
>>> Not by default. We had an intelligent protocol that included mapping
>>> queries to HTTP byte range requests and thus make them cachable (see
>>> also the old temporal URI fragments spec at
>>> http://annodex.net/TR/draft-pfeiffer-temporal-fragments-03.txt), going
>>> back to existing mechanisms.
>>>
>>>> This sounds very fragile to me, shouldn't this be done with new HTTP
>>>> headers so that the caching proxy doesn't need to be concerned with
>>>> parsing MF? Something like Original-Location? I haven't followed the
>>>> server-part of MF very well, so perhaps I'm missing something.
>>>
>>> Yup, there are some new HTTP headers involved, see
>>>
>>> http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/#processing-protocol-proxy
>>> as applied to URI queries. Something we haven't agreed on yet.
>>
>> This looks a lot more robust. Caching of resources that use the query
>> component seems rather messy, but it looks like the current spec steers
>> clear of that.
>
> All it does right now is refer back to that section, which should
> probably also work for queries.

Ah, so the server would send these very same headers regardless of whether  
the fragment was addressed using query component syntax or it is replying to  
a fancy request from a client implementing the fragment component syntax?  
Then it all makes good sense: only HTTP headers need to be involved, and  
caching proxies don't need to be concerned about the query string.

>> There is, however, one part I'm quite puzzled by: should it be valid to
>> include server-specific parts in the query string mixed in with MF syntax?
>
> Sure! There are plenty more services media servers can offer beyond
> mere media fragment delivery.
>
>> Say someone already has a resource http://example.com/getvideo?id=42 (where
>> 42 might be a database row id or something). id here doesn't refer to the id
>> in MF. Is it possible to add MF syntax like 't=5' on top of this? It would
>> be valid syntax but id would be ambiguous. Should it be valid to mix other
>> pre-existing names that don't collide, like
>> http://example.com/getvideo?foo=42&t=5 ? It seems we are making it very
>> difficult to migrate any URLs that already use the query component to use
>> MF. This isn't really an issue in the fragment case as there's not really a
>> lot (any?) existing use of the URI fragment that we could trample on.
>
> Yeah, I think that's the advantage. The main query strings ppl are
> after are the ones we are defining here. Then there are less important
> ones, such as format="jpg" asking for a time offset to be returned as
> a thumbnail etc. YouTube has a gazillion of them and anyone wanting to
> implement a good media server is highly advised to check out YouTube's
> query (or rather: fragment) parameters to see what is possible.

If we not only allow but even expect server developers to mix and match  
freely in the query string, is it possible to have any meaningful  
conformance requirements for them? At least the following break:

* We can't define how to extract a name/value list from the query string,  
because the server might already have other requirements, e.g. with regard  
to how to treat '+' or which character encoding to assume when decoding  
%-encoding, or might already use the same names as MF for something else.

* We can't define how to map a name/value list into a set of dimensions,  
because only the server knows which names were actually intended for MF  
and which just happen to collide.

It would be very strange for our spec to have conformance requirements on  
syntax, etc and at the same time recommend or expect implementations to  
freely ignore them.
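
To make the first point concrete, here is a small sketch in Python showing how one query string yields different name/value lists depending on decisions the server has already made. The query string and both decoding policies are assumptions for illustration; neither is what the MF draft prescribes.

```python
from urllib.parse import parse_qsl, unquote

# Hypothetical query string: one pre-existing parameter plus an MF-like one.
QUERY = "title=caf%E9+noir&t=5"


def form_style(query, encoding):
    """HTML-form convention: '+' means space, %XX decoded with `encoding`."""
    return parse_qsl(query, keep_blank_values=True,
                     encoding=encoding, errors="replace")


def literal_plus(query, encoding):
    """Alternative convention: '+' is kept as a literal plus sign."""
    pairs = []
    for part in query.split("&"):
        name, _, value = part.partition("=")
        pairs.append((unquote(name, encoding=encoding, errors="replace"),
                      unquote(value, encoding=encoding, errors="replace")))
    return pairs


latin1_form = form_style(QUERY, "latin-1")
latin1_plus = literal_plus(QUERY, "latin-1")
utf8_form = form_style(QUERY, "utf-8")
```

The three results agree on `t` but give three different values for `title`: the '+' becomes a space or stays a plus, and `%E9` decodes to "é" under Latin-1 but to a replacement character under UTF-8. A spec rule that fixes one of these choices would contradict any server that has already made another.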

Perhaps I should refrain from such huge mails, but I got to thinking and  
writing about what the side-effects would be:

If it's not feasible to ban all non-MF syntax from query strings (and I  
don't think it is, because the resources already exist) then the remaining  
option is to not touch query strings at the syntax level at all. What we  
could do however, is to encourage other specs (in practice  
implementations) to reuse MF at the level where they are compatible.

* If they are pure MF they can simply reuse everything and say the query  
component must be a valid production of [the syntax we define for  
name/value pairs] and so on.

* If they split name/value pairs differently (e.g. allowing ; as a  
separator) they can define this themselves and then reuse our definition  
of how to map name/value pairs to a set of dimensions and beyond, as long  
as they only use MF name/value pairs. If they happen to use any character  
allowed in MF syntax as a separator, everything breaks and they must go to  
the next level.

* If they "mix and match", they must define themselves how to get all the  
way to a set of MF dimensions, after which they can reuse the MF spec.
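
The three levels above could be sketched roughly as follows in Python. Everything here is an assumption for illustration: the set of MF names is limited to the ones mentioned in this thread, `split_mf` is only a stand-in for whatever splitting rule the spec ends up defining, and the "mixed" server is hypothetical.

```python
from urllib.parse import parse_qsl

# Assumed MF dimension names, limited to those mentioned in this thread.
MF_NAMES = {"t", "track", "id"}


def split_mf(query):
    """Level 1: pure MF. Stand-in for the spec's own '&' and '=' splitting."""
    return parse_qsl(query, keep_blank_values=True)


def to_dimensions(pairs):
    """Shared step: map MF-only name/value pairs to a set of dimensions."""
    dims = {}
    for name, value in pairs:
        if name not in MF_NAMES:
            raise ValueError(f"not an MF name: {name!r}")
        dims.setdefault(name, []).append(value)
    return dims


def split_semicolon(query):
    """Level 2: the server defines its own splitting (here ';' separators)."""
    pairs = []
    for part in query.split(";"):
        if part:
            name, _, value = part.partition("=")
            pairs.append((name, value))
    return pairs


def mixed_server_dimensions(query):
    """Level 3: mix and match. This hypothetical server treats 'id' and
    'foo' as its own parameters and passes only the rest on as MF."""
    own_names = {"id", "foo"}
    pairs = [(n, v) for n, v in parse_qsl(query, keep_blank_values=True)
             if n not in own_names]
    return to_dimensions(pairs)
```

For example, `to_dimensions(split_mf("track=video1&track=audio2&t=20,80"))` and `to_dimensions(split_semicolon("t=5;track=audio2"))` both reuse the shared mapping, while `mixed_server_dimensions("id=42&t=5")` has to decide for itself that `id` is not the MF `id` before it can reuse anything.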

I don't think any of this has any great importance to actual  
implementation efforts and would expect that people try to stay close to  
MF for simplicity. I might try to edit something along these lines for  
your consideration.

-- 
Philip Jägenstedt
Core Developer
Opera Software
Received on Tuesday, 5 January 2010 21:09:36 UTC