URI fragments and the delivery of media fragments

Hi all,

This is to address my ACTION item on how far we can go with URI
fragment specifications and the communication between users, user
agent, proxies, and origin server.

I've had to do some reading up on the URI standard, on how proxies work,
and on HTTP headers, so I hope we can have an interesting discussion
about this next week in Cannes.

Let me start by listing the side conditions under which I believe we
are working:
1. the URI specification is fixed and we should work within its boundaries
2. if possible, we should avoid requiring any changes to the software
that runs on Web proxies
3. if possible, we should not require more than one set of changes to
user agents and these changes should be independent of the media type
4. since the origin server needs to implement heavy handling of media
files to deliver media fragments and these changes are already
dependent on the media type, we should try to focus all necessary
changes to the resource delivery chain over HTTP at the origin server
end
5. since we are trying to accommodate all types of media resources,
our model of a media resource needs to be as generic as possible and
we need to assume there is a unique map of time ranges to byte ranges
(the map may however be surjective in the mathematical sense
http://mathworld.wolfram.com/Surjection.html).
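
One possible reading of point 5, as a minimal Python sketch: every time
range maps to exactly one byte range, but several time ranges can land on
the same byte range because the data can only be cut at certain entry
points. The index values, FILE_SIZE and the helper name are invented for
illustration and do not come from any real media format.

```python
from bisect import bisect_left, bisect_right

# Hypothetical seek index: (time in seconds, byte offset) of each point at
# which the resource can be cut. All values are invented for illustration.
INDEX = [(0, 0), (10, 40000), (20, 90000), (30, 150000), (40, 200000)]
TIMES = [t for t, _ in INDEX]
FILE_SIZE = 260000  # assumed total size in bytes


def time_to_byte_range(start, end):
    """Map a time range (seconds, start >= 0) to an inclusive byte range."""
    # Snap the start back to the entry point at or before it.
    lo = bisect_right(TIMES, start) - 1
    # Snap the end forward to the first entry point at or after it.
    hi = bisect_left(TIMES, end)
    first_byte = INDEX[lo][1]
    last_byte = (INDEX[hi][1] if hi < len(INDEX) else FILE_SIZE) - 1
    return first_byte, last_byte


print(time_to_byte_range(20, 30))  # (90000, 149999)
print(time_to_byte_range(22, 28))  # (90000, 149999)
```

Both calls return the same byte range, which is the many-to-one behaviour
a proxy would see if it only ever caches byte ranges.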

I have tried to address 5. through the media resource description
given at http://www.w3.org/2008/WebVideo/Fragments/wiki/Glossary#Video_Resource
.

As for the rest, I am working with the following model of a user/UA -
proxy - origin server communication:
http://www.w3.org/2008/WebVideo/Fragments/wiki/Image:Http_sequence.jpg
. Can you check if you agree with that model?

Now to the actual communication.

When I originally said: we cannot use URI fragments, I was referring
to the fact that they will not be guaranteed to go beyond the User
Agent. This means: we can use them on link 1 and 9 (between user and
UA) e.g. for browser history purposes, but we cannot rely on them to
exist anywhere else on the communication.

Incidentally, I have had a long discussion with my colleague John
Ferlito, who is a network guru, and he reckons we should avoid using
both query ("?") and fragment ("#"). The reason is that both are
already being used massively around the Web and we may break some
existing Web resources in this way, in particular with query ("?").
Even if we are trying to use them only for specific media types, the
problem is that it is impossible to tell from a URI what the media
type is (e.g. http://example.com/resource#t=50-70 - how do you tell
this is a video?) - only the server knows it and can communicate it.
Therefore, the UA will always have to apply the fragment ("#") to the
resource only after it has received the resource - unless it
generically puts the media fragment request into another place in the
HTTP request, namely into the headers.
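
To illustrate why the fragment never reaches the server, here is a small
Python sketch using the standard library's urllib.parse.urldefrag. It
only shows how a client splits the URI before building the request; the
URI is the example from the paragraph above.

```python
from urllib.parse import urldefrag

# Example URI from the text; the fragment is the part after "#".
uri = "http://example.com/resource#t=50-70"

# urldefrag separates what is sent in the request from what stays in the UA.
request_url, fragment = urldefrag(uri)

print(request_url)  # http://example.com/resource
print(fragment)     # t=50-70
```

Nothing in request_url says that a time range was asked for, or that the
resource is a video, so any range information has to travel some other
way.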

The thinking here follows similar lines to what we discussed for
temporal URIs at
http://annodex.net/TR/draft-pfeiffer-temporal-fragments-03.txt (search
for byte range).

The idea is that if we don't want to change the way in which Web
proxies work, we have to work within their given resource fragment
caching functionality. The simple fact is that byte ranges are the
only way for Web proxies to deal with subparts of a Web resource. And
since the Web server is the only one that can determine the byte
range, we need a four-way handshake protocol to make media fragment
URI requests cachable over HTTP. In the first pass, the UA is told
which byte range to request, and in the second pass the UA can request
the resource with the correct byte range.

Now, we can choose URI fragments ("#") to specify the time segment(s)
to the UA, or we can use any one of the reserved delimiting characters
from the URI spec (i.e. sub-delims  = "!" / "$" / "&" / "'" / "(" /
")" / "*" / "+" / "," / ";" / "="). In essence the choice boils down
to the probability of clashing with somebody else's already defined
URI scheme. It may be an idea to ask Google to do us a special search
on all the URIs they have stored and check if whatever scheme we come
up with clashes with an existing URI scheme. In general I agree with
others: "#" and ";" probably make the most sense. So, should we decide
on something like http://example.com/resource#t=12-30,50-90 then we
might want to ask Google for all the URIs it can find that have "#t="
in them.
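
As a rough sketch of what a UA would do with such a URI, the following
Python splits off the fragment and turns it into a list of time ranges.
The "t=start-end,start-end" syntax is only the example floated above,
not an agreed specification, and parse_time_fragment is a hypothetical
helper name.

```python
from urllib.parse import urldefrag


def parse_time_fragment(uri):
    """Return (url_without_fragment, [(start, end), ...]) for a '#t=' URI.

    Hypothetical helper: the syntax handled here is just the example
    from this mail, with times given in seconds.
    """
    url, fragment = urldefrag(uri)
    if not fragment.startswith("t="):
        return url, []
    ranges = []
    for part in fragment[2:].split(","):
        start, end = part.split("-")
        ranges.append((float(start), float(end)))
    return url, ranges


url, ranges = parse_time_fragment("http://example.com/resource#t=12-30,50-90")
print(url)     # http://example.com/resource
print(ranges)  # [(12.0, 30.0), (50.0, 90.0)]
```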

Anyway .. moving on to the four-way handshake. The way it is specified
in the temporal URI spec is one way to do it - it requires a resource
redirection by the origin server such that the second path accesses
the correct resource. After having thought about time range requests,
John and I came up with the following alternative (explained on the
example of an Ogg video resource):


Initial request from a user in a Web browser:

User -> UA (1):
http://example.com/resource.ogv;t=20-30

The UA chops off the fragment and turns it into an HTTP GET request with
a time range header (which can incidentally also be cached by a proxy):

UA -> Proxy (2) -> Origin Server (3):
GET http://example.com/resource.ogv
Range: time 20-30

The Origin Server converts the time range to a byte range and puts all
additional data that cannot be cached but is required by the UA to
receive a fully functional media resource into the HTTP response.

Origin Server -> Proxy (7) -> UA (8):
RESPONSE 200
<...ogg header + skeleton...>
Content-Range: time 20-30
Content-Type: video/ogg; codecs=theora,vorbis
Time-Range: bytes 50000-200000/FILESIZE (this is a new HTTP header)

The UA buffers the data it receives for hand-over to the media
subsystem. It then proceeds to put the actual fragment request
through:

UA -> Proxy (2) -> Origin Server (3):
GET http://example.com/resource.ogv
Range: bytes 50000-200000

The Origin Server puts the data together and sends it to the UA:

Origin Server -> Proxy (7) -> UA (8):
RESPONSE 200
<... bytes of video data ...>
Content-Range: bytes 50000-200000/FILESIZE

The UA hands over the header and video data to the media subsystem,
which then displays it to the user (9).
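
The exchange above can be played through in a toy Python model. This is
only a sketch: nothing touches the network, the "Range: time" unit and
the "Time-Range" header are the proposals from this mail rather than
existing HTTP features, and the constant bitrate, header size and file
size are invented numbers (so the byte offsets differ from the 50000-200000
used in the example).

```python
# Toy model of the four-way handshake; nothing here touches the network.
# "Range: time ..." and "Time-Range" are the proposed headers from this mail.
HEADER_BYTES = 5000        # assumed size of ogg header + skeleton
BYTES_PER_SECOND = 15000   # assumed constant bitrate (real files need an index)
FILE_SIZE = 1_000_000      # assumed total resource size in bytes


def origin_server(headers):
    """Answer a GET for the resource according to its Range header."""
    unit, _, spec = headers["Range"].partition(" ")
    start, end = (int(x) for x in spec.split("-"))
    if unit == "time":
        # First pass: map time to bytes, return setup data plus the mapping.
        first = HEADER_BYTES + start * BYTES_PER_SECOND
        last = HEADER_BYTES + end * BYTES_PER_SECOND
        return {
            "Content-Range": f"time {start}-{end}",
            "Time-Range": f"bytes {first}-{last}/{FILE_SIZE}",
            "body": "<ogg header + skeleton>",
        }
    # Second pass: a plain byte range, which existing proxies can cache.
    return {
        "Content-Range": f"bytes {start}-{end}/{FILE_SIZE}",
        "body": f"<{end - start + 1} bytes of video data>",
    }


def user_agent(uri):
    """Run both passes for a URI such as http://example.com/resource.ogv;t=20-30."""
    url, _, time_spec = uri.partition(";t=")
    first = origin_server({"Range": f"time {time_spec}"})
    # Extract "305000-455000" from "bytes 305000-455000/1000000".
    byte_spec = first["Time-Range"].split(" ")[1].split("/")[0]
    second = origin_server({"Range": f"bytes {byte_spec}"})
    return url, first, second


url, first, second = user_agent("http://example.com/resource.ogv;t=20-30")
print(first["Time-Range"])      # bytes 305000-455000/1000000
print(second["Content-Range"])  # bytes 305000-455000/1000000
```

Only the second request uses a range unit that proxies already
understand, which is why the first pass is needed at all.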



If we want to make media fragment resources cachable on the Web, we
don't have many choices. We can however optimise the process for
specific media types, e.g. for the quicktime streams that Dave Singer
talked about. I can't however see a way to avoid a four-way handshake
at least once per resource.

I think we will have a nice discussion next week. :-)

Cheers,
Silvia.

Received on Wednesday, 15 October 2008 11:37:30 UTC