Re: Issue with "bytes" Range Unit and live streaming

Hey Thorsten,

On 4/20/16 12:31 PM, Thorsten Lohmar wrote:
> Hi Craig,
>
>> What *is* missing is the ability to get a continuous byte range on live content
>> that starts at an arbitrary offset and the ability to directly jump to the live
>> point.
> Yes, and there are two solutions to the issue:
> A. Enable it on HTTP Layer through the definition of a new range request method
> B. Enable working with existing HTTP procedures, i.e. the client can work out the precise byte offsets.
>
> BR,
> /Thorsten
[cp] I guess I don't see these as mutually exclusive.

[cp] In DLNA we have both time-based and byte-based range methods. And 
what we find is that some clients want the server to "help out" for some 
formats (e.g. MPEG2 content), and these clients utilize time-based range 
units. Other clients just want byte-wise access (especially clients 
accessing ISO-BMFF/MP4).

[cp] Both methods need to accommodate aggregated content. 
TimeSeekRange.dlna.org accommodates range/seek requests that include 
aggregated content. And Range doesn't (which is why we're submitting 
this draft).
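[cp] To make the trimming behavior concrete, here's a rough Python sketch (the helper name is mine, and the header grammar is just assumed from the examples in this thread, not normative draft text) of a client learning the effective start offset from a bytes-live Content-Range:

```python
import re

def parse_bytes_live_content_range(header_value):
    """Parse a Content-Range value in the draft's proposed "bytes-live"
    form, e.g. "bytes-live 123456-*/*", and return the first-byte-pos
    the server actually chose to serve from."""
    m = re.fullmatch(r"bytes-live (\d+)-\*/\*", header_value.strip())
    if m is None:
        raise ValueError("not a bytes-live content range: %r" % header_value)
    return int(m.group(1))

# A TSB server may answer a "Range: bytes-live=0-*" request with a later
# start offset because earlier bytes have been trimmed from the front:
start = parse_bytes_live_content_range("bytes-live 123456-*/*")
```

The point being: the client asks for one range, and the Content-Range response tells it what it's actually getting.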

[cp] It would be great to bring something like TSR in as a Range Unit, 
but it wouldn't replace bytes-live (at least not in the same form). And 
it really makes sense (IMHO) for time-based and byte-based seek to be 
completely discrete, as time-seek requires a media-content-aware HTTP 
server. Bytes (and bytes-live) can be implemented by a much "dumber" 
("content-dumb"?) HTTP server.

[cp] I hope I'm making sense. (and we haven't lost too many people...)
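[cp] And since a lot of the thread below hinges on a client being able to find fragment boundaries itself, here's a toy sketch of scanning top-level ISO-BMFF boxes for moof offsets (illustrative Python, not the rygel/odid Vala code referenced further down; it only handles the basic 32-bit box size form, not 64-bit or to-end sizes):

```python
import struct

def top_level_boxes(data):
    """Scan top-level boxes of an ISO-BMFF byte stream, yielding
    (offset, type, size) triples."""
    pos = 0
    while pos + 8 <= len(data):
        size, btype = struct.unpack_from(">I4s", data, pos)
        if size < 8:
            break  # malformed, or a size form this sketch doesn't handle
        yield pos, btype.decode("ascii"), size
        pos += size

def moof_offsets(data):
    # Fragment boundaries a client could seek to with byte-Range requests.
    return [off for off, btype, _ in top_level_boxes(data) if btype == "moof"]

# Tiny synthetic stream: ftyp + moov + two fragments (moof+mdat each).
def box(btype, payload=b""):
    return struct.pack(">I4s", 8 + len(payload), btype) + payload

stream = (box(b"ftyp", b"isom") + box(b"moov", b"x" * 100) +
          box(b"moof", b"y" * 40) + box(b"mdat", b"z" * 200) +
          box(b"moof", b"y" * 40) + box(b"mdat", b"z" * 200))
```

A real client would additionally read the fragment timing metadata to map times to these offsets, but the byte-level walk itself needs nothing from the server beyond Range support.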
>
>> -----Original Message-----
>> From: Craig Pratt [mailto:craig@ecaspia.com]
>> Sent: Wednesday, April 20, 2016 8:21 PM
>> To: Thorsten Lohmar; K.Morgan@iaea.org; fielding@gbiv.com
>> Cc: Göran Eriksson AP; bs7652@att.com; remy@lebeausoftware.org; ietf-
>> http-wg@w3.org; rodger@plexapp.com; julian.reschke@gmx.de;
>> C.Brunhuber@iaea.org; Darshak Thakore
>> Subject: Re: Issue with "bytes" Range Unit and live streaming
>>
>> Hey Thorsten,
>>
>> I'm not clear about what you think is missing today from MP4/ISO-BMFF.
>>
>> An HTTP/1.1 MP4-aware client has everything it needs today to resolve
>> time offsets to byte offsets and find the nearest random access
>> point(s) - whether unfragmented, static-fragmented, or live-fragmented
>> - and I have code to demonstrate this. I'm sure this must be a
>> misunderstanding, but I don't see what here requires an ISO-BMFF
>> extension.
>>
>> What *is* missing is the ability to get a continuous byte range on live content
>> that starts at an arbitrary offset and the ability to directly jump to the live
>> point.
>>
>> More thoughts in-line.
>>
>> cp
>>
>> On 4/20/16 8:14 AM, Thorsten Lohmar wrote:
>>> Hi Craig,
>>>
>>> See inline.
>>>
>>> It might be worthwhile to bring-up the issue in MPEG. E.g. as ISO-BMFF
>> extension or DASH extension.
>>> BR,
>>> /Thorsten
>>>
>>>> -----Original Message-----
>>>> From: Craig Pratt [mailto:craig@ecaspia.com]
>>>> Sent: Tuesday, April 19, 2016 11:54 PM
>>>> To: Thorsten Lohmar; K.Morgan@iaea.org; fielding@gbiv.com
>>>> Cc: Göran Eriksson AP; bs7652@att.com; remy@lebeausoftware.org; ietf-
>>>> http-wg@w3.org; rodger@plexapp.com; julian.reschke@gmx.de;
>>>> C.Brunhuber@iaea.org; Darshak Thakore
>>>> Subject: Re: Issue with "bytes" Range Unit and live streaming
>>>>
>>>> Reply in-line.
>>>>
>>>> cp
>>>>
>>>> On 4/19/16 7:02 AM, Thorsten Lohmar wrote:
>>>>> Hi Craig,
>>>>>
>>>>> Thanks for sharing the github link. That certainly clarifies the
>>>>> use-case even further.
>>>>>
>>>>> Maybe we should focus the discussion on the fMP4 format for some
>>>>> time, since tune-in into fMP4 requires random access to fragment
>> boundaries.
>>>>> Compared to ts or mp3, fMP4 does not support synchronization to the
>>>>> stream from any received byte. The client must start processing an
>>>>> fMP4 stream from fragment boundaries (or box boundaries). Tuning to
>>>>> selected samples inside of the fMP4 file happens at a later stage.
>>>>>
>>>>> So, when I understand your proposal around the byte-live request
>>>>> correctly, then the client is NOT asking for the precise byte range,
>>>>> but the next possible random access points in close proximity of the
>>>>> requested range. So, the client sends a request containing "Range:
>>>>> bytes-live=0-*" and gets a response with "Content-Range: bytes-live
>>>>> 123456-*": So, the server is providing the client with a HTTP
>>>>> resource, starting in case of fMP4 with the fragment boundary (which
>>>>> is at byte-offset 123456 of the resource).
>>>>>
>>>>> Do I understand it correctly then: When the client wants to tune in
>>>>> e.g. 5min into the TSB, the client measures the bitrate of the
>>>>> media stream and calculates a rough byte offset (i.e. 5min x
>>>>> estimated bitrate, let's say byte pos 654321) [all out of scope of
>>>>> the ID] and creates a bytes-live range request of the form "Range:
>>>>> bytes-live=654321-*". The server looks up a good fragment boundary
>>>>> (let's say 654420) and responds with "Content-Range: bytes-live
>>>>> 654420-*". Do I understand the proposal correctly?
>>>>>
>>>> TSBs and MP4s are a nasty combination. ISO BMFF containers
>>>> accommodate amended content (via fragments), but they don't have
>>>> facilities for front-end truncation. It's possible, just not easy. So
>>>> this will go down the rabbit hole quickly.
>>> [TL] Yes, it is possible to append to the end. But you cannot simply
>>> delete from the front when you realize a TSB. The question is whether
>>> this issue should be solved on the HTTP layer, since likely you also
>>> want to give an indication of the TSB to the users (GUI representation
>>> of the time-shift buffer depth).
>> [cp] Yeah - there always needs to be a moov box at the front. And while
>> I believe Content-Range is sufficient to communicate front-end
>> truncation, anything time-related is another matter. In MP4, this can be
>> communicated in the timeline via an elst box (edit list). Anything else
>> requires some kind of time-based Range Unit. And that's orthogonal to
>> our draft - and live content in general.
>>
>>>> One would need to define the operation of the server for providing a
>>>> time- shifted MP4 representation. Basically the idea is that the
>>>> server would maintain a list of valid fragments and would have to
>>>> maintain a valid movie box at the front of the representation as
>>>> fragments were added and removed (if a client's expected to consume a
>>>> spec-compliant ISO BMFF container).
>>> [TL] Yes.
>>>
>>>> So I'd say you're mostly right, but let me paraphrase: When an MP4
>>>> client wants to come in at a precise time on a MP4/ISO BMFF (e.g. 5
>>>> minutes), it typically would:
>>>>
>>>> 1) get the movie box from the front of the representation (a couple
>>>> of byte-Range requests);
>>>> 2) access the movie fragment header boxes until the requested time
>>>> offset is determined (multiple byte-Range requests);
>>> [TL] Yes, but the Movie Box does not contain such information for a Live
>> offering. So, something is missing here.
>> [cp] Whether the MP4 is being actively appended to or not, walking the
>> fragments (moofs) will provide a client the time offsets. I have code
>> demonstrating this or you can do a protocol analysis of VLC to see it
>> performing the bytes-Range requests. Are we getting tripped up on the
>> definition of "live"? "Live" != "HLS Live Streaming", right?
>>>> 3) If the time offset is found in the currently-available fragments,
>>>> perform a Range request to get the fragment containing the target
>>>> time and start rendering it (one or two "bytes" Range request), and
>>>> start fetching the next Fragment if/when desired. e.g. If the nearest
>>>> random access point for 5 minutes is at the start of a fragment with
>>>> offset
>>>> 4444444 with length 30001:
>>>>
>>>>        Range: bytes=4444444-4474444
>>> [TL] Well, almost. You must always start at the fragment boundary.
>>> When the fragment contains e.g. 4 sec of media data (e.g. with 1-sec
>>> GOPs), the client must start fetching from the beginning of the
>>> fragment and then skip the media data before the desired playtime
>>> start. The client can also do an open range request.
>> [cp] Agreed - which is why I said "the nearest random access point for 5
>> minutes is at the start of a fragment with offset 4444444". The actual
>> sample/frames corresponding with the 5-minute mark would be at some
>> point after 4444444.
>>>> 4) If the time offset is not found to be after the
>>>> currently-available fragments, jump to the live point by grabbing the
>>>> last fragment and any fragments that are added on (one "bytes-live"
>>>> range request). e.g. if the last fragment's offset is 5555555 with length
>> 40001:
>>>>        Range: bytes-live=5555555-*
>>>>
>>>> The client would quickly render the last fragment (to prime the frame
>>>> buffer), put the framebuffer on-screen, and then block on the socket
>>>> (or wait for async notification) and render the new fragment(s) as they
>> come in.
>>> [TL] Ok, understood.
>> [cp] It's important to remember that this is the feature we care about in
>> context of this draft. If you're good with this, then we're on the same page.
>>>> * Note that one might use MP4 timelines in here to communicate the
>>>> fact that sample time 0 is not media time 0. But I'd rather not get too far
>> into that.
>>>> ** As with live content, TSBs are non-cacheable (but in-progress
>>>> recordings *are* cacheable)
>>> [TL] Since we talk about a single HTTP transaction with progressive
>>> media data, the session is anyhow not cacheable.
>> [cp] Well, in-progress recordings should still be byte-wise cacheable,
>> correct? Byte 0 will always be byte 0, byte X will always be byte X.
>> So a cache can be populated with the currently-accessible bytes and can
>> satisfy Range requests. There might just be a cache miss if a client
>> accesses bytes not yet cached. But a proxy should know that this is a
>> possibility since the Content-Range responses have a "*" in place of
>> the content length.
>>>>> If yes, should the solution be limited to live cases?  If I
>>>>> understand it correctly, then you are looking for a solution where
>>>>> the client indicates a rough byte range in the request and the
>>>>> server responds with another range, which is fulfilling some
>>>>> condition. In case of a live session with fMP4, the server looks for
>>>>> a random access point into the stream. The random access point must
>>>>> be a fragment boundary in case of fMP4 and can be PAT, PMT, PES in
>> case of TS.
>>>> Technically, a regular bytes-Range request could have this "fencing"
>>>> behavior (a client should be driven off the Content-Range response
>>>> header). But I think this is somewhat disingenuous and might be
>>>> inconsistent with RFC7233 (I'd have to look closely). And I think
>>>> this is another example of where a different Range Unit would make
>>>> sense.
>>>>
>>>> e.g. DLNA specifications defined a TimeSeekRange.dlna.org header that
>>>> carries this contract (of providing the most-immediately-preceding
>>>> "decoder- friendly position") in the returned media. Now, this was
>>>> defined to work with HTTP/1.0, so defining a new Range Unit wasn't an
>>>> option. But if so, TimeSeekRange.dlna.org could be easily replaced
>>>> with a "npt" Range Unit (normal play time). And what you're talking
>>>> about would be similar but with byte offsets. Both carry a similar
>> assumption:
>>>> The server has some knowledge of the content structure.
>>> [TL] Yes, I also started thinking about time range requests, but I
>>> couldn't remember whether that was defined in DLNA, OIPF, or DVB. Are
>>> the time range headers widely used today? Can such a solution make
>>> sense for H1.1 or H2?
>> [cp] TimeSeekRange is widely used in DLNA. But they're more useful for less
>> structured content such as MP2 Program/Transport streams. (for MP4
>> representations, one never sees DLNA clients using TimeSeekRange - but
>> servers are required to support TSR for all media formats in CVP-2).
>>
>>>> Aside: Both use cases could be covered with something like a "mso"
>>>> (media-sensitive offset) Range Unit which could take a time offset or
>>>> a byte offset (and incorporate live semantics).
>>>>
>>>> The bytes-live draft (or whatever we end up calling it) is intended
>>>> to assume all knowledge is client-side - just as the case with the
>>>> bytes Range Unit. I think you're assuming some of this server-side
>>>> structure knowledge that I was not intending. But "mso" would make
>> another great Range Unit.
>>> [TL] I still think that the client should have precise information
>>> about the TSB (i.e. precise range and depth) in order to properly
>>> render the TSB representation on the GUI (media players often render
>>> the TSB in a progress bar together with the progress of the live
>>> session).
>> [cp] I do think it could be useful to define something like
>> TimeSeekRange.dlna.org as a Range Unit. This would be very useful for
>> random access on MPEG2-contained content. If you want to start a draft, I'm
>> happy to co-author. ;^J
>>>> There are many other potential applications of something like
>>>> bytes-live I believe - which is why we're bringing this to the IETF.
>>>> But really it's all about getting to the "live point" in a couple different
>> ways.
>>>> But it assumes the *client* knows where the random access points are
>>>> located (and tailors the byte offsets accordingly).
>>> [TL] Maybe, it is better to discuss this issue in MPEG, either as ISO-BMFF
>> extensions or DASH extensions.
>> [cp] Perhaps. But the mechanism of how the transfer occurs - which is what
>> this draft is related to - is definitely in the transport space.
>> For some complex formats, it may be in the "necessary but not sufficient"
>> category. But as I mentioned at the top, for
>> (linear/progressive) MP4/ISO-BMFF, everything should be there.
>>>>> Nothing inline.
>>>>>
>>>> Good - that was getting crazy... ;^J
>>>>
>>>>> BR,
>>>>>
>>>>> Thorsten
>>>>>
>>>>> *From:*Craig Pratt [mailto:craig@ecaspia.com]
>>>>> *Sent:* Tuesday, April 19, 2016 11:55 AM
>>>>> *To:* Thorsten Lohmar; K.Morgan@iaea.org; fielding@gbiv.com
>>>>> *Cc:* Göran Eriksson AP; bs7652@att.com; remy@lebeausoftware.org;
>>>>> ietf-http-wg@w3.org; rodger@plexapp.com; julian.reschke@gmx.de;
>>>>> C.Brunhuber@iaea.org; Darshak Thakore
>>>>> *Subject:* Re: Issue with "bytes" Range Unit and live streaming
>>>>>
>>>>> Hey Thorsten,
>>>>>
>>>>> I'll try to reply in-line.
>>>>>
>>>>> cp
>>>>>
>>>>> On 4/18/16 3:50 PM, Thorsten Lohmar wrote:
>>>>>
>>>>>       Hi Craig, all,
>>>>>
>>>>>       Thanks for the clarification. Some further question inline
>>>>>
>>>>>       BR,
>>>>>
>>>>>       Thorsten
>>>>>
>>>>>       *From:*Craig Pratt [mailto:craig@ecaspia.com]
>>>>>       *Sent:* Monday, April 18, 2016 10:29 PM
>>>>>       *To:* Thorsten Lohmar; K.Morgan@iaea.org; fielding@gbiv.com
>>>>>       *Cc:* Göran Eriksson AP; bs7652@att.com;
>>>>>       remy@lebeausoftware.org; ietf-http-wg@w3.org;
>>>>>       rodger@plexapp.com; julian.reschke@gmx.de;
>>>>>       C.Brunhuber@iaea.org; Darshak Thakore; STARK, BARBARA H
>>>>>       *Subject:* Re: Issue with "bytes" Range Unit and live streaming
>>>>>
>>>>>       [cc-ing the co-authors]
>>>>>
>>>>>       Hi Thorsten,
>>>>>
>>>>>       I'm happy to help provide whatever answers I can.
>>>>>
>>>>>       Reply in-line.
>>>>>
>>>>>       cp
>>>>>
>>>>>       On 4/18/16 8:10 AM, Thorsten Lohmar wrote:
>>>>>
>>>>>           Hi Craig, all,
>>>>>
>>>>>           My colleague Göran asked me some questions around the problem
>>>>>           and I would like to raise these questions directly to you. Of
>>>>>           course, there are some alternative solutions available, where
>>>>>           the client can work out the different things from a manifest.
>>>>>           But you seem to look for a simple solution, which works with
>>>>>           non-segmented media on a single HTTP session.
>>>>>
>>>>>           If I understand it correctly, an HTTP server is making a
>>>>>           live stream available using HTTP. A normal live stream can be
>>>>>           opened with a single HTTP request and the server can serve
>>>>>           data "from the live point" either with or without HTTP chunked
>>>>>           delivery. The server cannot give a Content-Length, since this
>>>>>           is an ongoing live stream of unknown size.
>>>>>
>>>>>       [cp] all correct.
>>>>>
>>>>>
>>>>>       Your use-case seems to be about recording of content. The client
>>>>>       should access content from the recorded part, but should be able
>>>>>       to jump to the live-point. I assume that you are not looking into
>>>>>       sliding window recordings (i.e. timeshift). I assume that a single
>>>>>       program is continuously recorded and the HTTP object is growing
>>>>>       until the end of the live session, correct?
>>>>>
>>>>>       [cp] I didn't spell it out in the draft, but I would like to
>>>>>       consider adding clarifications for the time-shift cases. This
>>>>>       should just be a matter of a Client requesting one thing and
>>>>>       getting another. e.g. "Range: bytes-live=0-*" results in
>>>>>       "Content-Range: bytes-live 123456-*". In either case, you're
>>>>>       correct: the end of the representation is moving forward in time
>>>>>       until the end of the live session.
>>>>>
>>>>>
>>>>>       */[TL] The "Range: bytes-live=0-*" case is not clear to me. Your
>>>>>       ID says "All bytes currently in the representation and those
>>>>>       appended to the end of the representation after the request is
>>>>>       processed". I get the impression, that the server is deleting all
>>>>>       data before a certain timepoint (aka, behavior of a slighting
>>>>>       window timeshift).  So, the client seems to request all data from
>>>>>       the beginning of the timeshift buffer. Why does the server need to
>>>>>       change the byte offset from 0 to 123456? /*
>>>>>
>>>>>       *I can understand, that the server must signal "growing resource"
>>>>>       in the response.*
>>>>>
>>>>> [cp] I was trying to illustrate a case where the server had trimmed
>>>>> off bytes 0-123456 (the TSB model). So in this case, it's signalling
>>>>> to the client "you're getting bytes starting at 123456 (not 0)". e.g.
>>>>> If a client requests "Range: bytes-live=0-*" on an in-progress
>>>>> recording, one might expect:
>>>>>
>>>>>       Content-Range: bytes-live 0-*/*
>>>>>
>>>>> [cp] Basically saying (as described in the ID) that all bytes
>>>>> currently in the representation and those appended to the end of the
>>>>> representation after the request is processed will be returned. But
>>>>> on a TSB, one might expect:
>>>>>
>>>>>       Content-Range: bytes-live 123456-*/*
>>>>>
>>>>> [cp] Basically saying that bytes starting at byte 123456 in the
>>>>> representation and those appended to the end of the representation
>>>>> after the request is processed will be returned.
>>>>>
>>>>> [cp] While I'm thinking about TSB use cases in the back of my mind,
>>>>> this is really not the primary use case I was considering for the ID
>>>>> (but I would hope it can be covered).
>>>>>
>>>>> In any case, how does the client know "good" byte range offsets (i.e.
>>>>> service access points) to tune into the recording? Or is the
>>>>> assumption, that the client can synchronize to the media stream from
>>>>> any byte range?
>>>>>
>>>>> [cp] For byte-level access, random access implementation is up to
>>>>> the client. For some containers this is easier than others. e.g. For
>>>>> MP4, the random access points can be established by reading the
>>>>> movie and fragment header(s). For something like MP2, it's trickier of
>> course.
>>>>> */[TL] Well, in case of fMP4, the client needs to get the Movie
>>>>> Header for initialization. Then, proper access points are fragment
>>>>> boundaries. There are various ways to signal time-to-byte offsets. /*
>>>>>
>>>>> [cp] Fragments can actually have multiple access points - implicit
>>>>> (per sample) and explicit (random access points). But yeah, it seems
>>>>> common for fragments to have one random access point (and often
>>>>> correlate to a GOP) - and that there's a huge variety of ways to lay
>>>>> out the samples.
>>>>>
>>>>> */In case of TS, the client needs a PAT, PMT and PES starts for
>>>>> tune-in. It is a bit more tricky, but also here are solutions. /*
>>>>>
>>>>> */But the email talks about "non-segmented" media. The draft talks
>>>>> about "mimicking segmented media". fMP4 is actually the way to
>>>>> create ISO-BMFF segments. So, it is for segmented media, but without
>>>>> a separate manifest?/*
>>>>>
>>>>> [cp] It's important to differentiate between *fragmented* and
>>>>> *segmented* MP4/ISO BMFF representations. bytes-live is most
>>>>> applicable to fragmented files - where you have one representation
>>>>> being used for the entire streaming session - with this
>>>>> representation being appended to periodically (usually one fragment at
>> a time).
>>>>> [cp] I really need to revise my description in the draft to help
>>>>> avoid confusion. What I was trying to describe was how a solution
>>>>> using just byte-Range requests would always be slightly behind the
>>>>> live point - as is the case with rendering "live" segmented streams.
>>>>> While bytes-live could be used for fragmented (MP4/ISO BMFF) content
>>>>> or segmented content, the primary use case is for non-segmented
>>>>> representations.
>>>>>
>>>>>
>>>>>
>>>>> [cp] One major feature this draft allows is retrieval of bytes
>>>>> just preceding the live point. So for example, a client can do a
>>>>> HEAD request with "Range: bytes=0-", get a "Content-Range:
>>>>> bytes 0-1234567/*", then perform something like "Range:
>>>>> bytes-live=1200000-*", and prime its framebuffer with 34567 bytes of
>>>>> data that precede the live point - allowing the client to find
>>>>> an access point (e.g. MPEG2 start codes) and the live
>>>>> presentation to display much sooner than it would from the live
>>>>> point (without random access).
>>>>>
>>>>> */[TL] So, how does the client know that the proper fragment
>>>>> boundary is at byte position 1200000? Do you assume that the client
>>>>> first fetches a time-to-byte offset file, which tells the client
>>>>> that an access point (e.g. a fragment boundary) is at byte pos
>>>>> 1200000? If yes, why does the client need the HEAD request, when it
>>>>> already has the byte position?/*
>>>>>
>>>>> [cp] How a client knows the amount to pre-fetch before the live
>>>>> point would depend upon the media format. For an MP4/ISO BMFF file,
>>>>> 1200000 could represent the random access point most immediately
>>>>> preceding the live point. It would be similar for an indexed MP2.
>>>>> And for unindexed MP2 representations, it's not uncommon for a
>>>>> client to prebuffer a fixed amount of content in the hopes of
>>>>> capturing a keyframe (really a heuristic).
>>>>>
>>>>> [cp] The HEAD request is necessary in this case to know where the
>>>>> live point is at the time the request is made so the HTTP client
>>>>> would know if it can jump into already-stored content or if it
>>>>> should just acquire the live point.
>>>>>
>>>>> [cp] The important point is that all common video formats need a
>>>>> discontinuity-free number of bytes before the live point to provide
>>>>> a quality user experience.
>>>>>
>>>>>
>>>>> How should the client know which byte ranges are already available
>>>>> on the server? When the client is playing back from the recorded
>>>>> part and would like to skip 5min forward, how does the client know
>>>>> whether a normal range request is needed or whether the client
>>>>> should ask for the live point? What type of HTTP status code should
>>>>> be provided when the range request is not yet available on the server?
>>>>>
>>>>> [cp] We're not trying to come up with a universal solution for
>>>>> performing time-based seek on all media formats with this draft. So
>>>>> some of this is out of scope. But let me see if I can fill in some
>>>>> of the blanks.
>>>>>
>>>>> */[TL] Ok, not everything needs to be in-scope. But an essential
>>>>> assumption should be whether the client has a time-to-byte-offset
>>>>> table or whether the client can precisely determine the fragment
>>>>> boundary positions. /*
>>>>>
>>>>> [cp] Optimally, time-to-byte indexes would be used. But even without
>>>>> this, clients can often manage with heuristics. e.g. VLC can perform
>>>>> a reasonable job of providing time-seek on unindexed MP2 files.
>>>>>
>>>>>
>>>>>
>>>>> [cp] Some applications of media streaming have time-based indexing
>>>>> facilities built-in. e.g. MP4 (ISO BMFF) containers allow time and
>>>>> data to be associated using the various internal, mandatory metadata
>>>>> "boxes". In other cases, applications may provide a separate
>>>>> resource that contains time-to-byte mappings (e.g. content index
>>>>> files). In either case, there's a facility for mapping time offsets
>>>>> to byte offsets - or sometimes the client incorporates heuristics to
>>>>> perform time skips (e.g. VLC will do this on some file formats).
>>>>>
>>>>>
>>>>> */[TL] Yes. fMP4 supports this and MPEG DASH is leveraging this. But
>>>>> the live-point is not described in the fragments. The client
>>>>> determines the livepoint from the manifest. /*
>>>>>
>>>>> [cp] Correct. In fragmented content, the time-to-segment map tells
>>>>> you which representation to fetch (via GET). While I'd say that
>>>>> bytes-live can also improve segmented rendering (by reducing the
>>>>> latency of rendering), the primary focus of our draft is for
>>>>> non-segmented representations.
>>>>>
>>>>>
>>>>> [cp] In all these cases, there's some mechanism that maps time
>>>>> offsets to byte offsets.
>>>>>
>>>>> */[TL] Yes/*
>>>>>
>>>>>
>>>>>
>>>>> [cp] When it comes to the available byte range, a client can know
>>>>> what data range is available by utilizing a HEAD request with a "Range:
>>>>> bytes=0-". The "Content-Range" response can contain something like
>>>>> "Content-Range: bytes 0-1234567/*" which tells the client both the
>>>>> current randomly accessible content range (via the "0-1234567") and
>>>>> that the content is of indeterminate length (via the "*").
>>>>>
>>>>> */[TL] So, that is the existing Content-Range response, but with an
>>>>> '*' to indicate the unknown content-length, correct? /*
>>>>>
>>>>> [cp] Yeah, the "*" in place of the last-byte-pos indicates an
>>>>> indetermine-length response body.
>>>>>
>>>>>
>>>>>
>>>>> [cp] Putting this all together, a client would implement a 5-minute
>>>>> skip by:
>>>>>       (1) adding 5 minutes to the current play time,
>>>>>       (2) determining the byte offset for that time using the
>>>>> appropriate index/heuristic (e.g. "3456789"),
>>>>>       (3) if the time is outside the index, jumping to the live
>>>>> point and updating the play time to the last-indexed time or by
>>>>> other means (e.g. using "Range: bytes-live=340000-*" to
>>>>> pre-buffer/pre-prime the frame/sample buffer),
>>>>>       (4) if the time is inside the index, performing a standard
>>>>> bytes Range request to retrieve an implementation-specific quantum
>>>>> of time or data (e.g. "Range: bytes=3456789-3556789") and rendering.
>>>>>
>>>>> */[TL] In (2), how does the client determine the byte offset? fMP4
>>>>> requires precise byte offsets. In case of TS, the client can sync to
>>>>> the stream by first searching for 0x47 sync bytes. In (3), how does
>>>>> the client determine "outside of the index"? Seems that some sort of
>>>>> manifest is implicitly needed, which allows the client to understand
>>>>> the latest byte pos. /*
>>>>>
>>>>> [cp] (2) is media-format-specific. For MP4/ISO BMFF, it would use
>>>>> the built-in metadata, for MP2, it would either use an index file or
>>>>> a heuristic.
>>>>>
>>>>> [cp] For (3), if the current live point (in byte terms) is greater
>>>>> than the last byte offset in the index, then the live point is
>>>>> "outside the index". That is, the time the client is trying to
>>>>> access isn't randomly accessible, and the client should just jump to
>>>>> the live point.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> [cp] Again, some of this is out of scope, but I hope that clarifies
>>>>> a common use case.
>>>>>
>>>>> */[TL] Would be good to clarify, what information the client needs
>>>>> to get in order to do the operations. How the client gets the info
>>>>> can be left out-of-scope./*
>>>>>
>>>>> [cp] ok - I hope I'm filling in more of the blanks...
>>>>>
>>>>>
>>>>>
>>>>> [cp] Regarding the status code, RFC7233 (section 4.4) indicates that
>>>>> code 416 (Range Not Satisfiable) must be returned when "none of the
>>>>> ranges in the request's Range header field overlap the current
>>>>> extent of the selected resource or that the set of ranges requested
>>>>> has been rejected due to invalid ranges or an excessive request of
>>>>> small or overlapping ranges". This part of 4.4 applies to *all*
>>>>> Range requests - regardless of the Range Unit.
>>>>>
>>>>> */[TL] ok. /*
>>>>>
>>>>>
>>>>>
>>>>> [cp] The bytes-live draft then goes on to say that "A
>>>>> bytes-live-range-specifier is considered unsatisfiable if the
>>>>> first-byte-pos is larger than the current length of the
>>>>> representation". This could probably be elaborated on a bit. But
>>>>> this is supposed to be the "hook" into the 4.4 language.
>>>>>
>>>>>
>>>>> Can you please clarify the questions?
>>>>>
>>>>> [cp] I hope I succeeded (at least partially). Apologies for the long
>>>>> response. I wanted to make sure I was answering your questions.
>>>>>
>>>>> */[TL] Gets a bit clearer, but I still don't understand the "mimic
>>>>> HLS or DASH". DASH / HLS focuses on CDN optimization by creating a
>>>>> sequence of individual files. The client can work out the live-point
>>>>> URL from the manifest. Each segment is a "good" access point (in
>>>>> DASH always box boundaries and in HLS always TS boundaries even with
>>>>> PAT / PMT). So, the key issue here is to clarify, how the client
>>>>> gets the byte offsets of the fragment boundaries for range
>>>>> requests./*
>>>>>
>>>>> [cp] If it's still a bit unclear how this is performed, I can go
>>>>> into more detail. But like I say, I should really reword that
>>>>> section of the draft since I think I've created some confusion. The
>>>>> point I was trying to make was that *polling* a non-segmented
>>>>> representation would
>>>>> - other than being inefficient - have the kind of multi-second
>>>>> latency that segmented live streaming would have.
>>>>>
>>>>> [cp] But the difficulty of expressing this (secondary) benefit in
>>>>> the bytes-live is probably not worth the trouble. I'll see if I can
>>>>> reword the draft to make it less confusing. I don't think this point
>>>>> is necessary to "sell" the concept of bytes-live (or a
>>>>> bytes-live-like feature).
>>>>>
>>>>> [cp] BTW, if you're really interested in the details of mapping time
>>>>> to offsets in a ISO BMFF container, have a look at
>>>>> odid_mp4_parser.vala:get_random_access_points() and
>>>>> get_random_access_point_for_time() at
>>>>> https://github.com/cablelabs/rygel/tree/cablelabs/master/src/media-engines/odid/.
>>>>> I can probably even get you instructions for printing RAPs for MP4
>>>>> files using the test program.
>>>>>
>>>>> hth - cp
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> BR,
>>>>>
>>>>> Thorsten
>>>>>
>>>>> *From:*Craig Pratt [mailto:craig@ecaspia.com]
>>>>> *Sent:* Monday, April 18, 2016 11:04 AM
>>>>> *To:* K.Morgan@iaea.org; fielding@gbiv.com
>>>>> *Cc:* Göran Eriksson AP; bs7652@att.com; remy@lebeausoftware.org;
>>>>> ietf-http-wg@w3.org; rodger@plexapp.com; julian.reschke@gmx.de;
>>>>> C.Brunhuber@iaea.org
>>>>> *Subject:* Re: Issue with "bytes" Range Unit and live streaming
>>>>>
>>>>> On 4/18/16 12:34 AM, K.Morgan@iaea.org wrote:
>>>>>
>>>>>       On Friday, 15 April 2016 22:43, fielding@gbiv.com wrote:
>>>>>           Oh, never mind, now I see that you are referring to the
>>>>>           second number being fixed.
>>>>>
>>>>>           I think I would prefer that be solved by allowing
>>>>>           last-byte-pos to be empty, just like it is for the Range
>>>>>           request.  I think such a fix is just as likely to be
>>>>>           interoperable as introducing a special range type (same
>>>>>           failure cases).
>>>>>
>>>>>           ....Roy
>>>>>
>>>>>       +1000
>>>>>
>>>>>       A very similar idea was proposed before [1] as an I-D [2] by
>>>>>       Rodger Combs. We've also brought this up informally with
>>>>>       other members of the WG.
>>>>>
>>>>>       Alas, in our experience range requests don't seem to be a
>>>>>       high priority :( For example, the problem of combining gzip
>>>>>       with range requests is still unsolved [3].
>>>>>
>>>>>
>>>>> [1] https://lists.w3.org/Archives/Public/ietf-http-wg/2015AprJun/0122.html
>>>>>
>>>>> [2] https://tools.ietf.org/html/draft-combs-http-indeterminate-range-01
>>>>>
>>>>> [3] https://lists.w3.org/Archives/Public/ietf-http-wg/2014AprJun/1327.html
>>>>>
>>>>> [cp] Yeah, it's unfortunate that no solution has moved forward for
>>>>> this widely-desired feature. I can only assume people have started
>>>>> defining proprietary solutions instead - which is a shame. I'll
>>>>> try to be "persistent"... ;^J
>>>>>
>>>>> [cp] As was mentioned, the issue with just arbitrarily allowing an
>>>>> open-ended Content-Range response (omitting last-byte-pos) is that
>>>>> there's no good way for a client to indicate that it can handle a
>>>>> Content-Range without a last-byte-pos. So I would fully expect many
>>>>> clients to fail in "unpredictable ways" (disconnecting, crashing,
>>>>> etc.).
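[cp] For illustration, a client that wanted to tolerate the open-ended form would need parsing along these lines (a hedged sketch: the "bytes 500-/*" shape is my hypothetical rendering of an omitted last-byte-pos, patterned on the Range request syntax, and parse_content_range is my own name). A client written strictly against RFC 7233, which requires last-byte-pos, would reject such a header, which is the interoperability concern.

```python
import re

# Matches the standard form "bytes 500-999/1234" and a hypothetical
# open-ended form "bytes 500-/*" where last-byte-pos is omitted and
# the complete length is unknown.
_CONTENT_RANGE = re.compile(
    r"^bytes (?P<first>\d+)-(?P<last>\d*)/(?P<complete>\d+|\*)$")


def parse_content_range(value):
    """Return (first_byte_pos, last_byte_pos, complete_length), with
    None standing in for an omitted last-byte-pos or unknown ('*')
    complete length. Raises ValueError on anything unparseable."""
    m = _CONTENT_RANGE.match(value)
    if m is None:
        raise ValueError("unparseable Content-Range: %r" % value)
    first = int(m.group("first"))
    last = int(m.group("last")) if m.group("last") else None
    complete = None if m.group("complete") == "*" else int(m.group("complete"))
    return first, last, complete
```

A last-byte-pos of None would then mean "read until the connection closes", which is exactly the behavior an unprepared client has no way to opt into.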
>>>>>
>>>>> [cp] I see that the indeterminate-length proposal you referenced
>>>>> in your first citation introduces an "Accept-Indefinite-Ranges"
>>>>> header to avoid this issue. But I think it brings some other
>>>>> questions with it. e.g. Would it apply to any/all Range Units that
>>>>> may be introduced in the future? How can a client issue a request
>>>>> that starts at the "live point"? It feels like it has one hand tied
>>>>> behind its back.
>>>>>
>>>>> [cp] If I could, I would prefer to go back in time and advocate for
>>>>> an alternate ABNF for the bytes Range Unit. Seeing as that's not an
>>>>> option, I think using this well- and long-defined Range Unit
>>>>> extension mechanism seems like a good path forward as it should not
>>>>> create interoperability issues between clients and servers.
>>>>>
>>>>> [cp] And I would hope adding a Range Unit would have a lower bar
>>>>> for acceptance. e.g. If a Range Unit fills a useful role, is
>>>>> well-defined, and isn't redundant, it seems reasonable that it
>>>>> should be accepted, since it shouldn't impact existing HTTP/1.1
>>>>> semantics. In fact, the gzip case (referenced in your third
>>>>> citation) seems like a perfect application of a new Range Unit
>>>>> (better than bytes-live). If there's interest, I'll write up a
>>>>> draft to demonstrate...
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> This email message is intended only for the use of the named recipient.
>>>> Information contained in this email message and its attachments may
>>>> be privileged, confidential and protected from disclosure. If you are
>>>> not the intended recipient, please do not read, copy, use or disclose
>>>> this communication to others. Also please notify the sender by
>>>> replying to this message and then delete it from your system.
>>>>> --
>>>>>
>>>>>
>>>>>
>>>>> craig pratt
>>>>>
>>>>> Caspia Consulting
>>>>>
>>>>> craig@ecaspia.com <mailto:craig@ecaspia.com>
>>>>>
>>>>> 503.746.8008
>>>>>


-- 

craig pratt

Caspia Consulting

craig@ecaspia.com

503.746.8008

Received on Wednesday, 20 April 2016 19:58:00 UTC