RE: Issue with "bytes" Range Unit and live streaming from Thorsten Lohmar on 2016-04-19 (ietf-http-wg@w3.org from April to June 2016)

From: Thorsten Lohmar <thorsten.lohmar@ericsson.com>
Date: Tue, 19 Apr 2016 14:02:14 +0000
To: Craig Pratt <craig@ecaspia.com>, "K.Morgan@iaea.org" <K.Morgan@iaea.org>, "fielding@gbiv.com" <fielding@gbiv.com>
CC: Göran Eriksson AP <goran.ap.eriksson@ericsson.com>, "bs7652@att.com" <bs7652@att.com>, "remy@lebeausoftware.org" <remy@lebeausoftware.org>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>, "rodger@plexapp.com" <rodger@plexapp.com>, "julian.reschke@gmx.de" <julian.reschke@gmx.de>, "C.Brunhuber@iaea.org" <C.Brunhuber@iaea.org>, Darshak Thakore <d.thakore@cablelabs.com>
Message-ID: <9E953B010F1E974399030905C5DCB2E7183D33A3@ESESSMB101.ericsson.se>
Hi Craig,

Thanks for sharing the github link. That certainly clarifies the use-case even further.

Maybe we should focus the discussion on the fMP4 format for some time, since tuning in to fMP4 requires random access to fragment boundaries. Unlike TS or MP3, fMP4 does not support synchronization to the stream from any received byte. The client must start processing an fMP4 stream from fragment boundaries (or box boundaries). Tuning to selected samples inside of the fMP4 file happens at a later stage.

So, if I understand your proposal around the bytes-live request correctly, the client is NOT asking for the precise byte range, but for the next possible random access point in close proximity to the requested range. So, the client sends a request containing "Range: bytes-live=0-*" and gets a response with "Content-Range: bytes-live 123456-*". The server is thus providing the client with an HTTP resource that starts, in the case of fMP4, at a fragment boundary (which is at byte offset 123456 of the resource).

Do I understand it correctly then: when the client wants to tune in e.g. 5min into the TSB, the client measures the bitrate of the media stream and calculates a rough byte offset (i.e. 5min x estimated bitrate, let's say byte pos 654321) [all out of scope of the ID] and creates a bytes-live range request of the form "Range: bytes-live=654321-*". The server looks up a good fragment boundary (let's say 654420) and responds with "Content-Range: bytes-live 654420-*". Do I understand the proposal correctly?
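As a rough illustration of the client side of this exchange, here is a minimal Python sketch. The function names, the bitrate-in-bits-per-second convention, and the accepted header grammar are assumptions made for the example; they are not taken from the draft.

```python
import re

def estimate_byte_offset(seconds: float, bitrate_bps: float) -> int:
    """Rough byte offset for a time offset, given an estimated bitrate in bits/s."""
    return int(seconds * bitrate_bps / 8)

def bytes_live_range(first_byte_pos: int) -> str:
    """Build the value of a Range header using the proposed bytes-live unit."""
    return f"bytes-live={first_byte_pos}-*"

# Assumed response shape: "bytes-live <first>-*" with an optional "/<length or *>".
_BYTES_LIVE_CR = re.compile(r"^bytes-live (\d+)-\*(?:/(?:\*|\d+))?$")

def parse_bytes_live_content_range(value: str) -> int:
    """Return the first-byte-pos the server actually started from."""
    match = _BYTES_LIVE_CR.match(value.strip())
    if match is None:
        raise ValueError(f"not a bytes-live Content-Range: {value!r}")
    return int(match.group(1))
```

The client would compare the returned first-byte-pos with the one it asked for to learn where the server really started.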

If yes, should the solution be limited to live cases? If I understand it correctly, you are looking for a solution where the client indicates a rough byte range in the request and the server responds with another range that fulfills some condition. In case of a live session with fMP4, the server looks for a random access point into the stream. The random access point must be a fragment boundary in case of fMP4 and can be PAT, PMT, PES in case of TS.
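The server-side lookup described here could be sketched as follows. This assumes the server keeps a sorted list of random access point offsets and picks the first one at or after the requested position; both the data structure and that policy are assumptions for illustration, not something the draft specifies.

```python
from bisect import bisect_left
from typing import List, Optional

def snap_to_access_point(requested: int, access_points: List[int]) -> Optional[int]:
    """Return the first access point at or after `requested`.

    `access_points` must be sorted ascending. Returns None when no access
    point at or after the requested offset exists yet.
    """
    i = bisect_left(access_points, requested)
    return access_points[i] if i < len(access_points) else None
```

With the numbers from the example above, a request for 654321 against boundaries that include 654420 would be answered starting at 654420.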

Nothing inline.

BR,
Thorsten


From: Craig Pratt [mailto:craig@ecaspia.com]
Sent: Tuesday, April 19, 2016 11:55 AM
To: Thorsten Lohmar; K.Morgan@iaea.org; fielding@gbiv.com
Cc: Göran Eriksson AP; bs7652@att.com; remy@lebeausoftware.org; ietf-http-wg@w3.org; rodger@plexapp.com; julian.reschke@gmx.de; C.Brunhuber@iaea.org; Darshak Thakore
Subject: Re: Issue with "bytes" Range Unit and live streaming

Hey Thorsten,

I'll try to reply in-line.

cp

On 4/18/16 3:50 PM, Thorsten Lohmar wrote:
Hi Craig, all,

Thanks for the clarification. Some further questions inline.

BR,
Thorsten

From: Craig Pratt [mailto:craig@ecaspia.com]
Sent: Monday, April 18, 2016 10:29 PM
To: Thorsten Lohmar; K.Morgan@iaea.org; fielding@gbiv.com
Cc: Göran Eriksson AP; bs7652@att.com; remy@lebeausoftware.org; ietf-http-wg@w3.org; rodger@plexapp.com; julian.reschke@gmx.de; C.Brunhuber@iaea.org; Darshak Thakore; STARK, BARBARA H
Subject: Re: Issue with "bytes" Range Unit and live streaming

[cc-ing the co-authors]

Hi Thorsten,

I'm happy to help provide whatever answers I can.

Reply in-line.

cp

On 4/18/16 8:10 AM, Thorsten Lohmar wrote:
Hi Craig, all,

My colleague Göran asked me some questions about the problem and I would like to raise these questions directly with you. Of course, there are some alternative solutions available, where the client can work out the different things from a manifest. But you seem to be looking for a simple solution, which works with non-segmented media on a single HTTP session.

If I understand it correctly, an HTTP server is making a live stream available using HTTP. A normal live stream can be opened with a single HTTP request and the server can serve data "from the live point" either with or without HTTP chunked delivery. The server cannot give a Content-Length, since this is an ongoing live stream of unknown size.
[cp] all correct.



Your use-case seems to be about recording of content. The client should access content from the recorded part, but should be able to jump to the live point. I assume that you are not looking into sliding window recordings (i.e. timeshift). I assume that a single program is continuously recorded and the HTTP object grows until the end of the live session, correct?
[cp] I didn't spell it out in the draft, but I would like to consider adding clarifications for the time-shift cases. This should just be a matter of a Client requesting one thing and getting another. e.g. "Range: bytes-live=0-*" results in "Content-Range: bytes-live 123456-*". In either case, you're correct: the end of the representation is moving forward in time until the end of the live session.


[TL] The "Range: bytes-live=0-*" case is not clear to me. Your ID says "All bytes currently in the representation and those appended to the end of the representation after the request is processed". I get the impression that the server is deleting all data before a certain timepoint (aka the behavior of a sliding window timeshift). So, the client seems to request all data from the beginning of the timeshift buffer. Why does the server need to change the byte offset from 0 to 123456?
I can understand, that the server must signal "growing resource" in the response.
[cp] I was trying to illustrate a case where the server had trimmed off bytes 0-123456 (the TSB model). So in this case, it's signalling to the client "you're getting bytes starting at 123456 (not 0)". e.g. If a client requests "Range: bytes-live=0-*" on an in-progress recording, one might expect:

    Content-Range: bytes-live 0-*/*

[cp] Basically saying (as described in the ID) that all bytes currently in the representation and those appended to the end of the representation after the request is processed will be returned. But on a TSB, one might expect:

    Content-Range: bytes-live 123456-*/*

[cp] Basically saying that bytes starting at byte 123456 in the representation and those appended to the end of the representation after the request is processed will be returned.
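The two responses above differ only in where the retained data begins. A minimal sketch of that decision, assuming the server tracks the offset of the first byte it still holds (the name `first_available` is hypothetical):

```python
def bytes_live_content_range(requested_first: int, first_available: int) -> str:
    """Content-Range value for a bytes-live response.

    `first_available` is 0 for an in-progress recording and the trim point
    for a time-shift buffer (TSB) that has discarded older bytes.
    """
    start = max(requested_first, first_available)
    return f"bytes-live {start}-*/*"
```

For an in-progress recording, `bytes_live_content_range(0, 0)` yields the first form; for a TSB trimmed up to 123456, `bytes_live_content_range(0, 123456)` yields the second.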

[cp] While I'm thinking about TSB use cases in the back of my mind, this is really not the primary use case I was considering for the ID (but I would hope it can be covered).

In any case, how does the client know "good" byte range offsets (i.e. service access points) to tune into the recording? Or is the assumption, that the client can synchronize to the media stream from any byte range?
[cp] For byte-level access, random access implementation is up to the client. For some containers this is easier than others. e.g. For MP4, the random access points can be established by reading the movie and fragment header(s). For something like MP2, it's trickier of course.
[TL] Well, in case of fMP4, the client needs to get the Movie Header for initialization. Then, proper access points are fragment boundaries. There are various ways to signal time to byte-offsets.
[cp] Fragments can actually have multiple access points - implicit (per sample) and explicit (random access points). But yeah, it seems common for fragments to have one random access point (and often correlate to a GOP) - and that there's a huge variety of ways to lay out the samples.
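To make "fragment boundaries" concrete, here is a small Python sketch that walks the top-level boxes of an ISO BMFF byte string and reports where each 'moof' (movie fragment) box starts. It relies only on the standard box header layout (32-bit big-endian size, 4-byte type, optional 64-bit largesize when size is 1, size 0 meaning "to end of data"). It does not look inside fragments for per-sample access points, and the function names are made up for the example.

```python
import struct

def top_level_boxes(data: bytes):
    """Yield (offset, type, size) for each complete top-level box in `data`."""
    pos = 0
    while pos + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:
            # A 64-bit "largesize" follows the type field.
            if pos + 16 > len(data):
                break
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:
            # Box extends to the end of the data.
            size = len(data) - pos
        if size < header or pos + size > len(data):
            break  # malformed or not yet fully received
        yield pos, box_type.decode("ascii", "replace"), size
        pos += size

def fragment_boundaries(data: bytes):
    """Byte offsets of every complete top-level 'moof' box."""
    return [offset for offset, box_type, _ in top_level_boxes(data) if box_type == "moof"]
```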

In case of TS, the client needs a PAT, PMT and PES starts for tune-in. It is a bit more tricky, but there are solutions here as well.
But the email talks about "non-segmented" media. The draft talks about "mimicking segmented media". fMP4 is actually the way to create ISO-BMFF segments. So, it is for segmented media, but without a separate manifest?
[cp] It's important to differentiate between *fragmented* and *segmented* MP4/ISO BMFF representations. bytes-live is most applicable to fragmented files - where you have one representation being used for the entire streaming session - with this representation being appended to periodically (usually one fragment at a time).

[cp] I really need to revise my description in the draft to help avoid confusion. What I was trying to describe was how a solution using just byte-Range requests would always be slightly behind the live point - as is the case with rendering "live" segmented streams. While bytes-live could be used for fragmented (MP4/ISO BMFF) content or segmented content, the primary use case is for non-segmented representations.



[cp] One major feature this draft allows is retrieval of bytes just preceding the live point. So for example, a client can do a HEAD request with "Range: bytes=0-", get a "Content-Range: bytes 0-1234567/*", then perform something like a "Range: bytes-live=1200000-*", and prime its framebuffer with 34567 bytes of data that precede the live point - allowing the client to find an access point (e.g. MPEG-2 start codes) and allowing live presentation to display much sooner than it would from the live point (without random access).
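The arithmetic in this example can be sketched as below. The function name and the idea of a fixed pre-buffer size are assumptions for illustration; how much to pre-fetch depends on the media format.

```python
def prebuffer_range(live_point: int, prebuffer_bytes: int) -> str:
    """Range header value that starts `prebuffer_bytes` before the live point.

    `live_point` is the last byte position reported by the HEAD request.
    The start is clamped at 0 for very short representations.
    """
    first_byte_pos = max(0, live_point - prebuffer_bytes)
    return f"bytes-live={first_byte_pos}-*"
```

With the numbers above, a live point of 1234567 and a pre-buffer of 34567 bytes give "bytes-live=1200000-*".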
[TL] So, how does the client know that the proper fragment boundary is at byte position 1200000? Do you assume that the client first fetches a time-to-byte offset file, which tells the client that an access point (e.g. a fragment boundary) is at byte pos 1200000? If yes, why does the client need the HEAD request, when it already has the byte position?
[cp] How a client knows the amount to pre-fetch before the live point would depend upon the media format. For an MP4/ISO BMFF file, 1200000 could represent the random access point most immediately preceding the live point. It would be similar for an indexed MP2. And for unindexed MP2 representations, it's not uncommon for a client to prebuffer a fixed amount of content in the hopes of capturing a keyframe (really a heuristic).

[cp] The HEAD request is necessary in this case to know where the live point is at the time the request is made so the HTTP client would know if it can jump into already-stored content or if it should just acquire the live point.

[cp] The important point is that all common video formats need a discontinuity-free number of bytes before the live point to provide a quality user experience.



How should the client know which byte ranges are already available on the server? When the client is playing back from the recorded part and would like to skip 5min forward, how does the client know whether a normal range request is needed or whether the client should ask for the live point? What type of HTTP Status code should be provided when the requested range is not yet available on the server?
[cp] We're not trying to come up with a universal solution for performing time-based seek on all media formats with this draft. So some of this is out of scope. But let me see if I can fill in some of the blanks.
[TL] Ok, not everything needs to be in-scope. But an essential assumption should be, whether the client has a time-to-byteoffset table or whether the client can determine precisely the fragment boundary positions.
[cp] Optimally, time-to-byte indexes would be used. But even without this, clients can often manage with heuristics. e.g. VLC can perform a reasonable job of providing time-seek on unindexed MP2 files.



[cp] Some applications of media streaming have time-based indexing facilities built-in. e.g. MP4 (ISO BMFF) containers allow time and data to be associated using the various internal, mandatory metadata "boxes". In other cases, applications may provide a separate resource that contains time-to-byte mappings (e.g. content index files). In either case, there's a facility for mapping time offsets to byte offsets - or sometimes the client incorporates heuristics to perform time skips (e.g. VLC will do this on some file formats).


[TL] Yes. fMP4 supports this and MPEG DASH is leveraging this. But the live-point is not described in the fragments. The client determines the livepoint from the manifest.
[cp] Correct. In segmented content, the time-to-segment map tells you which representation to fetch (via GET). While I'd say that bytes-live can also improve segmented rendering (by reducing the latency of rendering), the primary focus of our draft is for non-segmented representations.


[cp] In all these cases, there's some mechanism that maps time offsets to byte offsets.
[TL] Yes


[cp] When it comes to the available byte range, a client can know what data range is available by utilizing a HEAD request with a "Range: bytes=0-". The "Content-Range" response can contain something like "Content-Range: bytes 0-1234567/*" which tells the client both the current randomly accessible content range (via the "0-1234567") and that the content is of indeterminate length (via the "*").
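A client-side parser for that response could look like the following sketch. It handles only the single-range "bytes first-last/length" form shown above, with "*" mapped to None; the function name is hypothetical.

```python
import re
from typing import Optional, Tuple

_BYTES_CR = re.compile(r"^bytes (\d+)-(\d+)/(\*|\d+)$")

def parse_bytes_content_range(value: str) -> Tuple[int, int, Optional[int]]:
    """Return (first_byte_pos, last_byte_pos, complete_length).

    complete_length is None when the server sent "*", i.e. the
    representation has an indeterminate length.
    """
    match = _BYTES_CR.match(value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    first, last = int(match.group(1)), int(match.group(2))
    total = None if match.group(3) == "*" else int(match.group(3))
    return first, last, total
```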
[TL] So, that is the existing Content-Range response, but with an '*' to indicate the unknown content-length, correct?
[cp] Yeah, the "*" in place of the last-byte-pos indicates an indeterminate-length response body.



[cp] Putting this all together, a client would implement a 5-minute skip by:
    (1) adding 5 minutes to the current play time,
    (2) determining the byte offset for that time using the appropriate index/heuristic (e.g. "3456789"),
    (3) if the time is outside the index, jumping to the live point and updating the time to the last-index time or by other means (e.g. using "Range: bytes-live=340000-*" to pre-buffer/pre-prime the frame/sample buffer),
    (4) if the time is inside the index, performing a standard bytes Range request to retrieve an implementation-specific quantum of time or data (e.g. "Range: bytes=3456789-3556789") and rendering.
[TL] In (2), how does the client determine the byte offset? fMP4 requires a precise byte offset. In case of TS, the client can sync to the stream by first searching for 0x47 sync bytes. In (3), how does the client determine "outside of the index"? It seems that some sort of manifest is implicitly needed, which allows the client to understand the latest byte pos.
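For reference, the TS sync search mentioned here is commonly done by looking for 0x47 repeating at the 188-byte packet interval. A minimal sketch, assuming plain 188-byte packets (no 192- or 204-byte variants) and a hypothetical `confirm` parameter for how many consecutive packets must line up:

```python
TS_PACKET_SIZE = 188
TS_SYNC_BYTE = 0x47

def find_ts_sync(data: bytes, confirm: int = 3) -> int:
    """Return the offset of the first byte that begins `confirm` consecutive
    TS packets (0x47 at 188-byte spacing), or -1 if none is found."""
    # Last start position for which all `confirm` sync positions are in range.
    limit = len(data) - TS_PACKET_SIZE * (confirm - 1)
    for start in range(max(0, limit)):
        if all(data[start + i * TS_PACKET_SIZE] == TS_SYNC_BYTE
               for i in range(confirm)):
            return start
    return -1
```

Requiring several aligned sync bytes avoids locking onto a stray 0x47 inside a payload.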
[cp] (2) is media-format-specific. For MP4/ISO BMFF, it would use the built-in metadata, for MP2, it would either use an index file or a heuristic.

[cp] For (3), if the current live point (in byte terms) is greater than the last byte offset in the index, then the live point is "outside the index". That is, the time the client is trying to access isn't randomly accessible, and the client should just jump to the live point.
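One possible reading of the skip steps, sketched in Python. The index is assumed to be a sorted list of (time in seconds, byte offset) pairs, and "outside the index" is taken to mean that the target time lies past the last indexed time, in which case the request starts at the last indexed offset to pre-buffer up to the live point. All names and that policy are assumptions for illustration.

```python
from bisect import bisect_right
from typing import List, Tuple

def plan_skip(current_time: float, skip_seconds: float,
              index: List[Tuple[float, int]], chunk_bytes: int) -> str:
    """Return the Range header value to use for a forward skip."""
    if not index:
        raise ValueError("index must contain at least one entry")
    target = current_time + skip_seconds                  # step (1)
    times = [t for t, _ in index]
    if target > times[-1]:
        # Step (3): target is outside the index, so go to the live point,
        # starting from the last indexed access point to prime the buffer.
        return f"bytes-live={index[-1][1]}-*"
    # Step (2): last index entry at or before the target time.
    i = max(0, bisect_right(times, target) - 1)
    offset = index[i][1]
    # Step (4): ordinary bytes request for an implementation-chosen quantum.
    return f"bytes={offset}-{offset + chunk_bytes}"
```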




[cp] Again, some of this is out of scope, but I hope that clarifies a common use case.
[TL] Would be good to clarify, what information the client needs to get in order to do the operations. How the client gets the info can be left out-of-scope.
[cp] ok - I hope I'm filling in more of the blanks...



[cp] Regarding the status code, RFC7233 (section 4.4) says that code 416 (Range Not Satisfiable) indicates that "none of the ranges in the request's Range header field (Section 3.1) overlap the current extent of the selected resource or that the set of ranges requested has been rejected due to invalid ranges or an excessive request of small or overlapping ranges." This part of 4.4 applies to *all* Range requests - regardless of the Range Unit.
[TL] ok.


[cp] The bytes-live draft then goes on to say that "A bytes-live-range-specifier is considered unsatisfiable if the first-byte-pos is larger than the current length of the representation". This could probably be elaborated on a bit. But this is supposed to be the "hook" into the 4.4 language.
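The satisfiability rule quoted from the draft reduces to a single comparison. A sketch, assuming a satisfiable bytes-live request is answered with 206 (Partial Content) like other range responses; that status choice is an assumption for the example, not a quote from the draft.

```python
def status_for_bytes_live(first_byte_pos: int, current_length: int) -> int:
    """HTTP status for a bytes-live request against a growing representation.

    Per the quoted rule, the range is unsatisfiable only when first-byte-pos
    is larger than the current length of the representation.
    """
    if first_byte_pos > current_length:
        return 416  # Range Not Satisfiable
    return 206      # Partial Content (assumed for the satisfiable case)
```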



Can you please clarify the questions?
[cp] I hope I succeeded (at least partially). Apologies for the long response. I wanted to make sure I was answering your questions.
[TL] Gets a bit clearer, but I still don't understand the "mimic HLS or DASH". DASH / HLS focuses on CDN optimization by creating a sequence of individual files. The client can work out the live-point URL from the manifest. Each segment is a "good" access point (in DASH always box boundaries and in HLS always TS boundaries even with PAT / PMT). So, the key issue here is to clarify, how the client gets the byte offsets of the fragment boundaries for range requests.
[cp] If it's still a bit unclear how this is performed, I can go into more detail. But like I say, I should really reword that section of the draft since I think I've created some confusion. The point I was trying to make was that *polling* a non-segmented representation would - other than being inefficient - have the kind of multi-second latency that segmented live streaming would have.

[cp] But the difficulty of expressing this (secondary) benefit in the bytes-live is probably not worth the trouble. I'll see if I can reword the draft to make it less confusing. I don't think this point is necessary to "sell" the concept of bytes-live (or a bytes-live-like feature).

[cp] BTW, if you're really interested in the details of mapping time to offsets in an ISO BMFF container, have a look at odid_mp4_parser.vala:get_random_access_points() and get_random_access_point_for_time() at https://github.com/cablelabs/rygel/tree/cablelabs/master/src/media-engines/odid/. I can probably even get you instructions for printing RAPs for MP4 files using the test program.

hth - cp







BR,
Thorsten



From: Craig Pratt [mailto:craig@ecaspia.com]
Sent: Monday, April 18, 2016 11:04 AM
To: K.Morgan@iaea.org; fielding@gbiv.com
Cc: Göran Eriksson AP; bs7652@att.com; remy@lebeausoftware.org; ietf-http-wg@w3.org; rodger@plexapp.com; julian.reschke@gmx.de; C.Brunhuber@iaea.org
Subject: Re: Issue with "bytes" Range Unit and live streaming

On 4/18/16 12:34 AM, K.Morgan@iaea.org wrote:

On Friday, 15 April 2016 22:43, fielding@gbiv.com wrote:

Oh, never mind, now I see that you are referring to the second number being fixed.

I think I would prefer that be solved by allowing last-byte-pos to be empty, just like it is for the Range request.  I think such a fix is just as likely to be interoperable as introducing a special range type (same failure cases).

....Roy



+1000



A very similar idea was proposed before [1] as an I-D [2] by Rodger Combs. We've also brought this up informally with other members of the WG.



Alas, in our experience range requests don't seem to be a high priority :(  For example, the problem of combining gzip with range requests is still unsolved [3].



[1] https://lists.w3.org/Archives/Public/ietf-http-wg/2015AprJun/0122.html

[2] https://tools.ietf.org/html/draft-combs-http-indeterminate-range-01

[3] https://lists.w3.org/Archives/Public/ietf-http-wg/2014AprJun/1327.html
[cp] Yeah, it's unfortunate that no solutions have moved forward for this widely-desired feature. I can only assume that people just started defining proprietary solutions - which is unfortunate. I'll try to be "persistent"... ;^J

[cp] As was mentioned, the issue with just arbitrarily allowing an open-ended Content-Range response (omitting last-byte-pos) is that there's no good way for a client to indicate it can support reception of a Content-Range without a last-byte-pos. So I would fully expect many clients to fail in "unpredictable ways" (disconnecting, crashing, etc).

[cp] I see that the indeterminate length proposal you referenced in your first citation introduces an "Accept-Indefinite-Ranges" header to prevent this issue. But I think this brings with it some other questions. e.g. Would this apply to any/all Range Units which may be introduced in the future? How can a Client issue a request that starts at the "live point"? It feels like it has one hand tied behind its back.

[cp] If I could, I would prefer to go back in time and advocate for an alternate ABNF for the bytes Range Unit. Seeing as that's not an option, I think using this well- and long-defined Range Unit extension mechanism seems like a good path forward as it should not create interoperability issues between clients and servers.

[cp] And I would hope adding a Range Unit would have a low/lower bar for acceptance. e.g. If a Range Unit fills a useful role, is well-defined, and isn't redundant, it seems reasonable that it should be accepted as it shouldn't impact existing HTTP/1.1 semantics. In fact, the gzip case (referenced in your third citation) seems like a perfect application of the Range Unit (better than bytes-live). If there's interest, I'll write up an RFC to demonstrate...









--




craig pratt

Caspia Consulting

craig@ecaspia.com

503.746.8008




Received on Tuesday, 19 April 2016 14:03:50 UTC