[mediacapture-record] More information about timing of recording (start event, blob event, stop event) (#208)

btsimonh has just created a new issue for https://github.com/w3c/mediacapture-record:

== More information about timing of recording (start event, blob event, stop event) ==
The current spec and implementations don't allow a developer to accurately establish the timing of the recording produced.

I've been struggling for over 2 weeks to determine, empirically (and by examining Blink source), within +-10ms when a recording began (i.e. the time of the first sample in the first buffer).
So far the best I can do is to take a timestamp immediately before getUserMedia, and to start the recording in the getUserMedia callback.  This is less than ideal, and I have to guarantee that the user has already enabled the mic.
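
For reference, a minimal sketch of that workaround (illustrative only; it assumes an audio-only recording and that mic permission has already been granted, so getUserMedia resolves quickly):

// Take a timestamp immediately before getUserMedia, then start recording
// in the callback.
const approxStart = performance.timeOrigin + performance.now();
navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
  const recorder = new MediaRecorder(stream);
  recorder.start();
  // approxStart is only an estimate of the time of the first recorded
  // sample; it can be off by the getUserMedia latency, and gives no hint
  // of how much pre-buffered data the recorder may deliver.
});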

The issue I face (mainly investigated in Chrome) is that the timecode in the blob event somehow relates to some real time, but the blob contains data from BEFORE the MediaRecorder is told to start (usually 140ms worth).

This issue:
https://github.com/w3c/mediacapture-record/issues/140
describes problems with the 'timecode' in the blob event highlighted in 2017.

I believe enhancing the MediaRecorder spec to give developers more timing information about their recordings would benefit the web community in general, and providing guidance and sample usage would allow these enhancements to be used effectively.


I would suggest the following (a combined usage sketch follows this list):
1/ The start event contains a well-specified, unambiguous timestamp indicating the time of the first sample of data recorded, using some clock which can be easily read in JavaScript.
e.g. 'the timecode value in the start event will be set to the time that the first recorded sample was received from the input device.
The time since the first sample was received (or time UNTIL the first sample will be received) could be calculated simply with:
let firstSampleLatency = (performance.now() + performance.timeOrigin) - startevent.timecode;'.
Implying by example that the timecode is the value of performance.now() plus performance.timeOrigin at the moment that sample arrived.

2/ The BlobEvent contains a timecode which is better specified.
The spec states that the BlobEvent timecode is:
"The difference between the timestamp of the first chunk in data and the timestamp of the first chunk in the first BlobEvent produced by this recorder as a DOMHighResTimeStamp [HR-TIME]. Note that the timecode in the first produced BlobEvent does not need to be zero."

Under what circumstances would the "timecode in the first produced BlobEvent" not be zero?
If this is intended to mean that the timecode COULD be a real-time timestamp, then maybe it should just be defined as such?
e.g. 'the timecode will represent the time that the first sample in the blob was received from the input device.
The time since the first sample in the blob was received can be calculated:
let sampleBlobLatency = (performance.now() + performance.timeOrigin) - blobevent.timecode;'.
Implying by example that the timecode is the value of performance.now() plus performance.timeOrigin at the moment the first sample of the blob arrived.

3/ The BlobEvent contains a duration which is well specified.
I believe adding a duration to the BlobEvent would be useful.  As stated in the linked issue, you can't currently get the duration without decoding the blob.

4/ If modifying the start event to add a timecode, then the stop event may as well contain one too?
e.g. 'the timecode value in the stop event will be set to the time that the (last recorded sample + 1) was received from the input device.
The time since that last-plus-one sample was received can be calculated:
let stopLatency = (performance.now() + performance.timeOrigin) - stopevent.timecode;'.
The overall duration of the recording could be calculated simply with:
let duration = stopevent.timecode - startevent.timecode;
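
To illustrate how suggestions 1-4 could fit together, a sketch follows.  Note that the 'timecode' on the start/stop events and the 'duration' on the BlobEvent are the PROPOSED additions above and do not exist in the current spec or implementations; 'stream' is assumed to be a MediaStream obtained elsewhere.

// Illustrative only: event.timecode on start/stop and event.duration on
// BlobEvent are the PROPOSED fields above, not current API.
const recorder = new MediaRecorder(stream);
let recordingStart = 0;

recorder.onstart = (event) => {
  // Suggestion 1: time the first recorded sample was received, on the
  // (performance.now() + performance.timeOrigin) clock.
  recordingStart = event.timecode;
};

recorder.ondataavailable = (event) => {
  // Suggestions 2 & 3: place this blob on the recording's own timeline.
  const blobOffset = event.timecode - recordingStart;  // ms since first sample
  const blobEnd = blobOffset + event.duration;         // proposed duration field
  console.log('blob covers', blobOffset, 'to', blobEnd, 'ms of the recording');
};

recorder.onstop = (event) => {
  // Suggestion 4: overall duration without decoding any blobs.
  console.log('recording lasted', event.timecode - recordingStart, 'ms');
};

recorder.start(10);  // e.g. ask for a blob roughly every 10ms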



Issues to consider:
1/ When recording multiple streams (e.g. audio and video), the time of the first sample in a blob may not be the same for each stream, depending on encoding & muxing.  e.g. maybe we're getting an AV recording with blobs every 10ms, so maybe only every 4th blob contains video data?  Which stream's samples do you use for the timecode?
I would suggest that if audio is present, the timecode is driven by the first audio sample, else by the first sample of the 'other' streams.
2/ When the stream is not from an input device, what is considered to be the time it 'was received from the input device'?
3/ I mention (performance.now() + performance.timeOrigin) only because it MAY be close to a current implementation in Blink (I am NOT an expert, and this info is mainly empirical).  I am also not an expert in the Performance API; other clocks may be more appropriate.  If using performance.now() as a basis, it may be better to specify it WITHOUT performance.timeOrigin, if only to make it plain to observers of the values that they are NOT directly related to Date().  Otherwise we will see user implementations not taking into account drift between Date() and (performance.now() + performance.timeOrigin) - best to avoid confusion (see the sketch below).
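
As a small illustration of the drift concern, the offset between the two clocks can be observed directly, and it is not guaranteed to stay constant:

// Rough sketch: compare Date() time with the performance-based clock.
function clockSkew() {
  return Date.now() - (performance.timeOrigin + performance.now());
}
const initialSkew = clockSkew();
setInterval(() => {
  // Any non-zero change here is drift between the two clocks.
  console.log('skew has drifted by', clockSkew() - initialSkew, 'ms');
}, 60 * 1000);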



Background:
My application records audio with reference to a playing reference media stream (think voiceover), so I must relate the time of the recording to the playing media time.
The fact that the MediaRecorder does not necessarily start right away on a loaded system, yet is perfectly able to record every input sample under such circumstances, meant that I needed to calculate when the recording started in order to line up the recordings with the original media.  I found the API and implementation in Chrome incredibly frustrating, as timecode appeared to offer the ability to make this calculation, but in practice it does not.  Additionally, I found that the MediaRecorder in Chrome is entirely capable of delivering samples from the inputs which arrived well before the moment I ask the recording to start, with no way of knowing how much data is sitting in a buffer ready for MediaRecorder to drain when asked to start.
Reading the spec in more detail brings me to think that it's not necessarily an implementation 'bug' on Chrome's part.  I've not seen any spec guidance on how a media recording should start (i.e. whether it should use pre-buffered data - I can see advantages to this for some applications).  Also, the BlobEvent timecode as specified is very ambiguous, and lacks solid examples of use to guide implementers and developers in the MediaRecorder use case.




Please view or discuss this issue at https://github.com/w3c/mediacapture-record/issues/208 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config

Received on Wednesday, 18 November 2020 09:16:17 UTC