- From: Harald Alvestrand <harald@alvestrand.no>
- Date: Wed, 30 May 2018 14:10:15 +0200
- To: "Stojiljkovic, Aleksandar" <aleksandar.stojiljkovic@intel.com>, "public-webrtc@w3.org" <public-webrtc@w3.org>
On 30 May 2018 at 13:42, Stojiljkovic, Aleksandar wrote:
>> Interface Buffer ...
>>   Long long timestamp; // for start of buffer. Definition TBD.
>
> Maybe DOMHighResTimeStamp
> <https://www.w3.org/TR/hr-time/#sec-domhighrestimestamp>, to enable
> sync with other events, e.g. the Sensor interface
> <https://www.w3.org/TR/generic-sensor/#the-sensor-interface>.

DOMHighResTimeStamp is convenient for absolute times, when it's clear
what the value relates to (the time the light hit the camera sensor?).
For playback of stored media, we might instead want a clock relative to
the start of the media; there are well-known applications that do
things like time-stretching of video, and I'm not sure how that would
be representable in an API.

>> Enum VideoBufferFormat { // alternate - separate enums for audio and video
>>   "i420-video",
>>   "i444-video",
>>   "yuv-video",
>> }
>
> Suggestions:
> - add enums for other formats, e.g. YUY2, UYVY, MJPG...,
> - add y16-video for 16-bit single-plane infrared/D16 depth capture.
>
> Why not use fourcc codes?

Fourcc codes are probably a good idea. Do you have a good reference
definition for them?
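For what it's worth, a fourcc code is conventionally just four ASCII
characters packed into a 32-bit little-endian integer (this is how,
for example, V4L2 and libyuv define them), so mapping format strings
to codes is trivial. A sketch - the strings below follow common usage
rather than any single normative registry:

    function fourcc(s) {
      // Pack four ASCII characters into a 32-bit little-endian integer.
      if (s.length !== 4) throw new RangeError('fourcc takes exactly four characters');
      return (s.charCodeAt(0) |
              (s.charCodeAt(1) << 8) |
              (s.charCodeAt(2) << 16) |
              (s.charCodeAt(3) << 24)) >>> 0;
    }

    fourcc('I420'); // 0x30323449
    fourcc('YUY2'); // 0x32595559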
> Kind Regards,
> Aleksandar
>
> ------------------------------------------------------------------------
> *From:* Harald Alvestrand [harald@alvestrand.no]
> *Sent:* Tuesday, May 29, 2018 9:29 AM
> *To:* public-webrtc@w3.org
> *Subject:* Raw data APIs - 2 - Access to unencoded data
>
> *Proposal for Raw Data*
>
> Extend MediaStreamTrack. Let the following execute:
>
>     track = new MediaStreamTrack();
>     track.insertData(buffer).then((buffer) => recycleBuffer(buffer));
>     track.extractData(buffer).then((buffer) => processBuffer(buffer));
>
> The buffers consumed and produced should be modeled after the C++ API
> for injecting frames (for video) or samples (for audio).
>
> For integration with other systems (in particular WebAssembly), it is
> important that buffers be provided by the application - this allows
> copying to be minimized without risking security.
>
> It is also important that the ownership of buffers is well defined -
> that there are clear demarcation points at which buffers are owned by
> the MediaStreamTrack and its customers, and at which they are owned by
> the application for refilling. A promise-based interface seems well
> suited to this type of operation.
>
> Alternatively, a Streams-based interface could be used, such as the one
> proposed earlier <https://github.com/yellowdoge/streams-mediastreamtrack>.
> This would still be defined in terms of the buffer construct below, but
> would use Streams, and would therefore require a data copy.
>
>     partial interface MediaStreamTrack {
>       Promise<Buffer> insertData(Buffer buffer);  // fails if the track
>                                                   // is connected to a source
>       Promise<Buffer> extractData(Buffer buffer); // fails if the track
>                                                   // is connected to a sink
>     };
>
> The buffers should be structures, not raw buffers. For instance:
>
>     interface Buffer {
>       DOMString kind;      // video, audio, encoded-video, encoded-audio
>       long long timestamp; // for start of buffer; definition TBD
>       BufferFormat format;
>       ByteArray buffer;    // raw bytes, to be cast into the appropriate
>                            // format; we need to study WebAssembly
>                            // linkage to make this processing efficient
>     };
>
>     interface AudioDataBuffer : Buffer {
>       // The name AudioBuffer is already used by WebAudio. This buffer
>       // has the same properties, but can be used with multiple audio
>       // data formats.
>       AudioBufferFormat format;
>       float? sampleRate; // only valid for audio
>       int? channels;     // only valid for audio
>     };
>
>     enum AudioBufferFormat {
>       "l16-audio", // 16-bit integer samples
>       "f32-audio", // 32-bit floating-point samples
>     };
>
>     interface VideoBuffer : Buffer {
>       VideoBufferFormat format;
>       int width;
>       int height;
>       DOMString rotation;
>     };
>
>     enum VideoBufferFormat { // alternate - separate enums for audio and video
>       "i420-video",
>       "i444-video",
>       "yuv-video",
>     };
>
> One important aspect of such an interface is what happens under
> congestion - if insertData() is called more frequently than the
> downstream can consume, or if extractData() is called on a cadence
> slower than once per frame produced at the source.
>
> In both of these cases, for the raw image API, I think it is
> reasonable to just drop frames. The consumer needs to be able to look
> at the timestamps of the produced frames and do the Right Thing - raw
> frames have no interdependencies, so dropping frames is OK.
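To make the congestion story concrete, a consumer of the proposed
extractData() might look roughly like the sketch below. This is
illustrative only: consumeFrames() and allocateVideoBuffer() are
assumed names, and frameDurationMs is an assumed per-track constant;
none of them are part of the proposal.

    // Sketch: pump frames from a track, recycling a single app-owned
    // buffer. The application owns the buffer except between the call
    // to extractData() and the resolution of the returned promise.
    async function consumeFrames(track, processFrame) {
      let buffer = allocateVideoBuffer(); // assumed app-side allocator
      let lastTimestamp = null;
      for (;;) {
        buffer = await track.extractData(buffer); // track fills and returns it
        if (lastTimestamp !== null &&
            buffer.timestamp - lastTimestamp > frameDurationMs) {
          // One or more frames were dropped upstream. Raw frames have
          // no interdependencies, so it is safe to simply continue.
        }
        lastTimestamp = buffer.timestamp;
        processFrame(buffer); // ownership is back with the application
      }
    }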
Received on Wednesday, 30 May 2018 12:10:51 UTC