
RE: Raw data APIs - 2 - Access to unencoded data

From: Stojiljkovic, Aleksandar <aleksandar.stojiljkovic@intel.com>
Date: Fri, 1 Jun 2018 20:18:48 +0000
To: Harald Alvestrand <harald@alvestrand.no>, "public-webrtc@w3.org" <public-webrtc@w3.org>
Message-ID: <7A0FAB90EDAE304EA98E76ACB132D51233255821@IRSMSX103.ger.corp.intel.com>

The real-time timestamp is covered in a separate email [1].

> Fourcc codes are probably a good idea. Do you have a good reference
> definition for them?

A FourCC is a uint32_t, defined e.g. in videodev2.h [2] as:

/*  Four-character-code (FOURCC) */
#define v4l2_fourcc(a, b, c, d)\
((__u32)(a) | ((__u32)(b) << 8) | ((__u32)(c) << 16) | ((__u32)(d) << 24))
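The same packing can be illustrated in JavaScript (a sketch; the `fourcc` helper name is mine, and the expected value below is the V4L2 pixel-format constant the macro produces):

```javascript
// Pack a four-character code into an unsigned 32-bit integer, mirroring
// the v4l2_fourcc macro above: the first character lands in the low byte.
function fourcc(code) {
  return (code.charCodeAt(0) |
          (code.charCodeAt(1) << 8) |
          (code.charCodeAt(2) << 16) |
          (code.charCodeAt(3) << 24)) >>> 0;  // >>> 0 forces an unsigned result
}

// V4L2_PIX_FMT_YUYV = v4l2_fourcc('Y', 'U', 'Y', 'V'):
fourcc('YUYV');  // 0x56595559
```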

Video:
The header file [2] includes a list of video FourCCs.

Audio:
A 16-bit value is used, so it is sometimes referred to as a TwoCC.
Note that some video FourCC codes also fit in 16 bits, so the audio codes can be referred to as FourCCs as well.
A list of audio FourCCs (format tags) is given in [3].

Note that the same 16-bit value could be used both for a video FourCC and for an audio FourCC. I don't see that being a problem.
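For instance (a sketch using constants from the Windows headers: BI_RLE8 is a biCompression value that occupies the video FourCC slot in BITMAPINFOHEADER, and WAVE_FORMAT_PCM is an audio format tag):

```javascript
// Two well-known 16-bit codes from different namespaces that share a value:
const BI_RLE8 = 0x0001;          // video: RLE8-compressed bitmap (biCompression)
const WAVE_FORMAT_PCM = 0x0001;  // audio: integer PCM format tag
BI_RLE8 === WAVE_FORMAT_PCM;     // true - not ambiguous, since the buffer's
                                 // kind (audio vs. video) disambiguates them
```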

[1]
https://lists.w3.org/Archives/Public/public-webrtc/2018May/0141.html

[2] Video FOURCCs:
https://github.com/torvalds/linux/blob/ef1c4a6fa91bbbe9b09f770d28eba31a9edf770c/include/uapi/linux/videodev2.h#L81

[3] Audio FourCCs (Format Tags):
https://msdn.microsoft.com/en-us/library/windows/desktop/aa372553(v=vs.85).aspx

________________________________________
From: Harald Alvestrand [harald@alvestrand.no]
Sent: Wednesday, May 30, 2018 3:10 PM
To: Stojiljkovic, Aleksandar; public-webrtc@w3.org
Subject: Re: Raw data APIs - 2 - Access to unencoded data

On 30 May 2018 at 13:42, Stojiljkovic, Aleksandar wrote:
>> Interface Buffer ... Long long timestamp; // for start of buffer. Definition TBD.
>
> Maybe DOMHighResTimeStamp
> <https://www.w3.org/TR/hr-time/#sec-domhighrestimestamp>, to enable sync
> with other events, e.g. Sensor interface
> <https://www.w3.org/TR/generic-sensor/#the-sensor-interface>.

DOMHighResTimeStamp is convenient for absolute times, when it's clear
what it relates to (the time the light hit the camera sensor?). For
playback of stored media, we might want to use a clock relative to the
start of the media; I know there are various well-known apps that do
things like time-stretching of videos - I'm not sure how that would be
representable in an API.
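One possibility: the application itself can derive a media-relative clock from absolute DOMHighResTimeStamp values by anchoring on the first timestamp it observes (a sketch; `makeRelativeClock` is a hypothetical helper, not part of any proposed API):

```javascript
// Hypothetical helper: map absolute DOMHighResTimeStamp values onto a
// clock where the first observed frame defines media time zero.
function makeRelativeClock() {
  let epoch = null;
  return function toMediaTime(absoluteTs) {
    if (epoch === null) epoch = absoluteTs;
    return absoluteTs - epoch;
  };
}

const toMediaTime = makeRelativeClock();
toMediaTime(1000.5);  // 0 - the first frame anchors the media clock
toMediaTime(1033.8);  // about 33.3 ms later in media time
```

Time-stretching would then amount to scaling the relative values, which is one reason a relative representation may compose better than an absolute one.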

>
>
>> Enum VideoBufferFormat {  // alternate - separate enums for audio and video
>>   "i420-video",
>>   "i444-video",
>>   "yuv-video",
>> }
> Suggestions:
> - add enums for other formats, e.g. YUY2, UYVY, MJPG...,
> - add y16-video for 16-bit single plane infrared/D16 depth capture.
>
> Why not use FourCC codes?

Fourcc codes are probably a good idea. Do you have a good reference
definition for them?


>
> Kind Regards,
> Aleksandar
>
>
> ------------------------------------------------------------------------
> *From:* Harald Alvestrand [harald@alvestrand.no]
> *Sent:* Tuesday, May 29, 2018 9:29 AM
> *To:* public-webrtc@w3.org
> *Subject:* Raw data APIs - 2 - Access to unencoded data
>
>
> *Proposal for Raw Data*
> Extend MediaStreamTrack.
> Let the following execute:
>
> track = new MediaStreamTrack();
> track.injectData(buffer).then((buffer) => recycleBuffer(buffer));
> track.extractData(buffer).then((buffer) => processBuffer(buffer));
>
> The buffers consumed and produced should be modeled after the C++ API
> for injecting frames (for video) or samples (for audio).
> For integration with other systems (in particular WebAssembly), it’s
> important that buffers be provided by the application - this allows
> copying to be minimized without risking security.
> It’s also important that the ownership of buffers is well defined - that
> we have clear demarcation points where buffers are owned by the
> MediaStreamTrack and its customers, and when they are owned by the
> application for refilling. A promise-based interface seems good for this
> type of operation.
> Alternatively, a Streams-based interface can be used, such as the one
> _proposed earlier_
> <https://github.com/yellowdoge/streams-mediastreamtrack>. This would
> still be defined in terms of the buffer construct below, but would use
> Streams, and therefore require a data copy.
>
> Partial interface MediaStreamTrack {
>     promise<Buffer> insertData(buffer);   // will fail if track is connected to a source
>     promise<Buffer> extractData(buffer);  // will fail if track is connected to a sink
> }
>
> The buffers should be structures, not raw buffers. For instance:
>
> Interface Buffer {
>     DOMString kind;       // video, audio, encoded-video, encoded-audio
>     long long timestamp;  // for start of buffer. Definition TBD.
>     BufferFormat format;
>     ByteArray buffer;     // raw bytes, to be cast into the appropriate format.
>                           // Need to study WebAssembly linkages to make this
>                           // processing efficient.
> }
>
> Interface AudioDataBuffer : Buffer {
>     // The name AudioBuffer is already used by WebAudio. This buffer has the
>     // same properties, but can be used with multiple audio data formats.
>     AudioBufferFormat format;
>     float? sampleRate;  // only valid for audio
>     int? channels;      // only valid for audio
> }
>
> Enum AudioBufferFormat {
>   "l16-audio",  // 16-bit integer samples
>   "f32-audio",  // 32-bit floating-point samples
> }
>
> Interface VideoBuffer : Buffer {
>    VideoBufferFormat format;
>    int width;
>    int height;
>    DOMString rotation;
> }
>
> Enum VideoBufferFormat {  // alternate - separate enums for audio and video
>   "i420-video",
>   "i444-video",
>   "yuv-video",
> }
>
> One important aspect of such an interface is what happens under
> congestion - if insertData() is called more frequently than the
> downstream can consume, or if extractData() is called on a cadence
> slower than once per frame produced at the source.
> In both cases, for the raw image API, I think it is reasonable to just
> drop frames. The consumer needs to be able to look at the timestamps of
> the produced frames and do the Right Thing - raw frames have no
> interdependencies, so dropping frames is OK.
>
>
>
> --
> Surveillance is pervasive. Go Dark.
>

---------------------------------------------------------------------
Intel Finland Oy
Registered Address: PL 281, 00181 Helsinki 
Business Identity Code: 0357606 - 4 
Domiciled in Helsinki 

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
Received on Friday, 1 June 2018 20:19:22 UTC
