Re: [mediacapture-record] Feature request: Concat filter (#166)

@Pehrsons Before continuing down the rabbit hole with this proposal/feature request, posting links to the resources researched so far, so that these resources are not lost due to user error (have done that before (["Problem Exists Between Chair And Keyboard :p"](https://stackoverflow.com/questions/43245657/how-do-i-append-a-file-to-formdata#comment73562215_43245657)): "lost code" that used the Web Animation API to create a "video" and Native Messaging to bypass the Web Speech API and communicate directly with `espeak-ng`; at least until the code is "retrieved" from the currently non-functional device), and so that future specification writers/implementers (both for browsers and in the wild) might find them useful

- [FRONT-END Better web video with AV1 codec](https://evilmartians.com/chronicles/better-web-video-with-av1-codec) (simple explanation of containers and codecs)
- [Port of the AV1 Video codec to WebAssembly](https://github.com/GoogleChromeLabs/wasm-av1) (once in rabbit hole, venture down the fork in the road where the sign says wasm) 
- [How to extract raw VP9 stream from WebM file?](https://stackoverflow.com/a/48589207) ("VP9 does [not have](https://www.gpac-licensing.com/2016/07/12/vp9-av1-bitstream-format/) a pure raw bitstream format. The closest thing is the lightweight IVF format - 32 byte global header + 12 byte header per frame .")
- [IVF](https://wiki.multimedia.cx/index.php/IVF) (what is an IVF file? see the IVF header sketch after this list)
- [AV1 Wasm decoder demo](http://alex-wasm.appspot.com/av1/index.html)
- https://github.com/GoogleChromeLabs/wasm-av1/issues/2#issuecomment-446056435 (sample IVF files)
- [AV1 Bitstream Analyzer](https://hackernoon.com/av1-bitstream-analyzer-d25f1c27072b) ("A single video frame can sometimes take more than an hour to encode and as part of our routine testing, we encode 30 clips, each containing 60 frames. The encoding process is massively parallel and runs on a large number of AWS instances, but even with all that hardware, it can take hours or even days to run a single test job.")
- [rav1e]( https://github.com/xiph/rav1e) ("The fastest and safest AV1 encoder.")
- [AV1 Decoder](https://www.chromestatus.com/feature/5729898442260480) An AV1 decoder is apparently included in the Firefox and Chromium source code, though no AV1 encoder ("codec") is a supported type for `MediaRecorder`, unless the codec string is being composed incorrectly here: `MediaRecorder.isTypeSupported('video/webm;codecs=av01.0.05M.08') // false` (see the probe snippet after this list)
- (Intermission 1. What happens when `MediaRecorder` is called multiple times in "parallel" (e.g., within `.map()` without `async/await`) and each recording is _less than 2 seconds_ long? Consistent or unexpected results?)
- What really happens when `CanvasRenderingContext2D.drawImage()` and `requestAnimationFrame()` are used together? Chromium has implemented `queueMicrotask`, which executes before `requestAnimationFrame()`, though does that solve the issue of images actually having loaded? ([Add decode() functionality to image elements.](https://codereview.chromium.org/2769823002/); [Generating Images in JavaScript Without Using the Canvas API And putting them into a web notification](https://medium.com/the-guardian-mobile-innovation-lab/generating-images-in-javascript-without-using-the-canvas-api-77f3f4355fad))
- [jsmpeg: Why a JavaScript Video Decoder Actually Makes Sense by Dominic Szablewski](https://fronteers.nl/congres/2015/sessions/jsmpeg-by-dominic-szablewski)
- [MPEG1 VIDEO DECODER IN JAVASCRIPT](https://phoboslab.org/log/2013/05/mpeg1-video-decoder-in-javascript)
- [Instructions to do WebM live streaming via DASH](https://sites.google.com/a/webmproject.org/wiki/adaptive-streaming/instructions-to-do-webm-live-streaming-via-dash)
- [WebM VOD Baseline format](https://sites.google.com/a/webmproject.org/wiki/adaptive-streaming/webm-vod-baseline-format) (This is (probably) closer to what we are trying to achieve, writing the data to a "container" (of choice))
- [Codec and container switching in MSE Sample](https://googlechrome.github.io/samples/media/sourcebuffer-changetype.html) (Media Source Extensions update: `.changeType()`; this could be useful at Chromium; the Firefox implementation of `"segments"` mode currently achieves the expected result https://github.com/guest271314/MediaFragmentRecorder/commit/09a731789d3aa6b5e4bd8b11d2f2387d8e08e5b9; https://github.com/w3c/media-source/issues/190; [How to use “segments” mode at SourceBuffer of MediaSource to render same result at Chromium, Chrome and Firefox?](https://stackoverflow.com/q/46379009); a `changeType()` sketch follows this list)
- (Intermission 2. Since HTML `<video>` can decode various videos, is it possible to somehow extract the underlying code which decodes that `media_file.ext`? Do we have to use the `<video>` element at all, where the requirement is really not to play the media but to get the underlying images (and audio) from point _a_ to point _b_ (if "cues" are present) and concatenate those into a single "container" (e.g., `.mkv` or `.webm`, etc.; i.e., [webm-wasm lets you create webm videos in JavaScript via WebAssembly.](https://github.com/GoogleChromeLabs/webm-wasm))? Enter [Rust](https://en.wikipedia.org/wiki/Rust_(programming_language)) (!) ("Rust is a multi-paradigm systems programming language focused on safety, especially safe concurrency. Rust is syntactically similar to C++, but is designed to provide better memory safety while maintaining high performance. Rust was originally designed by Graydon Hoare at Mozilla Research, with contributions from Dave Herman, Brendan Eich, and others. The designers refined the language while writing the Servo layout engine and the Rust compiler. The compiler is free and open-source software dual-licensed under the MIT License and Apache License 2.0."), where there is evidently an assortment of thriving projects)
- [Factor Gecko specific code out of VideoFrameContainer Bug 1416663](https://bugzilla.mozilla.org/show_bug.cgi?id=1416663) ("There's a bunch of code that uses HTMLMediaElement in VideoFrameContainer. Most of this is layout specific. If we can factor that out, say back into HTMLMediaElement which knows all about how Gecko does layout, then it's easier to import VideoFrameContainer into the gecko-media Rust crate.")
- [Rust and WebAssembly](https://rustwasm.github.io/book/what-is-webassembly.html#what-is-webassembly) (This is a very instructive document re wasm <=> JavaScript: https://cdn.rawgit.com/WebAssembly/wabt/aae5a4b7/demo/wat2wasm/)
- [wasm_bindgen](https://docs.rs/wasm-bindgen/0.2.40/wasm_bindgen/)
- [Struct web_sys::HtmlMediaElement](https://rustwasm.github.io/wasm-bindgen/api/web_sys/struct.HtmlMediaElement.html)
- [Enum script::dom::htmlmediaelement::MediaElementMicrotask](https://rustwasm.github.io/wasm-bindgen/api/web_sys/struct.MediaDecodingConfiguration.html?search=video) (Have not tried the Rust language, though it appears possible to use `HTMLMediaElement`, `HTMLVideoElement` and `MediaStreamTrack` without using an HTML `<video>` element, possibly in a `Worker` and/or `Worklet` thread? TODO: dive in to Rust)
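
Relevant to the IVF links above: since IVF is only "32 byte global header + 12 byte header per frame", that global header can be composed in the browser itself with a `DataView`. A minimal sketch (the `"AV01"` FourCC, dimensions, time base and frame count defaults here are illustrative assumptions, not values read from a real stream):

```
// A sketch: build the 32-byte IVF global header described above
const ivfHeader = ({fourcc = "AV01", width, height, den = 30, num = 1, frames}) => {
  const buffer = new ArrayBuffer(32);
  const view = new DataView(buffer);
  [..."DKIF"].forEach((c, i) => view.setUint8(i, c.charCodeAt(0))); // signature
  view.setUint16(4, 0, true);        // version (0)
  view.setUint16(6, 32, true);       // header size
  [...fourcc].forEach((c, i) => view.setUint8(8 + i, c.charCodeAt(0))); // codec FourCC
  view.setUint16(12, width, true);   // width in pixels
  view.setUint16(14, height, true);  // height in pixels
  view.setUint32(16, den, true);     // time base denominator (rate)
  view.setUint32(20, num, true);     // time base numerator (scale)
  view.setUint32(24, frames, true);  // number of frames
  return new Uint8Array(buffer);     // bytes 28-31 left unused, per the format
};
// e.g., new Blob([ivfHeader({width: 320, height: 240, frames: 60}), ...framePayloads])
```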
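
Re the AV1 "codec" bullet, a quick way to probe which MIME type/codec strings `MediaRecorder` will actually accept (the strings below are candidates to try, not a definitive list; results vary per browser and build):

```
// Probe MediaRecorder support for various MIME type/codec strings
[
  "video/webm;codecs=vp8,opus",
  "video/webm;codecs=vp9",
  "video/webm;codecs=av01.0.05M.08", // AV1; false at the time of this comment
  "video/x-matroska;codecs=avc1"
].forEach(type => console.log(type, MediaRecorder.isTypeSupported(type)));
```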
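
And re `.changeType()`, a minimal sketch of the MSE pattern the Chrome sample demonstrates (`webmChunk` and `mp4Chunk` here are hypothetical `ArrayBuffer`s of previously recorded media; `changeType()` is not implemented everywhere, hence the feature check):

```
// Append chunks of different container/codec types to one SourceBuffer
const video = document.createElement("video");
document.body.appendChild(video);
const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);
mediaSource.addEventListener("sourceopen", () => {
  const sourceBuffer = mediaSource.addSourceBuffer('video/webm;codecs="vp8,opus"');
  sourceBuffer.mode = "segments";
  sourceBuffer.appendBuffer(webmChunk); // first recorded chunk (hypothetical)
  sourceBuffer.addEventListener("updateend", () => {
    if ("changeType" in sourceBuffer) {
      // switch container/codec mid-stream where implemented
      sourceBuffer.changeType('video/mp4;codecs="avc1.42E01E,mp4a.40.2"');
      sourceBuffer.appendBuffer(mp4Chunk); // second chunk (hypothetical)
    }
  }, {once: true});
});
```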

Some code which attempts to emulate, "as fast as it can", the MDN description of the `startRendering()` method of `OfflineAudioContext()`: essentially "parallel" asynchronous procedures passed to `Promise.all()`. (The `data URL` representation of each image is used for expediency, without regard for "compression"; when trying 1-second slices the result has unexpected consequences, and there is some `MediaRecorder` source code at either Chromium or Firefox which referenced 2 seconds?) TODO: try farming this procedure out to `AudioWorklet` and/or `TaskWorklet` (though since "WebWorkers can be expensive (e.g: ~5MB per thread in Chrome)" per [tasklets](https://github.com/GoogleChromeLabs/tasklets), is that abstraction really accomplishing anything? Eventually crashed the tab at plnkr when trying `taskWorklet` multiple times in the same session (trying to get a reference to a `<video>` within `TaskWorkerGlobalScope`))

```
<!DOCTYPE html>
<html>

<head>
</head>

<body>
  <div>click</div>
  <script>
    (async() => {
      const url = "https://commondatastorage.googleapis.com/gtv-videos-bucket/sample/ForBiggerMeltdowns.mp4";
      const blob = await (await fetch(url)).blob();
      const blobURL = URL.createObjectURL(blob);
      // probe the media metadata (duration, dimensions) before slicing
      const meta = document.createElement("video");
      const canvas = document.createElement("canvas");
      document.body.appendChild(canvas);
      const ctx = canvas.getContext("2d");
      let duration = await new Promise(resolve => {
        meta.addEventListener("loadedmetadata", e => {
          canvas.width = meta.videoWidth;
          canvas.height = meta.videoHeight;
          // TODO: address media with no duration: media recorded using `MediaRecorder`
          // `ts-ebml` handles this case, though what if we do not want to use any libraries?
          resolve(meta.duration);
        });
        meta.src = blobURL;
      });

      console.log(duration);
      document.querySelector("div")
        .addEventListener("click", async e => {
          let z = 0;
          // one {from, to} media fragment range per 2-second slice;
          // Math.ceil avoids a degenerate empty final chunk when the
          // duration is an exact multiple of 2
          const chunks = [...Array(Math.ceil(duration / 2)).keys()].map(n => ({from: z, to: (z += 2) > duration ? duration : z}));
          console.log(chunks);

          const data = await Promise.all(chunks.map(({from, to}) => new Promise(resolve => {
            const video = document.createElement("video");
            const canvas = document.createElement("canvas");
            const ctx = canvas.getContext("2d");
            const images = [];
            let raf, n = 0;
            // each animation frame, draw the current video frame to the
            // <canvas> and capture it as a data URL
            const draw = _ => {
              console.log(`drawing image ${n++}`);
              if (video.paused) {
                cancelAnimationFrame(raf);
                return;
              }
              ctx.drawImage(video, 0, 0, video.videoWidth, video.videoHeight);
              images.push(canvas.toDataURL());
              raf = requestAnimationFrame(draw);
            };
            // record the fragment's audio and video via captureStream()
            const recorder = new MediaRecorder(video.captureStream());
            recorder.addEventListener("dataavailable", e => {
              cancelAnimationFrame(raf);
              resolve({images, blob:e.data});
            });
            video.addEventListener("playing", e => {
              if (recorder.state !== "recording") {
                recorder.start();
              }
              raf = requestAnimationFrame(draw);
            }, {once:true});
            video.addEventListener("canplay", e => {
              canvas.width = video.videoWidth;
              canvas.height = video.videoHeight;
              video.play().catch(console.error);
            }, {once:true});

            video.addEventListener("pause", e => {
              recorder.stop();
              cancelAnimationFrame(raf)
            });
            const src = `${blobURL}#t=${from},${to}`;
            console.log(src);
            video.src = src;

          })));
          console.log(data);
          /*
          data.forEach(({blob, images}) => {
            console.log(images);
            const video = document.createElement("video");
            video.controls = true;
            document.body.appendChild(video);
            video.src = URL.createObjectURL(blob);
          });
          */
          // TODO: draw the images to a <canvas>, though see https://codereview.chromium.org/2769823002/
        })
    })();
  </script>
</body>

</html>

```
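
Re the final TODO in the code above, a sketch of replaying the captured data URL frames to a `<canvas>` using `decode()` (per the Chromium review linked in the list), so each image has actually decoded before `drawImage()` is called; the `fps` value and `someCanvasContext` are placeholders, since the original frame timing is not preserved here:

```
// Replay an array of data URL frames onto a 2d context; a sketch, not the
// actual frame timing of the recording
const replay = async (images, ctx, fps = 60) => {
  for (const dataURL of images) {
    const img = new Image();
    img.src = dataURL;
    await img.decode(); // resolves once the image is decoded and ready to draw
    ctx.drawImage(img, 0, 0);
    await new Promise(r => setTimeout(r, 1000 / fps));
  }
};
// e.g., data.forEach(({images}) => replay(images, someCanvasContext));
```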

Upon running the above code, it occurred to me that we could create yet another (very simple) media "container" type, again using only the browser: create N slices of separate audio and image "files" using `MediaRecorder`; e.g., something like

```
{
  "audio": { "data": /* audio as an array, capable of serialization */, "from": 0, "to": 2 },
  "video": { "data": /* video as an array of images (uncompressed; we'll address that "later") */, "from": 0, "to": 2 },
  "title": /* title, as array or string */
}
```
where a "server" and/or `MediaRecorder` could select any "segments" of media, concatenate and encode as a `.ext` file and serve that "file"; or, simply serve the requested "segments" in the `JSON` form. The issue would then be how to stream the data using `ReadableStream`/`WritableStream`, though that can be overcome, to an appreciable degree by only serving the smallest "chunk" possible (2 seconds?). That is: use the browser itself to encode the files.

-- 
GitHub Notification of comment by guest271314
Please view or discuss this issue at https://github.com/w3c/mediacapture-record/issues/166#issuecomment-478276280 using your GitHub account

Received on Saturday, 30 March 2019 18:42:54 UTC