Enhancing Screen Capture for WebRTC

WebRTC Working Group,

Hello. I would like to describe, and to express interest in, some potential new features for WebRTC and its screen-capture capabilities.

WebRTC screen capture, or a mode thereof, could additionally capture user-input and application events, e.g., mouse, keyboard, touchscreen, and stylus events, as well as applications' menu-related events and other application events. Beyond streaming video of dynamic screen content, accompanying data tracks could carry events relevant to the screen-captured software applications.
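To make the idea concrete, here is a minimal sketch, using only today's standard APIs, of pairing a screen-capture stream with an RTCDataChannel that carries serialized input events. The event record shape and the channel label "capture-events" are assumptions for illustration, not part of any specification.

```javascript
// Sketch: a screen-capture stream accompanied by a data channel of
// serialized input events. The record format below is an assumption.

// Pure helper: turn a DOM input event into a compact, timestamped record.
function serializeInputEvent(e) {
  return JSON.stringify({
    type: e.type,            // e.g., "mousedown", "keydown"
    t: e.timeStamp,          // time relative to page load (ms)
    x: e.clientX ?? null,    // pointer position, if any
    y: e.clientY ?? null,
    key: e.key ?? null,      // key value, if a keyboard event
  });
}

// Browser-only glue (guarded so the helper above works anywhere).
if (typeof navigator !== "undefined" && navigator.mediaDevices) {
  const pc = new RTCPeerConnection();
  // Hypothetical companion channel for the captured events.
  const events = pc.createDataChannel("capture-events");

  navigator.mediaDevices.getDisplayMedia({ video: true }).then((stream) => {
    stream.getTracks().forEach((track) => pc.addTrack(track, stream));
    // Note: a page can only observe its *own* DOM events today, not
    // those of the captured application -- the gap the proposed
    // feature would close.
    for (const type of ["mousedown", "keydown"]) {
      window.addEventListener(type, (e) => {
        if (events.readyState === "open") events.send(serializeInputEvent(e));
      });
    }
  });
}
```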

As envisioned, end-users would be able to opt into such an enhanced screen-capture mode during the initialization and configuration of screen capture, e.g., via a checkbox asking whether they also wish to stream user-input and application events.
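One way the opt-in might surface at the API level is sketched below. The `inputEvents` member is entirely hypothetical; it is not part of any current Media Capture specification and stands in for whatever surface the Working Group might define.

```javascript
// Sketch of a possible opt-in at capture-initialization time. The
// "inputEvents" option is hypothetical, used here for illustration only.
function buildDisplayMediaOptions(userOptedIn) {
  const options = { video: true, audio: false };
  if (userOptedIn) {
    // Hypothetical extension: ask the user agent to also deliver
    // user-input and application events for the captured surface.
    options.inputEvents = true;
  }
  return options;
}

// In a browser, the checkbox state would feed straight into the request:
//   const stream = await navigator.mediaDevices.getDisplayMedia(
//     buildDisplayMediaOptions(checkbox.checked));
```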

I present a use case below, though the desired features would enable a larger set of use cases than the one indicated.

A use case is that of intelligent tutoring systems which can teach end-users how to better utilize software applications, e.g., office software or CAD software. End-users could connect to intelligent tutoring systems via WebRTC and perform exercises while interacting with the tutoring systems, receiving assessment, instruction, and task-relevant hints.

Without the features under consideration, server-side computer-vision-based processing would be required to recover visible application-specific events from video streams, e.g., end-users opening menus and invoking application functionality.
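To illustrate the contrast, the sketch below shows the kind of time-series interaction log a tutoring system would need. With the proposed feature, the user agent could emit such records directly; without it, they must be reverse-engineered from video frames. The record shape here is an assumption for illustration.

```javascript
// Sketch: a time-series interaction log as a tutoring system might
// consume it. The record shape is an assumption, e.g.:
//   { t: 1200, type: "menu-open", target: "File" }
class InteractionLog {
  constructor() {
    this.records = [];
  }
  // Append one event record.
  record(event) {
    this.records.push(event);
  }
  // All events of a given type, in order -- e.g., every menu opening.
  ofType(type) {
    return this.records.filter((r) => r.type === type);
  }
  // Elapsed time between the first and last recorded events (ms).
  duration() {
    if (this.records.length < 2) return 0;
    return this.records[this.records.length - 1].t - this.records[0].t;
  }
}
```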

In (Grossman, Matejka, & Fitzmaurice, 2010), the authors state that "storing a document's workflow history, and providing tools for its visualization and exploration, could make any document a powerful learning tool."

In (Bao, Li, Xing, Wang, & Zhou, 2015), the authors present a computer-vision-based video-scraping technique to "automatically reverse-engineer time-series interaction data from screen-captured videos."

In (Frisson, Malacria, Bailly, & Dutoit, 2016), the authors describe a general-purpose tool for observing application usage and analyzing users' behaviors, combining computer-vision-based analyses of video-recordings with the collection of low-level interactions.

In (Sadeghi, Dargon, Rivest, & Pernot, 2016), the authors present a framework for fully capturing processes of computer-aided design and engineering.

Thank you. I hope that these features for enhancing WebRTC and its screen-capturing capabilities are also of some interest to you.


Best regards,
Adam Sobieski

REFERENCES

Grossman, Tovi, Justin Matejka, and George Fitzmaurice. "Chronicle: capture, exploration, and playback of document workflow histories." In Proceedings of the 23rd annual ACM symposium on User interface software and technology, pp. 143-152. 2010.

Bao, Lingfeng, Jing Li, Zhenchang Xing, Xinyu Wang, and Bo Zhou. "Reverse engineering time-series interaction data from screen-captured videos." In 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER), pp. 399-408. IEEE, 2015.

Frisson, Christian, Sylvain Malacria, Gilles Bailly, and Thierry Dutoit. "Inspectorwidget: A system to analyze users behaviors in their applications." In Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems, pp. 1548-1554. 2016.

Sadeghi, Samira, Thomas Dargon, Louis Rivest, and Jean-Philippe Pernot. "Capturing and analysing how designers use CAD software." (2016).

Received on Friday, 15 April 2022 03:01:06 UTC