
RE: Enhancing Screen Capture for WebRTC

From: Adam Sobieski <adamsobieski@hotmail.com>
Date: Thu, 28 Apr 2022 21:35:16 +0000
To: Elad Alon <eladalon@google.com>
CC: "public-webrtc@w3.org" <public-webrtc@w3.org>
Message-ID: <CH0PR18MB42413E1093137738B6FB7B08C5FD9@CH0PR18MB4241.namprd18.prod.outlook.com>

WebRTC Working Group,

Thank you for raising the important points about security and keystrokes with respect to password input boxes. As envisioned, software applications would opt into supporting and interoperating with enhanced screen capture, and they would cease forwarding keyboard-related events to tracks while a password input box has focus, resuming only once it loses focus.
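To make the idea concrete, here is a minimal sketch of such a gate, placed between an application's keyboard-event source and the outgoing event track. All names are illustrative, not from any specification:

```javascript
// Sketch: a gate that an opted-in application could place between
// the keyboard-event source and the outgoing event track. It
// suspends forwarding while a password input box has focus.
function createKeyEventGate(sendToTrack) {
  let suspended = false;
  return {
    // Call from focus-change handlers; suspend while the newly
    // focused element is a password input box.
    onFocusChange(isPasswordInput) {
      suspended = isPasswordInput;
    },
    // Forward a keyboard-related event only when not suspended;
    // returns true if the event was forwarded to the track.
    forward(event) {
      if (suspended) return false;
      sendToTrack(event);
      return true;
    }
  };
}
```

In a browser this would be wired to focusin/focusout listeners that check something like event.target.matches('input[type=password]') before calling onFocusChange.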

We can also consider, more generally, forms. In theory, end-users could receive warnings when they are presented with or are completing forms with personally identifiable information (PII) while recording screencasts. For Web browsers, the autocomplete attribute [1] could be useful for detecting such occurrences.
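As a sketch of how the autocomplete attribute could serve this purpose, a recording UI could flag form fields whose autocomplete tokens suggest PII and warn the end-user. The token list below is illustrative, drawn from values defined for the HTML autocomplete attribute:

```javascript
// Illustrative subset of HTML autocomplete tokens that indicate PII.
const PII_AUTOCOMPLETE_TOKENS = new Set([
  'name', 'email', 'tel', 'street-address', 'postal-code',
  'cc-number', 'cc-exp', 'bday', 'current-password', 'new-password'
]);

// fields: array of { autocomplete: string } descriptors. In a
// browser, these would come from form.elements and each element's
// autocomplete attribute (which may hold multiple space-separated
// tokens, e.g. "shipping street-address").
function findPiiFields(fields) {
  return fields.filter(f =>
    (f.autocomplete || '')
      .split(/\s+/)
      .some(token => PII_AUTOCOMPLETE_TOKENS.has(token))
  );
}
```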

I’m excited about enabling next-generation educational software, e.g., intelligent tutoring systems, for training end-users to use software applications (e.g., CAD, 3D modeling, image editing, office, and IDE software). Also interesting to me is the possibility that bulk recordings of the usage of such software could be training data for other AI systems, e.g., virtual assistants.

In addition to tracks for user-input events, there could be tracks for menu-related events, tracks for application commands, and tracks for other, application-specific kinds of events.

Also of interest as a case study is Autodesk Screencast [2].

Best regards,

[1] https://developer.mozilla.org/en-US/docs/Web/HTML/Attributes/autocomplete

[2] https://knowledge.autodesk.com/community/screencast

From: Elad Alon <eladalon@google.com>
Sent: Thursday, April 28, 2022 4:32 AM
To: Adam Sobieski <adamsobieski@hotmail.com>
Cc: public-webrtc@w3.org
Subject: Re: Enhancing Screen Capture for WebRTC

Hi Adam,

IIUC, you're proposing a stream of events derived from keystrokes be exposed to capturing applications. Am I right? You have probably considered the inherent risks, such as how this could effectively be a keylogger, and users could be tricked into exposing passwords. (Users might also expose private information by oversight while using non-malicious applications.) I don't think a checkbox - as you suggest - would be sufficient protection against such risks. If you have thought of more robust protection for the user, please let me know.

Some additional thoughts:
* For tab-capture of a tightly-coupled application, it is possible for the captured tab to report its various events to the capturing tab, which can in turn annotate the video with this information, record it separately in a format of its choosing, etc.
* For window- and monitor-capture, I think this is a power-feature best reserved for native apps - assuming the relevant OS even allows native apps to monitor arbitrary mouse/keyboard events.
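The tab-capture approach could be sketched as follows, assuming the two tabs share a BroadcastChannel (the channel name and record shape are illustrative). The captured tab posts its events; the capturing tab stamps each one with an offset into the recording so it can annotate the video:

```javascript
// Pure helper for the capturing side: convert an event reported by
// the captured tab into an annotation record relative to the
// moment capture started.
function toAnnotation(event, captureStartMs) {
  return {
    type: event.type,                 // e.g. 'keydown', 'menu-open'
    detail: event.detail ?? null,     // event-specific details, if any
    offsetMs: event.timestampMs - captureStartMs
  };
}

// Browser-side wiring (illustrative):
//   const bus = new BroadcastChannel('capture-events');
//   // Captured tab:
//   //   bus.postMessage({ type: 'menu-open', detail: 'File',
//   //                     timestampMs: performance.now() });
//   // Capturing tab:
//   //   bus.onmessage = ({ data }) =>
//   //     annotations.push(toAnnotation(data, captureStartMs));
```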


On Fri, Apr 15, 2022 at 5:02 AM Adam Sobieski <adamsobieski@hotmail.com> wrote:
WebRTC Working Group,

Hello. I would like to describe and to express interest in some new features for WebRTC and its screen capture capabilities.

WebRTC screen capture – or a mode thereof – could include user-input and application events, e.g., mouse, keyboard, touchscreen, and stylus events, as well as applications’ menu-related and other application-specific events. Beyond streaming video of dynamic screen content, accompanying data tracks could include events relevant to screen-captured software applications.
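One way such an accompanying data track could carry events alongside the video is over an RTCDataChannel. The record shape below is a sketch, not from any specification:

```javascript
// Sketch: encode a user-input or application event as a message for
// a data track accompanying the captured video. Field names are
// illustrative.
function encodeCaptureEvent(kind, payload, timestampMs) {
  return JSON.stringify({
    v: 1,            // schema version (illustrative)
    kind,            // 'mouse' | 'keyboard' | 'menu' | 'app-command' | ...
    t: timestampMs,  // milliseconds since capture start
    payload          // event-specific details
  });
}

// In a browser, the capturing application could open the channel
// alongside the captured stream (names illustrative):
//   const channel = peerConnection.createDataChannel('capture-events');
//   channel.send(encodeCaptureEvent('menu', { path: ['File', 'Save'] }, 2300));
```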

As envisioned, end-users would be able to opt into such an enhanced screen-capture mode during the initialization and configuration of screen-capturing, e.g., by selecting a checkbox with accompanying text asking the end-user whether they desire to additionally stream user-input and application events.

I present a use case below, though the desired features would enable a larger set of use cases than the one indicated.

A use case is that of intelligent tutoring systems which can teach end-users how to better utilize software applications, e.g., office software or CAD software. End-users could connect to intelligent tutoring systems via WebRTC and perform exercises while interacting with the tutoring systems, receiving assessment, instruction, and task-relevant hints.

Without the features under consideration, server-side computer-vision-based processing would be required to extract application-specific events – e.g., end-users opening menus and invoking application functionality – from the video streams.

In (Grossman, Matejka, & Fitzmaurice, 2010), the authors state that “storing a document's workflow history, and providing tools for its visualization and exploration, could make any document a powerful learning tool.”

In (Bao, Li, Xing, Wang, & Zhou, 2015), the authors present a computer-vision-based video-scraping technique to “automatically reverse-engineer time-series interaction data from screen-captured videos.”

In (Frisson, Malacria, Bailly, & Dutoit, 2016), the authors describe a general-purpose tool for observing application usage and analyzing users’ behaviors, combining computer-vision-based analyses of video-recordings with the collection of low-level interactions.

In (Sadeghi, Dargon, Rivest, & Pernot, 2016), the authors present a framework for fully capturing processes of computer-aided design and engineering.

Thank you. I hope that these features for enhancing WebRTC and its screen-capturing capabilities are also of some interest to you.

Best regards,
Adam Sobieski


Grossman, Tovi, Justin Matejka, and George Fitzmaurice. "Chronicle: capture, exploration, and playback of document workflow histories." In Proceedings of the 23rd annual ACM symposium on User interface software and technology, pp. 143-152. 2010.

Bao, Lingfeng, Jing Li, Zhenchang Xing, Xinyu Wang, and Bo Zhou. "Reverse engineering time-series interaction data from screen-captured videos." In 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER), pp. 399-408. IEEE, 2015.

Frisson, Christian, Sylvain Malacria, Gilles Bailly, and Thierry Dutoit. "InspectorWidget: A system to analyze users behaviors in their applications." In Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems, pp. 1548-1554. 2016.

Sadeghi, Samira, Thomas Dargon, Louis Rivest, and Jean-Philippe Pernot. "Capturing and analysing how designers use CAD software." (2016).

Received on Thursday, 28 April 2022 21:35:31 UTC