Re: Enhancing Screen Capture for WebRTC from Elad Alon on 2022-04-29 (public-webrtc@w3.org from April 2022)

From: Elad Alon <eladalon@google.com>
Date: Fri, 29 Apr 2022 09:56:07 +0200
To: Adam Sobieski <adamsobieski@hotmail.com>, Jan-Ivar Bruaroey <jib@mozilla.com>
Cc: "public-webrtc@w3.org" <public-webrtc@w3.org>
Message-ID: <CAMO6jDNk108_FnVov2dh7-KuDkWUR1EPWvx2K4M_yYi7bUscLA@mail.gmail.com>

Hi Adam,

Your original message appeared to me to focus on *whole-screen capture*. I
don't know to what extent various operating systems allow that nowadays,
but I can't imagine any OS provides an API that lets you listen in on all
keyboard/mouse events while somehow withholding those that a native
application like a browser considers, internally, to belong to a "password
input box". (If I'm wrong - let me know.)

For anything other than whole-screen capture, you could *ask* the captured
application to forward you the events - and I think that's much safer and
much more feasible.

   - For *tab-sharing*, I suggest that you instead try to establish which
   tab you were capturing (using Capture Handle Identity
   <https://chromestatus.com/feature/4854125411958784>). If both
   applications know each other and can establish trust, you can ask the
   captured application to forward you a stream of keyboard/mouse-events using
   any other mechanism you can think of.
   - For *window-sharing*, I suggest you lobby for Capture Handle Identity
   being extended to handle native windows. @Jan-Ivar Bruaroey
   <jib@mozilla.com> has wanted[*] to disallow it here
   <https://github.com/w3c/mediacapture-handle/issues/10#issuecomment-1043006580>,
   and I have responded that I know Web developers who wish to expand it to
   windows here
   <https://github.com/w3c/mediacapture-handle/issues/10#issuecomment-1043186484>.
   You're welcome to join the discussion, come up with a proposal, etc.
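
To make the tab-sharing suggestion concrete, here is a minimal sketch of the
trust check and the forwarding filter. It assumes the captured tab has opted
in via navigator.mediaDevices.setCaptureHandleConfig() with exposeOrigin set,
and that the capturing tab reads the handle from the video track's
getCaptureHandle(). The ForwardedInputEvent message format and both function
names are hypothetical, invented for illustration; the transport between the
two applications is left open, as above. The password-field rule follows
Adam's proposal quoted below.

```typescript
// Shape of the object returned by getCaptureHandle() in browsers that
// implement Capture Handle. `origin` is only present when the captured
// tab opted in to exposing it.
interface CaptureHandle {
  origin?: string;
  handle: string;
}

// Hypothetical message format for forwarded events (not part of any spec).
interface ForwardedInputEvent {
  kind: "keydown" | "keyup" | "click";
  timestampMs: number;
  key?: string;
}

// Capturing side: only accept forwarded events from a captured tab whose
// exposed origin is on an allowlist that the capturer controls.
function isTrustedCapturee(
  handle: CaptureHandle | null | undefined,
  trustedOrigins: readonly string[],
): boolean {
  if (!handle || handle.origin === undefined) return false;
  return trustedOrigins.indexOf(handle.origin) !== -1;
}

// Captured side: decide what to forward. `focusedInputType` is the `type`
// of the currently focused <input>, or null if no input has focus. Key
// events are dropped while a password field has focus.
function toForwardable(
  event: ForwardedInputEvent,
  focusedInputType: string | null,
): ForwardedInputEvent | null {
  const isKeyEvent = event.kind === "keydown" || event.kind === "keyup";
  if (isKeyEvent && focusedInputType === "password") return null;
  return event;
}
```

The point of the split is that the captured application stays in control: it
decides which events leave the tab, and the capturer only listens once it has
confirmed who it is capturing.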


Thanks,
Elad

---
[*] For the record, IIRC Jan-Ivar's objection was on the grounds that we
should start small and iterate. The usual caveats apply about one person
trying to represent another's position.

On Thu, Apr 28, 2022 at 11:35 PM Adam Sobieski <adamsobieski@hotmail.com>
wrote:

> WebRTC Working Group,
>
> Elad,
>
>
>
> Thank you for raising the important points about security and keystrokes
> with respect to password input boxes. As envisioned, software applications
> would opt into supporting and interoperating with enhanced screen capturing
> and should cease forwarding keyboard-related events to tracks when password
> input boxes obtain focus and resume forwarding such events to such tracks
> when password input boxes lose focus.
>
>
>
> We can also consider, more generally, forms. In theory, end-users could
> receive warnings when they are presented with or are completing forms with
> personally identifiable information (PII) while recording screencasts. For
> Web browsers, the autocomplete attribute [1] could be useful for detecting
> such occurrences.
>
>
>
> I’m excited about enabling next-generation educational software, e.g.,
> intelligent tutoring systems, for training end-users to use software
> applications (e.g., CAD, 3D modeling, image editing, office, and IDE
> software). Also interesting to me is the possibility that bulk recordings
> of the usage of such software could be training data for other AI systems,
> e.g., virtual assistants.
>
>
>
> In addition to tracks for user-input events, there could be tracks for
> menu-related events, tracks for application commands, and tracks for other,
> application-specific kinds of events.
>
>
>
> An interesting case study is *Autodesk Screencast* [2].
>
>
>
>
>
> Best regards,
>
> Adam
>
>
>
> [1]
> https://developer.mozilla.org/en-US/docs/Web/HTML/Attributes/autocomplete
>
> [2] https://knowledge.autodesk.com/community/screencast
>
>
>
> *From:* Elad Alon <eladalon@google.com>
> *Sent:* Thursday, April 28, 2022 4:32 AM
> *To:* Adam Sobieski <adamsobieski@hotmail.com>
> *Cc:* public-webrtc@w3.org
> *Subject:* Re: Enhancing Screen Capture for WebRTC
>
>
>
> Hi Adam,
>
>
>
> IIUC, you're proposing a stream of events derived from keystrokes be
> exposed to capturing applications. Am I right? You have probably considered
> the inherent risks, such as how this could effectively be a keylogger, and
> users could be tricked into exposing passwords. (Users might also expose
> private information by oversight while using non-malicious applications.) I
> don't think a checkbox - as you suggest - would be sufficient protection
> against such risks. If you have thought of more robust protection for the
> user, please let me know.
>
>
>
> Some additional thoughts:
>
> * For tab-capture of a tightly-coupled application, it is possible for the
> captured tab to report its various events to the capturing tab, which can
> in turn annotate the video with this information, record it separately in a
> format of its choosing, etc.
>
> * For window- and monitor-capture, I think this is a power-feature best
> reserved for native apps - assuming the relevant OS even allows native apps
> to monitor arbitrary mouse/keyboard events.
>
>
>
> Thanks,
>
> Elad
>
>
>
> On Fri, Apr 15, 2022 at 5:02 AM Adam Sobieski <adamsobieski@hotmail.com>
> wrote:
>
> WebRTC Working Group,
>
>
>
> Hello. I would like to describe and to express interest in some new
> features for WebRTC and its screen capture capabilities.
>
>
>
> WebRTC screen capture – or a mode thereof – could include user-input and
> application events, e.g., mouse, keyboard, touchscreen, and stylus events,
> applications’ menu-related events and other application events. Beyond
> streaming video of dynamic screen content, accompanying data tracks could
> include events relevant to screen-captured software applications.
>
>
>
> As envisioned, end-users would be able to opt into such an enhanced
> screen-capture mode during the initialization and configuration of
> screen-capturing, e.g., by selecting a checkbox with accompanying text
> asking the end-user whether they desire to additionally stream user-input
> and application events.
>
>
>
> I present a use case below. I would like to express that the desired
> features would enable a larger set of use cases than the one indicated.
>
>
>
> A use case is that of intelligent tutoring systems which can teach
> end-users how to better utilize software applications, e.g., office
> software or CAD software. End-users could connect to intelligent tutoring
> systems via WebRTC and perform exercises while interacting with the
> tutoring systems, receiving assessment, instruction, and task-relevant
> hints.
>
>
>
> Without the features under consideration, server-side
> computer-vision-based processing would be required to obtain the visible
> application-specific events from video streams, e.g., end-users opening
> menus and making use of application functionalities.
>
>
>
> In (Grossman, Matejka, & Fitzmaurice, 2010), the authors state that
> “storing a document's workflow history, and providing tools for its
> visualization and exploration, could make any document a powerful learning
> tool.”
>
>
>
> In (Bao, Li, Xing, Wang, & Zhou, 2015), the authors present a
> computer-vision-based video-scraping technique to “automatically
> reverse-engineer time-series interaction data from screen-captured videos.”
>
>
>
> In (Frisson, Malacria, Bailly, & Dutoit, 2016), the authors describe a
> general-purpose tool for observing application usage and analyzing users’
> behaviors, combining computer-vision-based analyses of video-recordings
> with the collection of low-level interactions.
>
>
>
> In (Sadeghi, Dargon, Rivest, & Pernot, 2016), the authors present a
> framework for fully capturing processes of computer-aided design and
> engineering.
>
>
>
> Thank you. I hope that these features for enhancing WebRTC and its
> screen-capturing capabilities are also of some interest to you.
>
>
>
>
>
> Best regards,
>
> Adam Sobieski
>
>
>
> *REFERENCES*
>
>
>
> Grossman, Tovi, Justin Matejka, and George Fitzmaurice. "Chronicle:
> capture, exploration, and playback of document workflow histories." In
> Proceedings of the 23nd annual ACM symposium on User interface software and
> technology, pp. 143-152. 2010.
>
>
>
> Bao, Lingfeng, Jing Li, Zhenchang Xing, Xinyu Wang, and Bo Zhou. "Reverse
> engineering time-series interaction data from screen-captured videos." In
> 2015 IEEE 22nd International Conference on Software Analysis, Evolution,
> and Reengineering (SANER), pp. 399-408. IEEE, 2015.
>
>
>
> Frisson, Christian, Sylvain Malacria, Gilles Bailly, and Thierry Dutoit.
> "Inspectorwidget: A system to analyze users behaviors in their
> applications." In Proceedings of the 2016 CHI Conference Extended Abstracts
> on Human Factors in Computing Systems, pp. 1548-1554. 2016.
>
>
>
> Sadeghi, Samira, Thomas Dargon, Louis Rivest, and Jean-Philippe Pernot.
> "Capturing and analysing how designers use CAD software." (2016).
>
>
>
>
Received on Friday, 29 April 2022 07:56:33 UTC