[w3c/editing] We should be precise about how show/hide relies on user activation (#326)

Porting over from MSEdgeExplainers repo https://github.com/MicrosoftEdge/MSEdgeExplainers/issues/331


@mustaqahmed filed:

> While the proposal indirectly says the show/hide calls would be tied to user interactions, we have to be precise about how we achieve this. The HTML spec defines [three levels](https://html.spec.whatwg.org/multipage/interaction.html#user-activation-gated-apis) of user activation dependence; one of them should definitely be used for `show()` to prevent possible abusive behavior by rogue apps. Perhaps "sticky" activation makes the most sense here?
>
> Additionally, should `hide()` also be gated somehow? I can't imagine any abusive behavior here. Any comments?
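
For readers unfamiliar with the activation levels referenced above, here is a minimal sketch of how sticky, transient, and consuming checks differ. It is a simplified model, not the spec algorithm or any browser's implementation: `ActivationState`, `Gate`, `mayShowKeyboard`, and the 5000 ms transient window are all assumptions made for illustration (the HTML spec leaves the transient duration to the user agent).

```typescript
// Simplified, hypothetical model of user activation state for one window.
// Times are plain millisecond numbers passed in by the caller, so the
// sketch is deterministic and does not depend on a real clock.
class ActivationState {
  // Positive infinity means "the user has never interacted with this window".
  private lastActivation: number = Number.POSITIVE_INFINITY;
  // Assumption: 5000 ms. The real duration is chosen by the user agent.
  private transientWindowMs: number;

  constructor(transientWindowMs: number = 5000) {
    this.transientWindowMs = transientWindowMs;
  }

  // Record a user gesture (tap, click, key press) at time `now`.
  notifyActivation(now: number): void {
    this.lastActivation = now;
  }

  // Sticky: stays true once the user has interacted at least once.
  hasStickyActivation(now: number): boolean {
    return now >= this.lastActivation;
  }

  // Transient: true only for a short window after the latest gesture.
  hasTransientActivation(now: number): boolean {
    return (
      now >= this.lastActivation &&
      now < this.lastActivation + this.transientWindowMs
    );
  }

  // Consuming: succeeds while transient activation is live, then ends the
  // transient window. Sticky activation is unaffected.
  consume(now: number): boolean {
    if (!this.hasTransientActivation(now)) return false;
    this.lastActivation = Number.NEGATIVE_INFINITY;
    return true;
  }
}

// Hypothetical gate levels that a show() call could be checked against.
type Gate = "none" | "sticky" | "transient";

function mayShowKeyboard(state: ActivationState, gate: Gate, now: number): boolean {
  switch (gate) {
    case "none":
      return true;
    case "sticky":
      return state.hasStickyActivation(now);
    case "transient":
      return state.hasTransientActivation(now);
  }
}

// Usage: one tap at t=1000 ms, then a show() attempt a minute later.
const demo = new ActivationState();
demo.notifyActivation(1000);
console.log(mayShowKeyboard(demo, "sticky", 60000));    // true
console.log(mayShowKeyboard(demo, "transient", 60000)); // false
```

Under this model, a sticky gate only blocks `show()` before the first interaction, while a transient gate requires each call to follow a gesture closely.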

from @BoCupp-Microsoft 

> Hi @mustaqahmed,
> 
> If we apply a user activation dependency then I agree it should be sticky as you mention.
> 
> RE: gating of `hide`, I don't think we need to gate it on user activation, but also, after @snianu and I had a lengthy conversation, we decided that there's really no compelling use case to call hide when the user hasn't interacted with the document, since, at least in Chromium, we dismiss the keyboard when navigating.   `hide` would only be observable if the user manually revealed the keyboard after navigation, and then the site (without user interaction) had an event like a timer that called `hide` and dismissed the keyboard again.  That seems like an unlikely pattern.
> 
> It is more interesting if we consider eliminating our hide on navigation behavior.  There's [a blink-dev response here from Ojan](https://groups.google.com/a/chromium.org/forum/?utm_medium=email&utm_source=footer#!msg/blink-dev/q80uCrMgiTM/T8FR4wMhBQAJ) asking to support not only hide, but also show without user activation.
> 
> @snianu and I discussed what's the worst that could happen if we eliminated the user activation requirement entirely.  What we came up with is that the active document could prevent input to itself if the only keyboard for the system was the onscreen keyboard.  That's something a site can more easily do (though maybe not more annoyingly do) today by just not providing a place to type - so it doesn't seem like there is a new threat.
> 
> There are [comments here](https://github.com/whatwg/html/issues/4876#issuecomment-526761472) by @rniwa and @whsieh about not being interested in allowing the keyboard to show without user interaction.  @rniwa or @whsieh, could you share your rationale?  Would you mind taking a look at the scenario I linked above from Ojan and let me know why we should or should not support such a scenario?
> 
> One point to keep in mind while considering that question: we would only allow the active document to make hide/show calls, so it wouldn't be possible for an annoying advertisement in an iframe to hide or show the keyboard unless the user or top browsing context caused the iframe to receive focus.
> 
> Thanks!
> Bo

from @snianu 

> @ojanvafai Could you please elaborate on the use case you mentioned in [this comment](https://groups.google.com/a/chromium.org/d/msg/blink-dev/q80uCrMgiTM/cH812K_AAQAJ)?
> We are trying to figure out if a user activation restriction makes sense for the VK APIs `show()/hide()`.

from @ojanvafai

> For what it's worth, Bo's suggestion to restrict to the active document successfully addresses any reasonable potential user annoyance or privacy concerns I can think of. Seems like a reasonable compromise to me.
> 
> As to my use cases...
> 
> I have an email client that has a compose page. There's nothing to do on that page except type a message, but I can't actually show the keyboard automatically even though that's clearly what a user would expect (e.g. try this in any email client on mobile). This is especially frustrating because I have the compose page added to my homescreen. The same would apply to a todo list site, a note-taking site, etc. The page for adding a new todo has extra friction compared to native apps if you have to click once to open the app and then again to get the keyboard before you can type.
> 
> It's a bit unfortunate, but for my specific use case, there's a compromise where we could consider clicking on an icon on the home screen to count as a user activation. But there are other cases that wouldn't work well for. A couple examples that come to mind:
> - It's increasingly common for parts of a page to be loaded asynchronously. Imagine you are in Gmail and click the compose button, but Gmail still needs to finish loading the code for the compose feature. Once the compose code loads, Gmail can show an email and focus the appropriate text box, but can't bring up the keyboard.
> - In a multi-page web site, the same as the above applies. If you click compose and it loads a new page, the user has to click again after page load for the keyboard to come up.

from @snianu 

> Thank you @ojanvafai for the details! @rniwa @whsieh @othermaciej I think these use cases are compelling and can only be achieved by removing the user activation restriction for the VK `show/hide` APIs. Could you please comment on why you think user activation is a necessary requirement for showing the VK on mac? Why is active document mitigation not sufficient? Also, do you think this restriction makes sense for the new VK `show/hide` APIs? 

from @whsieh

> I’m not sure what "showing the virtual keyboard" on macOS entails, but on iOS and iPadOS, we require user activation before showing the virtual keyboard to prevent websites from creating a bad user experience by forcing the keyboard to appear out of any clear context. (Or, to put it another way — as a user, the only times I would expect the software keyboard to show up is after interacting with the page in some way).
>
> It’s worth noting that, in the case where the hardware keyboard is attached on iPad (and as a result, the virtual keyboard is replaced with the much less obtrusive input accessory view), we match macOS behavior and allow programmatic focus to summon the “keyboard” (i.e. input accessory view).

from @snianu 

> @whsieh Take this use case for example:
>> I have an email client that has a compose page. There's nothing to do on that page except type a message, but I can't actually show the keyboard automatically even though that's clearly what a user would expect (e.g. try this in any email client on mobile).
>
> The user already intends to compose an email, so how does an additional tap/click on the editable field make any difference?
> Do you think it makes sense to at least remove this restriction for the new VK API proposal? That is, if the site is using the new VK policy, could we show the VK without requiring user interaction?

from @mustaqahmed 

> We have to be careful about an opposite (abusive) scenario where the top frame may or may not have a textbox but a malicious ad sub-frame has a hidden textbox only to steal focus.  In Chrome we prevent this by disallowing autofocus or onload/programmatic focus from sandboxed sub-frames, but I don't think all major browsers have this intervention today.
> 
> 1. To support the use-cases @ojanvafai mentioned, I think all we need is to drop user activation restrictions _only_ from the top frames.
> 
> 2. Additionally, for cross-origin sub-frames we can be stricter than what @BoCupp-Microsoft mentioned above: such sub-frames would need transient activation instead of just sticky activation.
> 
> @snianu Does 1+2 sound reasonable to you?
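
The 1+2 proposal above can be read as a small per-frame policy. The sketch below is one interpretation, not an agreed design: `FrameKind`, `FrameSnapshot`, `requiredActivation`, and `mayShowKeyboard` are hypothetical names, and treating same-origin sub-frames as sticky-gated is an assumption inferred from the earlier sticky suggestion. The active-document check reflects the earlier point that only the active document may make show/hide calls.

```typescript
// Hypothetical frame categories for the policy discussed in this thread.
type FrameKind = "top" | "same-origin-subframe" | "cross-origin-subframe";
type Requirement = "none" | "sticky" | "transient";

// Snapshot of the state a user agent would consult. All fields are
// illustrative assumptions, not part of any specified API.
interface FrameSnapshot {
  kind: FrameKind;
  isActiveDocument: boolean; // the document currently holding focus
  hasStickyActivation: boolean;
  hasTransientActivation: boolean;
}

// Item 1: top frames need no activation.
// Item 2: cross-origin sub-frames need transient activation.
// Assumption: remaining sub-frames keep the sticky requirement.
function requiredActivation(kind: FrameKind): Requirement {
  switch (kind) {
    case "top":
      return "none";
    case "same-origin-subframe":
      return "sticky";
    case "cross-origin-subframe":
      return "transient";
  }
}

function mayShowKeyboard(frame: FrameSnapshot): boolean {
  // Only the active document may control the keyboard at all.
  if (!frame.isActiveDocument) return false;
  switch (requiredActivation(frame.kind)) {
    case "none":
      return true;
    case "sticky":
      return frame.hasStickyActivation;
    case "transient":
      return frame.hasTransientActivation;
  }
}

// Usage: a freshly loaded compose page in the top frame, with no gestures yet.
const composePage: FrameSnapshot = {
  kind: "top",
  isActiveDocument: true,
  hasStickyActivation: false,
  hasTransientActivation: false,
};
console.log(mayShowKeyboard(composePage)); // true

// A focused cross-origin ad frame whose last gesture has gone stale.
const adFrame: FrameSnapshot = {
  kind: "cross-origin-subframe",
  isActiveDocument: true,
  hasStickyActivation: true,
  hasTransientActivation: false,
};
console.log(mayShowKeyboard(adFrame)); // false
```

With this reading, the compose-page use case works on load, while a cross-origin sub-frame needs a recent gesture before each `show()`.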

from @snianu 

> @mustaqahmed @BoCupp-Microsoft and I discussed this and we have some questions about 2. We agree that 1 should address @ojanvafai's use cases, but for 2 if the browsers support the [sandboxing flag set](https://html.spec.whatwg.org/multipage/origin.html#sandboxing), then it might be problematic for web authors if they want to control VK for cross-origin sub-frames. If browsers do support [sandboxing flag](https://html.spec.whatwg.org/multipage/origin.html#sandboxing) then it might be OK to give authors control over the VK without requiring transient activation. WDYT? Any specific concerns you want to highlight?

from @othermaciej 

> I think it's still correct to require user activation, and we'd likely do so even for this new API on iOS/iPadOS. The limitation on keyboard via programmatic focus is partly due to legacy pages autofocusing without thinking through impact for onscreen keyboards, but this would not apply to a brand new API. However, on iOS Safari, given the browser chrome design, popping up the keyboard covers the bottom control strip that includes the back button. It would be particularly bad if a page could bring up the keyboard whenever it's dismissed to frustrate the user's desire to leave. iPadOS does not have this issue so perhaps it could be more permissive. But it still seems important for the keyboard to feel like it's under the user's control.
> 
> @ojanvafai's use case sounds theoretically compelling. However, I tested Gmail and Yahoo Mail, and both bring up the keyboard just fine when composing a new email in iOS Safari under current policy. It's true that directly navigating to the URL for new mail compose does not bring up the keyboard, but (a) this seems like a narrow edge case and (b) it's obvious that a tap in a relevant area will start typing, and it's easy to perform that one extra tap in this unusual case.

from @ojanvafai

> @othermaciej curbing potential abuse seems reasonable to me. Have you explored less limiting mitigations? 
> 
> Incidentally, I have a Send Email iOS Shortcut that opens Mail to the compose view that automatically brings up the keyboard. That's the equivalent of my email client. It only has a single-digit number of users, so it doesn't matter from a web compat perspective, but it is a real example and one that makes me less confident in investing into building web content for real business needs.
> 
> **An alternate mitigation:** Don't allow for multiple programmatic keyboard opens between user activations, i.e.:
> - Programmatically opening the keyboard sets a bit.
> - If the bit is set, then programmatic keyboard opening doesn't work.
> - The bit is cleared on the next user activation.
> 
> This has the advantage of allowing the keyboard to be programmatically opened exactly once on page load since the bit isn't set initially (enabling my use case), but it also allows async use cases like with React Suspense that I mentioned above.
> 
> **A probably less good mitigation:** Put a time limit on programmatically reopening the keyboard after it's been closed? 5-10 seconds could get you far enough that the attack wouldn't be worth doing anymore, so no one would do it. This feels a bit complicated and arbitrary maybe, but it would be straightforward to design and implement I think.
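
The alternate mitigation above (a bit that is set by a programmatic open and cleared by the next user activation) can be sketched as a small state machine. This is an illustrative model only: `OneShotKeyboardGate` and its method names are hypothetical and do not correspond to any browser or spec API.

```typescript
// Hypothetical gate implementing the "one programmatic open between user
// activations" idea from the comment above.
class OneShotKeyboardGate {
  // The "bit": true once a programmatic open has happened since the last
  // user activation. It starts cleared, so one open on page load is allowed.
  private openedSinceActivation = false;

  // Called when script asks to open the keyboard without a user gesture.
  // Returns whether the open is allowed.
  tryProgrammaticOpen(): boolean {
    if (this.openedSinceActivation) return false;
    this.openedSinceActivation = true;
    return true;
  }

  // Called by the user agent on any user activation (tap, click, key press).
  notifyUserActivation(): void {
    this.openedSinceActivation = false;
  }
}

// Usage: a page opens the keyboard on load, then tries to reopen it after
// the user dismisses it, without any interaction in between.
const gate = new OneShotKeyboardGate();
console.log(gate.tryProgrammaticOpen()); // true  (open on load)
console.log(gate.tryProgrammaticOpen()); // false (reopen attempt is blocked)
gate.notifyUserActivation();             // user taps somewhere on the page
console.log(gate.tryProgrammaticOpen()); // true  (allowed again)
```

Because the bit is not tied to a time window, an open that happens long after load (for example once asynchronously loaded compose code is ready) still succeeds, as long as it is the first programmatic open since the last gesture.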

https://github.com/w3c/editing/issues/326

Received on Friday, 6 August 2021 19:28:22 UTC