
Re: Proposal to advertise automation of UA

From: Brad Hill <hillbrad@gmail.com>
Date: Tue, 17 Jan 2017 20:43:14 +0000
Message-ID: <CAEeYn8hahJU+DjHjwcpCtdCJ+tE66_DwGcM_hQQ7Tx1SsNLDag@mail.gmail.com>
To: Sergey Shekyan <shekyan@gmail.com>, Jonathan Garbee <jonathan.garbee@gmail.com>
Cc: Daniel Veditz <dveditz@mozilla.com>, "public-webappsec@w3.org" <public-webappsec@w3.org>
I'm not sure what security benefit this would provide - that's the
chartered scope of this WG, not performance or general traffic management.
As we all seem to agree that genuinely malicious traffic would not
self-identify as such, perhaps there is a better home for this discussion
on WICG?


On Tue, Jan 17, 2017 at 11:48 AM Sergey Shekyan <shekyan@gmail.com> wrote:

> They should respond differently in exactly the same way they respond now:
> for example, by recommending an API for scraping rather than
> loading heavy resources for every page, by sending the request straight
> down the failed-CAPTCHA route, or by not showing ads to a headless browser.
> The only difference would be that they would stop inferring many indirect
> signals of automation if there is a flag for that already set by the UA.
> Sure, if UA automation tools had a built-in way to honor
> robots.txt, that might solve some of the problems, but they don't.
> All I am asking is to standardize that mechanism.
> On Mon, Jan 16, 2017 at 11:18 PM, Jonathan Garbee <
> jonathan.garbee@gmail.com> wrote:
> In what way should they respond differently? The site has absolutely no
> context as to why headless is being used. Why mangle the response without
> any context and just hope your users still get benefit from it?
> On Mon, Jan 16, 2017, 4:47 PM Sergey Shekyan <shekyan@gmail.com> wrote:
> robots.txt is an on/off switch, while what I propose is more
> granular, allowing websites to choose how to respond.
> On Sat, Jan 14, 2017 at 5:52 AM, Jonathan Garbee <
> jonathan.garbee@gmail.com> wrote:
> I don't see where having a header or something to help detect automated
> access will be beneficial. We can already automate browser engines.
> Headless mode is just a native way to do it. So, if someone is already not
> taking your robots.txt into account, they'll just use another method or
> strip out whatever we add to signal that headless mode is in use. Sites
> don't gain any true benefit from this kind of detection. If someone wants
> to automate tasks they do regularly, that's their prerogative. We have
> robots.txt as a respectful way to ask people automating things to avoid
> certain areas and actions, and that easily carries over into headless mode.
> On Sat, Jan 14, 2017, 4:28 AM Sergey Shekyan <shekyan@gmail.com> wrote:
> I am talking about tools that automate user agents, e.g. headless browsers
> (PhantomJS, SlimerJS, headless Chrome), Selenium, curl, etc.
> I mentioned navigation requests because I don't yet see how advertising
> automation on non-navigation requests would help.
> Another option is to advertise it through a property on the navigator
> object, which would defer any action by authors to a second request.
> On Sat, Jan 14, 2017 at 12:56 AM, Daniel Veditz <dveditz@mozilla.com>
> wrote:
> On Fri, Jan 13, 2017 at 5:11 PM, Sergey Shekyan <shekyan@gmail.com> wrote:
> I think that attaching an HTTP request header to synthetically initiated
> navigation requests (https://fetch.spec.whatwg.org/#navigation-request)
> will help authors to build more reliable mechanisms to detect unwanted
> automation.
> I don't see anything in that spec about "synthetic" navigation requests.
> Where would you define that? How would you define that? Is a scripted
> window.open() in a browser "synthetic"? What about an iframe in a page?
> Does it matter if the user expected the iframe to be there or not (such as
> ads)? What if the page had 100 iframes?
> Are you trying to solve the same problem robots.txt is trying to solve? If
> not what kind of automation are you talking about?
> -
> Dan Veditz
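
The proposal in the quoted thread (a request header on automated navigation
requests, which a site may act on in a way of its own choosing) could be
sketched roughly as follows. This is only an illustration: the header name
"Automated-UA", its value "1", and the names Policy, is_automated and
choose_policy are hypothetical and appear nowhere in the thread or in any
spec. As several replies point out, a client can simply omit or strip the
header, so its absence says nothing about whether a request is automated.

```python
from enum import Enum

# Hypothetical header name and value; the thread never proposes concrete ones.
AUTOMATION_HEADER = "automated-ua"
AUTOMATION_VALUE = "1"


class Policy(Enum):
    """Responses mentioned in the thread, plus the default."""
    NORMAL = "normal"            # serve the page as usual
    SUGGEST_API = "suggest_api"  # point the client at a scraping API
    CAPTCHA = "captcha"          # send the request down the CAPTCHA route
    NO_ADS = "no_ads"            # serve the page without ads


def is_automated(headers):
    """Return True only if the client voluntarily sent the flag.

    Header names are compared case-insensitively, as HTTP requires.
    A missing header proves nothing: the client may have stripped it.
    """
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get(AUTOMATION_HEADER) == AUTOMATION_VALUE


def choose_policy(headers, site_preference=Policy.SUGGEST_API):
    """Apply the site's own preference to self-identified automation."""
    return site_preference if is_automated(headers) else Policy.NORMAL
```

A site that prefers not to show ads to headless browsers would call
choose_policy(request_headers, Policy.NO_ADS); requests without the flag
fall through to Policy.NORMAL and still need whatever indirect detection
the site already uses.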
Received on Tuesday, 17 January 2017 20:43:58 UTC
