- From: Brad Hill <hillbrad@gmail.com>
- Date: Tue, 17 Jan 2017 20:43:14 +0000
- To: Sergey Shekyan <shekyan@gmail.com>, Jonathan Garbee <jonathan.garbee@gmail.com>
- Cc: Daniel Veditz <dveditz@mozilla.com>, "public-webappsec@w3.org" <public-webappsec@w3.org>
- Message-ID: <CAEeYn8hahJU+DjHjwcpCtdCJ+tE66_DwGcM_hQQ7Tx1SsNLDag@mail.gmail.com>
I'm not sure what security benefit this would provide, and security is the chartered scope of this WG, not performance or general traffic management. As we all seem to agree that genuinely malicious traffic would not self-identify as such, perhaps there is a better home for this discussion in the WICG? https://www.w3.org/community/wicg/

On Tue, Jan 17, 2017 at 11:48 AM Sergey Shekyan <shekyan@gmail.com> wrote:

> They should reply differently in exactly the same way they respond now:
> for example, recommend the use of an API for scraping rather than loading
> heavy resources for every page, immediately send the request through the
> failed-CAPTCHA route, or not show ads to a headless browser.
>
> The only difference would be that they would stop inferring many indirect
> signals of automation if there is a flag for that already set by the UA.
> Sure, if UA automation tools had a built-in way to honor robots.txt, that
> might solve some of the problems, but they don't.
>
> All I am asking is to standardize that mechanism.
>
> On Mon, Jan 16, 2017 at 11:18 PM, Jonathan Garbee <jonathan.garbee@gmail.com> wrote:
>
> In what way should they respond differently? The site has absolutely no
> context as to why headless mode is being used. Why mangle the response
> without any context and just hope your users still get some benefit from it?
>
> On Mon, Jan 16, 2017, 4:47 PM Sergey Shekyan <shekyan@gmail.com> wrote:
>
> robots.txt is an on/off switch, while what I propose is more granular,
> allowing websites to choose how to respond.
>
> On Sat, Jan 14, 2017 at 5:52 AM, Jonathan Garbee <jonathan.garbee@gmail.com> wrote:
>
> I don't see where having a header or something to help detect automated
> access would be beneficial. We can already automate browser engines;
> headless mode is just a native way to do it. So, if someone is already not
> taking your robots.txt into account, they'll just use another method or
> strip out whatever we add to say headless mode is in use. Sites don't gain
> any true benefit from having this kind of detection. If someone wants to
> automate tasks they do regularly, that's their prerogative. We have
> robots.txt as a respectful way to ask people automating things to avoid
> certain areas and actions, and that easily extends to headless mode.
>
> On Sat, Jan 14, 2017, 4:28 AM Sergey Shekyan <shekyan@gmail.com> wrote:
>
> I am talking about tools that automate user agents, e.g. headless browsers
> (PhantomJS, SlimerJS, headless Chrome), Selenium, curl, etc.
> I mentioned navigation requests because I don't yet see how advertising
> automation on non-navigation requests would help.
> Another option would be to advertise it via a property on the navigator
> object, which would defer any actions by authors to the second request.
>
> On Sat, Jan 14, 2017 at 12:56 AM, Daniel Veditz <dveditz@mozilla.com> wrote:
>
> On Fri, Jan 13, 2017 at 5:11 PM, Sergey Shekyan <shekyan@gmail.com> wrote:
>
> I think that attaching an HTTP request header to synthetically initiated
> navigation requests (https://fetch.spec.whatwg.org/#navigation-request)
> will help authors build more reliable mechanisms to detect unwanted
> automation.
>
> I don't see anything in that spec about "synthetic" navigation requests.
> Where would you define that? How would you define that? Is a scripted
> window.open() in a browser "synthetic"? What about an iframe in a page?
> Does it matter whether the user expected the iframe to be there or not
> (such as ads)? What if the page had 100 iframes?
>
> Are you trying to solve the same problem robots.txt is trying to solve?
> If not, what kind of automation are you talking about?
>
> -
> Dan Veditz
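For concreteness, the two variants discussed in the thread (a request header attached to synthetic navigation requests, and a flag on the navigator object) could be consumed by a site roughly as in the sketch below. The names `Sec-Automated` and `navigator.automated` are placeholders invented here for illustration; neither the thread nor any spec defines them.

```typescript
// Minimal sketch only. "Sec-Automated" and "navigator.automated" are
// hypothetical names standing in for the mechanism discussed above;
// neither is a standardized header or API.

type Strategy = "suggest-api" | "skip-ads" | "normal";

// Server side: pick a response strategy from the hypothetical request header
// that an automated UA would attach to synthetic navigation requests.
function strategyFor(headers: Record<string, string>): Strategy {
  if (headers["sec-automated"] === "?1") {
    // Upthread suggestions: steer scrapers toward a documented API instead of
    // heavy pages, or skip serving ads to a headless browser.
    return "suggest-api";
  }
  return "normal";
}

// Client side: the alternative floated in the thread, a flag on the navigator
// object that page script could read and use to adjust subsequent requests.
function pageIsAutomated(): boolean {
  return Boolean((navigator as unknown as { automated?: boolean }).automated);
}

// Example: a request carrying the hypothetical flag gets the API hint.
console.log(strategyFor({ "sec-automated": "?1" })); // "suggest-api"
```

Either way the signal is purely advisory: as noted above, a client that does not want to be identified would simply omit it.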
Received on Tuesday, 17 January 2017 20:43:58 UTC