Re: [whatwg] Security restriction allows content thievery from Tab Atkins Jr. on 2012-07-15 (public-whatwg-archive@w3.org from July 2012)

From: Tab Atkins Jr. <jackalmage@gmail.com>
Date: Sun, 15 Jul 2012 15:33:58 -0700
To: Robert Eisele <robert@xarg.org>
Cc: whatwg@whatwg.org
Message-ID: <CAAWBYDCQgVcFawb62NXy4=t=UnY4q3U7hJXdT7kMHmf3QG0-bA@mail.gmail.com>

On Sun, Jul 15, 2012 at 3:22 PM, Robert Eisele <robert@xarg.org> wrote:
> Browsers are very restrictive when one tries to access the contents of
> different domains (including the scheme), embedded via framesets. This is
> normally a good practice, but I'd suggest weakening this restriction for
> the data: URI scheme.
>
> I'm currently building an analysis system like Google Analytics, which gets
> embedded into a website via a small JavaScript snippet. When I analyzed the
> data, I came across a very interesting trick because I got a lot of
> requests (with the data from location.href) where the entire website was
> embedded into a data:text/html URI - except that all ads of the page were
> replaced. Fortunately, my tracking code has been left without
> modifications.
>
> But the scary thing is that this way you can monetize foreign content by
> simply embedding it somewhere you can direct traffic to. That's pretty
> clever, because the original site owner doesn't notice this abuse due to
> the fact that top.location.href isn't readable. Or even worse, he would
> never notice it at all if he doesn't sniff the URI with JavaScript,
> because image files would have no referrer.
>
> My final approach to convict the abuser is based on the fact that the
> JavaScript was dynamically loaded from my server and that I can write to
> location.href. So I added this piece of code:
>
> if (top.location.protocol === 'data:') {
>     top.location.href = 'http://example.com/trap/';
> }
>
> But even then the referrer will not be passed to the server. So my proposal
> is that the data: URI scheme gets an exception to this security behavior.

The problem you outline is not directly tied to the solution you
present.  You can scrape a site and display it as your own without any
fancy tricks, just by downloading all the resources and hosting them
yourself.  This merely consumes a little more bandwidth for the
attacker, since they're hosting the images/etc themselves.

The correct solution to this kind of problem is legal - this is simple
copyright violation.

I'm not sure about the merits of your suggestion otherwise.  It's
reasonable to make data: pages same-origin with their parent when
they're contained within something, but it seems dodgy to make them
same-origin with their *contained* pages as well.  If not done
carefully, that could allow contained pages access to the data: page's
parent as well, or other cross-origin pages that the data: page is
containing.
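
To make that concern concrete, here is a rough toy model of the two
policies. It is only a sketch, not how any browser implements origins:
the frame() helper, the policy function names, and the .example hosts
are all made up for illustration. The first policy lets a data: page
inherit its parent's origin only; the second naively also treats a
data: page as same-origin with the pages it contains and lets that
relation chain.

```javascript
// Toy model (assumption, not browser behavior): frames form a tree and
// each frame has a declared origin; 'data:' marks a data: URI page.
function frame(name, origin, children = []) {
  const f = { name, origin, parent: null, children };
  for (const c of children) c.parent = f;
  return f;
}

// Policy 1: a data: frame takes the origin of its nearest non-data: ancestor.
function effectiveOrigin(f) {
  let cur = f;
  while (cur && cur.origin === 'data:') cur = cur.parent;
  return cur ? cur.origin : null; // top-level data: page: no shared origin
}

function canAccessInheritOnly(a, b) {
  const oa = effectiveOrigin(a);
  const ob = effectiveOrigin(b);
  return oa !== null && oa === ob;
}

// Policy 2 (naive): a data: frame is same-origin with its parent AND its
// children, and the relation is allowed to chain through the data: frame.
function canAccessNaive(a, b) {
  if (a === b) return true;
  if (a.origin !== 'data:' && a.origin === b.origin) return true;
  const seen = new Set([a]);
  const queue = [a];
  while (queue.length > 0) {
    const cur = queue.shift();
    const neighbors = [cur.parent, ...cur.children].filter(Boolean);
    for (const n of neighbors) {
      // An edge only grants access if one end of it is a data: frame.
      const linked = cur.origin === 'data:' || n.origin === 'data:';
      if (!linked || seen.has(n)) continue;
      if (n === b) return true;
      seen.add(n);
      queue.push(n);
    }
  }
  return false;
}

// Hypothetical setup: an embedding page wraps a data: page, which in
// turn contains two unrelated sites.
const victim = frame('victim', 'http://example.com');
const other = frame('other', 'https://other.example');
const dataPage = frame('dataPage', 'data:', [victim, other]);
const embedder = frame('embedder', 'https://embedder.example', [dataPage]);

console.log(canAccessInheritOnly(dataPage, embedder)); // true
console.log(canAccessInheritOnly(victim, embedder));   // false
console.log(canAccessNaive(victim, embedder));         // true
console.log(canAccessNaive(victim, other));            // true
```

Under the naive policy the contained page reaches both the data: page's
parent and its cross-origin sibling, which is exactly the leak described
above. Any real rule would have to stop the relation from chaining.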

~TJ

Received on Sunday, 15 July 2012 22:34:46 UTC