- From: Tab Atkins Jr. <jackalmage@gmail.com>
- Date: Sun, 15 Jul 2012 15:33:58 -0700
- To: Robert Eisele <robert@xarg.org>
- Cc: whatwg@whatwg.org
On Sun, Jul 15, 2012 at 3:22 PM, Robert Eisele <robert@xarg.org> wrote:
> Browsers are very restrictive when one tries to access the contents of
> different domains (including the scheme), embedded via framesets. This is
> normally a good practice, but I'd suggest to weaken this restriction for
> the data: URI schema.
>
> I'm currently building an analysis system like Google Analytics, which gets
> embedded into a website via a small JavaScript snippet. When I analyzed the
> data, I came across a very interesting trick because I got a lot of
> requests (with the data from location.href) where the entire website was
> embedded into a data:text/html URI - except that all ads of the page were
> replaced. Fortunately, my tracking code has been left without
> modifications.
>
> But the scary thing is that this way you can monetize foreign content by
> simply embedding it somewhere you can direct traffic to. That's pretty
> clever, because the original site owner doesn't notice this abuse due to
> the fact that top.location.href isn't readable. Or even worse, he would
> never notice it at all when he doesn't sniff the URI with JavaScript,
> because image files would have no referrer.
>
> My final approach to convict the abuser is based on the fact, that the
> JavaScript was dynamically loaded from my server and that I can write to
> location.href. So I added this piece of code:
>
> if (top.location.protocol === 'data:') {
>   top.location.href = 'http://example.com/trap/';
> }
>
> But even then the referrer will not be passed to the server. So my proposal
> is that the data URI schema gets an exception on this security behavior.

The problem you outline is not directly tied to the solution you present. You can scrape a site and display it as your own without any fancy tricks, just by downloading all the resources and hosting them yourself. This merely consumes a little more bandwidth for the attacker, since they're hosting the images/etc themselves. The correct solution to this kind of problem is legal - this is simple copyright violation.

I'm not sure about the merits of your suggestion otherwise. It's reasonable to make data: pages same-origin with their parent when they're contained within something, but it seems dodgy to make them same-origin with their *contained* pages as well. If not done carefully, that could allow contained pages access to the data: page's parent as well, or other cross-origin pages that the data: page is containing.

~TJ
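
For readers who want to try the check quoted in this message, here is a minimal sketch of a more defensive version. It assumes that reading the top frame's `location.protocol` may throw when the top frame is cross-origin, so the read is wrapped in `try`/`catch`. The helper names `topFrameScheme` and `redirectIfDataTop` and the `win` parameter are hypothetical and do not come from the original message; the trap URL is the placeholder from the quoted snippet. Whether the write to `top.location.href` succeeds depends on the browser, as the original message only reports that it worked at the time.

```javascript
// Hypothetical helper: returns the top frame's scheme, or null if it cannot be read.
// `win` is any window-like object; in a browser you would pass `window`.
function topFrameScheme(win) {
  try {
    // Assumption: this read may throw when the top frame is cross-origin.
    return win.top.location.protocol;
  } catch (e) {
    return null; // the scheme is not readable from this frame
  }
}

// Hypothetical helper: navigates the top frame to `trapUrl` when it is a data: page.
// Returns true if a redirect was attempted, false otherwise.
function redirectIfDataTop(win, trapUrl) {
  if (topFrameScheme(win) === 'data:') {
    // Same assignment as the quoted snippet; the browser may still refuse it.
    win.top.location.href = trapUrl;
    return true;
  }
  return false;
}
```

In a page this might be called as `redirectIfDataTop(window, 'http://example.com/trap/')`. Passing the window in as a parameter is only a convenience so that the logic can be exercised with plain objects outside a browser.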
Received on Sunday, 15 July 2012 22:34:46 UTC