- From: Ian Hickson <ian@hixie.ch>
- Date: Fri, 2 Nov 2007 22:05:54 +0000 (UTC)
- To: HTML WG List <public-html@w3.org>, WHAT WG List <whatwg@whatwg.org>
- Message-ID: <Pine.LNX.4.62.0711021803270.27205@hixie.dreamhostps.com>
This e-mail contains replies to a number of e-mails received on the topic of the proposed ping="" attribute since January 2006. Since e-mail on this topic was sent to both the WHATWG and HTMLWG mailing lists, I have cc'ed both on this e-mail. Please pick just one when replying. Thanks!

On Thu, 19 Jan 2006, Tyler Close wrote:
> On 1/19/06, Jim Ley <jim.ley@gmail.com> wrote:
> > On 1/19/06, Tyler Close <tyler.close@gmail.com> wrote:
> > >
> > > I think it would be fair to characterize current techniques for link
> > > click tracking as "opaque". In contrast, the proposed "ping"
> > > attribute explicitly declares in the HTML what is intended and how
> > > it will happen. Perhaps the right way to explain the "ping"
> > > attribute is as providing transparent, or explicit, feedback;
> > > shining a light on the dark corners of click tracking. If it is
> > > explained that the feature will make link click tracking explicit,
> > > controllable and more usable, I think the user base will react more
> > > positively.
> >
> > No, they'll just disable it, as it does them directly no benefit and
> > has a cost, so if you educate them enough to make a decision, they
> > will not decide to be tracked.
>
> Why hasn't this happened to the HTTP Referer header?

Well, to a very small extent, it has. And indeed I would expect the same level of response to the ping="" attribute as the HTTP Referer header received. This is actually a good thing -- we want people who want their privacy protected to this degree to be able to do so. Right now, they can't, because the existing tracking methods are roadblocks to the content they are tracking, and thus can't be bypassed.

The ping="" attribute is intended to help users.
Right now, user tracking happens widely, but suffers from a number of problems, including being:

 * non-transparent (users can't see what's being pinged easily)
 * non-optional (users can't disable it)
 * slow (adding DNS and TCP roundtrip to every tracked request)
 * obfuscated (the final target is usually hard to determine)

ping="" solves all of these problems. It also helps the authoring side by making the tracking cleaner and making the user experience better, which should be enough to get authors to switch (we've already had a number of authors say they would use it as soon as it was widely available, some even said they'd use it earlier on a per-browser basis!).

> > Since the main use of tracking has a direct economic cost to many
> > parties the sites will then return to using the established successful
> > methods for tracking, no-one will gain and browsers would've wasted
> > lots of time that could've been spent on more productive features.
>
> I think an economic analysis of the scenario is a valid approach. Could
> you spell out your argument in more detail? For example, after I've
> submitted a search request to Google, what is the economic cost to me of
> letting Google know which result I selected? What is the economic
> benefit to me of providing this information to Google?
>
> I can see an argument that there is a net benefit to me to provide this
> information. I don't see a clear argument that there is a net cost to
> me. At the start of the exchange, the thing of value that I have are my
> search terms. Once I've given those up, Google already has most of what
> it needs to effectively advertise to me. Allowing Google to know which
> result was most relevant to me might mean I get more value in the future
> for revealing my search terms, in terms of better query results.
>
> I'm interested to hear your economic analysis.

Jim was presumably referring to the ability to use ping="" for tracking advertisements, and was suggesting that companies might be so greedy that they would refuse to use ping="" for fear of not tracking users that had disabled the feature. Continuing with your theme above of a Google query, the theory might be that Google would value profit over user preferences so much that they would not use ping="" for tracking conversions.

I could assure you that this is not the case, but I'm sure many people wouldn't be convinced. However, there is already a way to bypass the conversion tracking for ads: simply copy the URL given in the ad and paste it into your location bar. If greed was really beating the user's experience, I would presume that this would not be made available.

On Fri, 20 Jan 2006, Thomas Much wrote:
>
> There are browsers out there that let the user disable the HTTP referrer
> (and enable it only for certain sites that require it for whatever
> reasons). And our users definitely use this feature.

That's why we need to provide ping="". There's no way to do this at the moment. Users aren't able to disable tracking right now. They should be able to.

On Thu, 19 Jan 2006, James Graham wrote:
>
> Indeed. I believe that even browsers significantly more popular than
> iCab allow for this. Yet the vast majority of people leave the feature
> on.

Indeed. That's why we should not be worried about authors not using the feature just because it can be disabled.

On Fri, 20 Jan 2006, Thomas Much wrote:
>
> Maybe because they don't know about referrer security problems and even
> if they do they don't know how to turn it off? (How many users know
> about:config?)

I know about both, and I don't disable referrer.

> People felt safe using Firefox, and now someone tells them there's
> happening something behind the scenes, "someone's tracking you." Look
> at the comments on MozillaZine and on
> <http://www.heise.de/newsticker/meldung/68508>.
> Even if the ping
> attribute is a clean, open standard for something that has happened in
> the dark so far, people are afraid of it. This could become a marketing
> problem indeed.

Making a privacy-violating practice more transparent is a privacy gain. We should be happy that we are bringing this practice to the general consciousness.

> There are two options that lead to the same question:
>
> - If people don't want this feature, you'll have to provide a switch to
>   turn it off.
>
> - If it can be switched off, websites will use the old, hidden ways to
>   track users.
>
> What's the benefit of the ping attribute then?

I do not believe the second step above. There is ample reason to believe that authors are willing to honour their users' preferences.

On Fri, 20 Jan 2006, Alexey Feldgendler wrote:
>
> I think that websites will use the old methods of tracking anyway
> because they can't be sure that the user agent supports ping="".

Certainly for the foreseeable future this will be the case, but that's the case with all new features.

On Fri, 20 Jan 2006, Daniel Veditz wrote:
>
> Can't you say the same about cookies? Many people are up in arms about
> "tracking" and browsers do provide blocking tools, yet the vast majority
> of people leave them on and very few sites bother to use the old way of
> tracking state by passing it as URL query parameters.

Indeed.

On Fri, 20 Jan 2006, Alexey Feldgendler wrote:
>
> That's because on most sites you're unable to log in with cookies
> disabled. On the other hand, a user who has disabled ping="" doesn't
> lose anything.

I do not believe that your reasoning applies to most users.

On Sat, 27 Oct 2007, Julian Reschke wrote:
> >
> > We're long past that. It's trivial for a page to trigger a POST
> > without the user knowing.
>
> I consider that a bug in User Agents.

This is not a widely held opinion.

> Please do not add more of this.

While I understand that you believe that silent POSTs are somehow harmful, I believe that on balance the proposed feature is a net user benefit, and that this instance of automatic POST is no more dangerous than other automatic POSTs being proposed (e.g. in the cross-site XMLHttpRequest specification being developed at the W3C in the WebAPI WG). Indeed, in this instance I would argue the danger is significantly reduced, since no POST data is sent with the request.

[Quoting HTTP:]
> "9.1.1 Safe Methods
>
> Implementors should be aware that the software represents the user in
> their interactions over the Internet, and should be careful to allow the
> user to be aware of any actions they might take which may have an
> unexpected significance to themselves or others.

I agree that ping="" should be made visible to users. Indeed, the spec explicitly makes that a SHOULD, going far outside its usual boundary of not specifying user interface requirements.

> In particular, the convention has been established that the GET and HEAD
> methods SHOULD NOT have the significance of taking an action other than
> retrieval. These methods ought to be considered "safe". This allows user
> agents to represent other methods, such as POST, PUT and DELETE, in a
> special way, so that the user is made aware of the fact that a possibly
> unsafe action is being requested.

Indeed; and in fact part of the goal here is to make the possibly unsafe action (user tracking and conversion tracking, with the potential effect on future performance or the potential material financial effect) be one that can be explicitly brought to the user's attention if he so desires, something that is not possible in legacy tracking techniques. (For example, using redirects makes the whole process very opaque.)

> Naturally, it is not possible to ensure that the server does not
> generate side-effects as a result of performing a GET request; in fact,
> some dynamic resources consider that a feature.
> The important
> distinction here is that the user did not request the side-effects, so
> therefore cannot be held accountable for them."

I think it's clear that this paragraph is trying to convey that having side-effects with a GET request is a poor state of affairs, which I agree with, and which is one of the other things that the ping="" proposal attempts to address -- legacy tracking mechanisms typically abuse GET in an unsafe way, which causes a number of problems for the server (mostly around unpredictable caching effects like pre-caching, session history navigation, and transparent cache proxies), which can then affect the user in undesirable ways (e.g. if tracking is used to determine preference towards one link or another, and the user's browser precaches one more often than the other, then the server will act as if the user had indicated a preference where in fact he had not).

In conclusion I think HTTP supports the design of the feature as is.

> > > > What do you think are the risks of using POST?
> > >
> > > There are some, as with GET/HEAD. Due to the nature of POST, the
> > > effects may be more grave -- as you just cited ("charging money").
> >
> > You are suggesting using GET to make money change hands. If we go that
> > route, then what's the difference between GET and POST? You can't
> > simultaneously say that POST is dangerous because it lets X happen and
> > then say we should therefore use GET to let X happen. That's
> > inconsistent.
>
> No, I'm not suggesting that.
>
> In this scenario, there are three parties involved:
>
> A: the user
> B: the visited site
> C: the site being linked to
>
> If the link from B to C needs to be audited for the purpose of paying
> ads, money will be exchanged between the owners of B and C. A is not
> involved in that transaction.
>
> How the contract between B and C is implemented should be outside the
> scope of the stuff sent to A.

While that would be nice in practice, it is not the case today, and it is not clear that it ever could be the case. We have to work within the limitations we are presented with, and in this case it seems that the ping="" proposal is the closest one can get to solving the problems seen by both the users and the authors.

> Again:
>
> "The important distinction here is that the user did not request the
> side-effects, so therefore cannot be held accountable for them."
>
> When A follows the link, he is *not* accountable for the cost of the ad,
> being transferred from C and B.

The HTTP specification just says that a user can never be held accountable for GET side-effects. It says nothing about the user being held accountable for anything else, including automatic POST requests.

> > > I personally think that the attribute in itself is a Very Bad Idea,
> > > but if it stays in, by all means do not use POST for it.
> >
> > We can't use GET... what other method would be appropriate?
>
> You shouldn't do it at all. If you insist in doing it, use a safe
> method. Everything else is in conflict with RFC2616. And yes, you can
> use GET.

I think it has been explained why using GET is undesirable.

> > > BTW: I just checked, and the Google Ads on www.google.de work with
> > > GET and a Redirect (302). Only safe methods from the user's point of
> > > view. Are you saying this is a problem?
> >
> > Yes.
>
> Interesting -- good that I asked. It seems we'll not be able to make
> progress on this unless we clarify this issue first.

The problems from the server-side are that it is unreliable (due to pre-caching, transparent caching, and session history navigation), it obfuscates the user experience (the actual target URL is hidden), it is slow (there's at least one extra HTTP round-trip, possibly with an additional DNS hit as well), and it uses an idempotent method for a distinctly non-idempotent action.
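
To make the comparison between redirect-based tracking and ping="" concrete, here is a rough sketch in Python of the requests a user agent would issue for one click in each case. It is an illustration only: the hosts, the `url` query parameter, and the function names are hypothetical, and the model assumes the behaviour described in this thread (ping="" holds a space-separated list of URIs, each of which receives a POST with no data, and the user can switch the feature off).

```python
from urllib.parse import urlencode


def legacy_redirect_href(tracker: str, target: str) -> str:
    """Redirect-style tracking: the link points at the tracker, which is
    expected to redirect to the real target (hypothetical `url` parameter)."""
    return f"{tracker}?{urlencode({'url': target})}"


def requests_on_click(href: str, ping: str = "", pings_enabled: bool = True):
    """Return the (method, URL) pairs a hypothetical user agent issues itself
    when the link is followed. Server-side redirects are not modelled."""
    requests = [("GET", href)]  # the navigation itself
    if pings_enabled:
        # Assumption: one body-less POST per URI listed in ping="".
        requests += [("POST", url) for url in ping.split()]
    return requests


target = "http://www.example.com/foo/bar"
tracker = "http://example.net/track"

# Legacy: the only visible request goes to the tracker; the target is hidden
# inside the query string and the tracking cannot be skipped.
legacy = requests_on_click(legacy_redirect_href(tracker, target))

# ping="": the navigation goes straight to the target, and the tracking is a
# separate, explicit request that the user can turn off.
explicit = requests_on_click(target, ping=tracker)
opted_out = requests_on_click(target, ping=tracker, pings_enabled=False)

for name, reqs in [("legacy", legacy), ("ping", explicit), ("ping off", opted_out)]:
    print(name, reqs)
```

In the legacy case the user cannot reach the target without going through the tracker; in the ping="" case dropping the tracking request leaves the navigation untouched.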

> > > > > The spec continues with:
> > > > >
> > > > > "When the ping attribute is present, user agents should clearly
> > > > > indicate to the user that following the hyperlink will also
> > > > > cause secondary requests to be sent in the background, possibly
> > > > > including listing the actual target URIs."
> > > > >
> > > > > This is good, but it's probably not clear enough -- at least FF3
> > > > > is ignoring this.
> > > >
> > > > It's not clear to me how to make it clearer.
> > >
> > > You could say "must" instead of "should", for a start. You could
> > > also propose one acceptable way to fulfill that requirement.
> >
> > We shouldn't use "MUST" for issues that don't result in
> > interoperability problems, that just weakens the other "MUST"s.
>
> I think we should use "must" for things that affect privacy.

We shouldn't really even be using "SHOULD" for this. "MUST" is simply inappropriate here. This isn't to say that the privacy issue is unimportant; it is important -- indeed it is one of the primary motivators for making this explicit instead of requiring authors to develop their own proprietary mechanisms as they currently do. However, it is an inappropriate use of RFC2119 terminology to make a MUST-level requirement around user interface decisions.

> > > > In the case of Firefox 3, the developers were very aware of the
> > > > above requirement, as well as its implications, and intentionally
> > > > decided to violate the SHOULD for the time being. It isn't clear
> > > > to me that there is anything I could do to the _spec_ to change
> > > > their mind. (It's not like they just missed the above paragraph or
> > > > didn't understand it.)
> > >
> > > Well, they ignored it, yet made the functionality the default. It's
> > > a very clear signal that we have a problem here.
> >
> > We could remove the paragraph.
>
> Which makes things even worse. If you can't make this work correctly,
> please consider removing it.

It has been considered, very seriously, and for a long time. However, the net benefit to the users outweighs the disadvantages. I encourage you to take up the UI issue directly with the relevant browser vendor(s). It isn't a technical problem with the spec.

On Sat, 27 Oct 2007, Geoffrey Sneddon wrote:
>
> Having read this entire thread, I don't see why anything is actually
> wrong. In this context the difference between GET and POST is negligible
> -- both can technically be used to do what is desired, though using GET
> would be breaking RFC 2616 (or rather, breaking a SHOULD NOT). If we
> disallow it to be used on external servers, people will just continue to
> use Javascript to achieve this, which CANNOT be disabled by a UA without
> breaking behaviour that sites rely upon.

Indeed.

On Sat, 27 Oct 2007, Julian Reschke wrote:
>
> No, sorry, that's incorrect.
>
> If you want to do something silently (without the user's consent), you
> simply have to use a safe method.

We don't want to do it without the user's consent. The whole point of making ping="" explicit is to allow the user to have the final decision.

> And if you consider the desired effect non-safe (which I don't), then
> the consequence is that you just can't do it.

We can't stop tracking from occurring. We can, however, make it better for users. I think we have a responsibility to do so.

On Sun, 28 Oct 2007, Henri Sivonen wrote:
>
> The ping attribute does have the same security risks that cross-domain
> XHR POST with empty entity body would have if the access-control
> Method-Check weren't there. That is, if a POST handler has been
> programmed to trigger stuff on mere POST without a body, a malicious
> ping attribute could be used to trigger that action.

(As could an empty scripted <form>.)

> > And if you consider the desired effect non-safe (which I don't), then
> > the consequence is that you just can't do it.
>
> It is about idempotent vs. non-idempotent and side effects.
>
> If you are counting ad impressions, clearly you don't want to
> a) count Google Web Accelerator (or similar) prefetches
> b) leave impressions uncounted due to an intermediate cache satisfying
> the request.

Indeed.

On Sun, 28 Oct 2007, Julian Reschke wrote:
> >
> > So would you ban XHR POST and script-initiated form submissions?
>
> I would want the XHR spec to clarify that it's not OK to initiate unsafe
> methods without the user's consent. I would also deprecate
> script-initiated form submissions from something like onload().

Please bring this up with the Web API working group.

> > It is about idempotent vs. non-idempotent and side effects.
> >
> > If you are counting ad impressions, clearly you don't want to
> > a) count Google Web Accelerator (or similar) prefetches
> > b) leave impressions uncounted due to an intermediate cache
> > satisfying the request.
>
> Yes. But the same problem can (and is) already used without "ping"

Indeed, and that's one of the (less important) things we're trying to fix with ping="".

> and even if you use "ping", you still could do it with a safe method
> (HEAD/Cache-Control:no-cache).

Unfortunately HEAD is typically implemented in servers (e.g. Apache) without running the relevant CGI scripts, which makes them hard to implement at all. I also disagree that this would be a correct application of the HEAD method's semantics.

On Sun, 28 Oct 2007, Henri Sivonen wrote:
>
> That might work and could be a tad safer. It isn't in any way
> theoretically pure from the RFC 2616 point of view, though, to make HEAD
> and GET have different semantics beyond the response body presence.

Indeed.

On Sun, 28 Oct 2007, Julian Reschke wrote:
> >
> > That might work and could be a tad safer. It isn't in any way
> > theoretically pure from the RFC 2616 point of view, though, to make
> > HEAD and GET have different semantics beyond the response body
> > presence.
>
> I wasn't suggesting that.

You suggested that we should use HEAD for request tracking, which indeed makes GET and HEAD have different semantics in a way that does not match (at least my interpretation of) RFC2616.

On Sun, 28 Oct 2007, Charles McCathieNevile wrote:
>
> Indeed. What we are being asked to implement is a platform for people to
> make money or to keep a closer watch than ever on users.

Actually, no, ping="" would help do the opposite.

> Fundamentally, the ping being sent is not a user request of any kind at
> all, it is a third-party request for information about what the user is
> doing. This is not a transaction between a server and a client in the
> sense that HTTP usually offers, it is a one-way message from the client
> to a third party. So we are just using HTTP as a transport method of
> convenience since it is there. This is probably reasonable in the
> circumstances, but I don't yet understand how it matters which method we
> decide to turn into a one-way message in the absence of a mechanism for
> such.

Hopefully the points put forward earlier in this e-mail cover this in sufficient detail.

> > I think we should use "must" for things that affect privacy.
>
> Actually, much as I care about security and privacy, I think that in
> both these areas we ought to use "should" or similar language. If a
> browser decides to violate some policy, there is generally a reason for
> it (offer functionality to the user, or satisfy some corporate desire,
> implement something better, ...) and I don't think that *this*
> specification is the appropriate place to set security and privacy
> policy for all users for the web. HTML 5 might describe the behaviour
> that this ping should have. But browsers should be free to turn it off
> and on, or leave it off, or leave it on, or leave it up to the user...

I agree.

On Sun, 28 Oct 2007, Julian Reschke wrote:
>
> Understood. But right now we have one UA (FF3) approaching release, and
> the FF developers decided to ignore the "should".
> This means that they
> are either incompetent (I don't believe that), that they implemented
> something better (they didn't AFAIK) or that the spec can't be
> implemented. That is a problem.

There are other options too, like that they are still experimenting with the UI, or that they had higher priorities, or that (as is the case) they are trying to find a solution that is better than the proposed solution of putting the ping domains in the status bar.

On Sun, 28 Oct 2007, Charles McCathieNevile wrote:
>
> You mean POST, right? As far as I am concerned, the HEAD request
> suggestion is the least departure from normal HTTP (since there is
> already little expectation that HEAD will pass a response to the user),
> but I still don't see

(I'm not sure if you meant to stop here or not.)

HEAD seems even less desirable than GET from the point of view of HTTP -- it's only supposed to get the HTTP headers of the resource, without doing anything at all!

> We are effectively redefining a method, if we follow the current draft,
> to remove the expectation that it is a real transaction.

I do not believe this is the case, given the existence of silent POSTs as mentioned already in this draft.

> While we are defining some magic handling of a method to make it
> "technically safe" (nothing bad should happen to the user in the UA as a
> result, although there is an argument that what we are doing is socially
> not reasonable - and that depends of course on your legal and moral
> framework and expectations about privacy and other such difficult
> concepts) the argument that it should be one thing or another because of
> RFC 2616 seems to me specious, because I don't understand what is
> proposed as being what is offered in that spec - a way to perform direct
> transactions of information.

I'm not sure I follow this sentence.
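
For what it's worth, the status-bar idea mentioned above (listing the ping domains alongside the link) is simple to express. The following Python sketch is only an illustration under stated assumptions: the function name is hypothetical, the "(tracked by ...)" wording is one possible rendering rather than anything required by the spec, and ping="" is assumed to be a space-separated list of URIs.

```python
from urllib.parse import urlsplit


def status_bar_text(href: str, ping: str = "") -> str:
    """Build hypothetical status-bar text for a link: the link's own URI,
    followed by the hosts that would be pinged, in brackets."""
    hosts = []
    for url in ping.split():
        host = urlsplit(url).hostname
        # Skip entries without a host and avoid listing a host twice.
        if host and host not in hosts:
            hosts.append(host)
    if not hosts:
        return href  # nothing to disclose: show the plain URI
    return f"{href} (tracked by {', '.join(hosts)})"


print(status_bar_text("http://www.example.com/foo/bar",
                      "http://example.net/track"))
```

Whether a status bar is the right place for this at all is a UI question for browser vendors; the sketch only shows that the information needed is already present in the markup.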

On Sun, 28 Oct 2007, Boris Zbarsky wrote:
>
> You missed option (d): The functionality was implemented by one
> (non-UI) guy as a test for an alpha release, no one's thought much about
> it since, and now that this thread has reminded people of it it might
> get turned off for Fx3.
>
> For what it's worth, I haven't thought of a sane UI for notifying the
> user of "ping" yet. But then again, I'm not a UI designer by trade.

Actually we did consider UI at the time (I was involved in the discussions). I would be interested in hearing details about the idea I suggested above, namely of putting the domain names of the hosts to be pinged in brackets after the link's own URI in the status bar:

   http://www.example.com/foo/bar (tracked by example.net)

On Mon, 29 Oct 2007, Julian Reschke wrote:
>
> So the scenario is:
>
> 1) User A browses web site B.
>
> 2) A follows an HTML link to site C.
>
> 3) The owner of B wants to be informed of that event in order to charge
> the owner of C for an online ad linking to C.

That's one scenario; there are other, possibly more important ones, for example: tracking results in search, so that more popular entries can have subsequent rankings boosted, or usability studies tracking which links users prefer on a site.

> As far as I understand Ian, he thinks that the notification for step 2
> needs to be done through an unsafe method, because money may be
> exchanged.

That's one reason; there are others, as mentioned earlier in this post.

> My position is that although money may be exchanged between B and C due
> to the notification (ping), this is a transaction between B and C, and A
> MUST NOT be involved. In other words, following a hyperlink MUST stay
> "safe" in the RFC2616 sense.

The hyperlink does stay safe; however, the ping is not idempotent, and should not use an idempotent method.

> Quoting RFC2616, 9.1.1 again: [...]

See above for a detailed response to each quoted paragraph.

> (emphasis on the last paragraph!)

The last paragraph actually doesn't apply -- it gives reasons not to use GET, or to be careful with GET, and doesn't actually give advice on other methods.

On Sun, 28 Oct 2007, Roy T. Fielding wrote:
>
> Aside from all of the other issues, my vote would be to remove the ping
> attribute from the specification. It is not a desirable feature.

While I understand this reaction, it should be noted that user tracking is widely practiced. It is currently done in ways that are significantly less user-friendly. I think we owe it to users to improve matters here. Simply burying our heads in the sand about this issue doesn't help users.

> It is not sufficient for accurate user tracking (mandatory in the realm
> of referral payments)

It's as accurate as, if not more accurate than, the current mechanisms, and user preferences shouldn't take second place to accuracy in referral payments. While some companies may find greed overtakes their desire to address their users' concerns, I would hope this is not the norm on the ever more social-focused Web.

> would never be implemented consistently in practice

Why not?

> is trivial to defeat

What do you mean by "defeat"?

> is trivial to use for a DoS attack

How so?

> or mass fraud on the referral provider

Obviously the same signing mechanisms would have to be used with financial-related ping=""s as are done today with ad tracking, but I don't see what would make ping="" any less secure in this context.

> and is completely redundant to the current features provided by HTTP
> (cookies and referer)

I don't understand how either cookies or referers could be used here. Could you elaborate? That's certainly not how tracking is more commonly done today.

> and HTML (any embedded request).

I don't see how it is redundant with HTML; Web authors have repeatedly said that they have trouble making tracking systems that do not mangle the URL or slow down the request.

On Sun, 28 Oct 2007, Kornel Lesinski wrote:
>
> OTOH ping is all about creating side-effects, and only non-safe methods
> should cause them.

Indeed.

> I too was initially shocked that this might be a "CSRF-heaven", but after a
> second thought I think changing method won't noticeably improve security:
>
> * any website can automatically POST a form using script to any other
>   website, and this can't be blocked without breaking lots of legitimate
>   websites (sadly the worst offenders are banks providing easy-to-
>   integrate payment gateways)
>
> * any website can trick user into clicking submit button that sends POST
>   to another site (image buttons or CSS can disguise button as a link)
>
> * sites trying to filter-out any unsafe HTML from user input can either
>   do it right (only allow known safe elements, attributes and their
>   values) or are doomed to fail. There are sooo mind-numbingly many ways
>   for injecting scripts (http://ha.ckers.org/xss.html) that <a ping> isn't
>   even interesting.
>
> * ping doesn't allow sending any payload. This severely limits usefulness
>   of it for CSRF attacks and makes it easy for websites to protect against
>   it.
>
> Therefore I think using GET or HEAD for <a ping> won't make a difference
> -- in every case where <a ping> could be abused, other mechanism could
> be abused easier and with better (for attacker) results.
>
> The root cause why using POST is unsafe is CSRF, and there should be a
> separate effort dealing with that (covering all cases, not only ping).

I agree with all of the above.

On Sun, 28 Oct 2007, Julian Reschke wrote:
>
> Following a link should not cause side effects the user (A) can be made
> accountable for.

Agreed. And nothing says that the user can be made accountable for POSTs made for ping="" attributes. Just that the user _can't_ ever be made accountable for side-effects made in response to GETs.

> And, fortunately this is not the case here.
> The only party for which the
> side effect is relevant is the site owner (B), and potentially the party
> (C) the link points to.

And sometimes the user, e.g. when the tracking is used to improve search results in future searches, or to personalise a site to the user's habits by promoting areas of a site that the user uses the most.

I hope this clarifies the issues surrounding the ping="" attribute. I understand that not everybody agrees on this, but when there are requests that are mutually exclusive, we can't make everyone happy. I hope that the explanations above address most of the concerns that were raised, but I understand that they might not.

I would ask anyone who still disagrees with what the spec says to please consider the above explanations carefully; simply raising the same issue that has already been raised, with no new information or reasoning, is unlikely to result in a different reply. I try to base the design of the spec on the balance of all input, not on the volume of input.

Cheers,
-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Friday, 2 November 2007 22:06:16 UTC