Re: Feedback on the ping="" attribute (ISSUE-1) from Ian Hickson on 2007-11-07 (public-html@w3.org from November 2007)

From: Ian Hickson <ian@hixie.ch>
Date: Wed, 7 Nov 2007 16:48:46 +0000 (UTC)
To: "Roy T. Fielding" <fielding@gbiv.com>, Mark Baker <distobj@acm.org>, Boris Zbarsky <bzbarsky@MIT.EDU>
Cc: HTML WG List <public-html@w3.org>
Message-ID: <Pine.LNX.4.62.0711071527280.30809@hixie.dreamhostps.com>

On Tue, 6 Nov 2007, Roy T. Fielding wrote:
> 
> Not really.  The actions generated by a user agent should be consistent 
> with the actions selected by the user.  That is why TimBL had an axiom 
> about GET being safe -- clicking on a link (or a spider wandering 
> around) must be translated into a safe network action because to do 
> otherwise would require every user to know the purpose of every resource 
> before the GET.  It follows, therefore, that the UI for a user action 
> that is safe (a link) must be rendered differently from all other 
> actions that might be unsafe.

Do you disagree that a ping="" is safe?


> In short, if the UI is being presented as a normal link, then the HTTP 
> methods resulting from the user's selection must all be safe 
> (GET/HEAD/OPTIONS).

Do you believe that it is the method that makes a message safe, rather 
than the message's own processing model?


> The discussion on ping assumes that the ping target is expecting to 
> receive empty-body POST requests (i.e., that the target has not been 
> deliberately supplied to fool an unsuspecting user into triggering a 
> non-safe action when they select the link). But that is an invalid 
> assumption -- the target of the ping could be any URI, including those 
> that do fun things like delete wiki pages or print documents or send 
> mail ... we've been through this all before and not all of them require 
> bodies.  That's why HTTP and HTML both have requirements on use of safe 
> methods.

But it's already trivial to send arbitrary POSTs on the user's behalf to 
arbitrary hosts, with arbitrary bodies, even. Why should we try to patch a 
crack in the barn roof when the barn door is wide open with nobody even 
considering closing it?

It seems very much like a theoretical concern.


> In any case, I still see no reason for this attribute to exist. I am 
> well aware of how link tracking works and the entire history of the user 
> tracking industry in Web protocols (due to a recent patent case), and 
> you haven't even reached the most minimal requirements that a real site 
> would need for tracking referrals

Could you elaborate on what features you believe are needed to make this 
usable? We really would like to make the privacy situation better, but if, 
as you say, it's not meeting the most minimal requirements, then we should 
reconsider the design and address those needs. What are those needs?


> and would never be capable of proving undercounts

It's not clear to me what you mean by being able to prove undercounts, or 
why that is a requirement. Any system that provides a 


> [the sole apparent reason for this new feature]

The ping="" attribute is intended to help users. Right now, user tracking 
happens widely, but suffers from a number of problems, including being:

 * non-transparent (users can't see what's being pinged easily)
 * non-optional (users can't disable it)
 * slow (adding DNS and TCP roundtrip to every tracked request)
 * obfuscated (the final target is usually hard to determine)

ping="" is intended to solve all of these problems: with it we want to 
make this more transparent, to uphold user privacy desires, to make 
everything faster, and to improve the clarity of what's going on.

It also tries to help the authoring side by making the tracking cleaner 
and making the user experience better (we've already had a number of 
authors say they would use it as soon as it was widely available, some 
even said they'd use it earlier on a per-browser basis!).
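
To illustrate the speed point in the list above, here is a rough sketch 
(in Python, purely for illustration) of which requests sit between the 
click and the page load under each approach. The names Request, 
redirect_tracking, ping_tracking, and blocking_hops are made up for this 
example, and it assumes the ping is sent alongside the navigation rather 
than ahead of it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    method: str
    url: str
    blocks_navigation: bool  # True if the user waits on this request


def redirect_tracking(tracker_url, target_url):
    # Redirect-based tracking: the browser first fetches the tracker,
    # is redirected, and only then fetches the real target.
    return [
        Request("GET", tracker_url, True),
        Request("GET", target_url, True),
    ]


def ping_tracking(ping_url, target_url):
    # ping="" tracking (assumption: the ping does not delay navigation):
    # the target is fetched directly and the ping goes out separately.
    return [
        Request("GET", target_url, True),
        Request("POST", ping_url, False),
    ]


def blocking_hops(requests):
    # Count the requests the user has to wait for before seeing the page.
    return sum(1 for r in requests if r.blocks_navigation)


via_redirect = redirect_tracking("http://tracker.example/r?to=page",
                                 "http://example.com/page")
via_ping = ping_tracking("http://tracker.example/ping",
                         "http://example.com/page")
print(blocking_hops(via_redirect), blocking_hops(via_ping))
```

In this model the redirect approach puts two requests in front of the 
user and the ping approach puts one; the ping URL is also visible as 
its own request instead of being folded into the link target.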


> because there is no guarantee that the two DNS requests will deliver 
> equally reachable servers for the ping and href, nor that the href 
> request will succeed before the ping succeeds, nor that the href URI 
> corresponds to the ping-per-referral URI.  It is for all of those 
> reasons that people use redirects, referer, and cookies today and those 
> will never be solved by ping.

Could you elaborate on this? This proposal was driven by desires from the 
authoring community; if we're not addressing authoring needs, we should 
fix the spec.


> A solution to that problem, if one exists, needs to be vetted by people 
> at companies that do referral tracking and payments in real life, not as 
> a hacking exercise in cool features, and for that you will need to talk 
> directly to the right people at Google, Amazon, Linkshare, and at least 
> a few of the retailers that are aware of all the ways in which tracking 
> can be abused.

I work for Google, and we have input from other groups; naturally, input 
from anyone else who wishes to comment is more than welcome.


> Even if such a ping was standardized, it would be years before a 
> sufficient number of deployed browsers were out there to make it work, 
> and during that time the content providers would have to do both 
> redirects and pings to get their numbers.

This applies to many things in the HTML5 spec; this is a project that will 
last decades. That's ok.


On Tue, 6 Nov 2007, Mark Baker wrote:
> > >
> > > http://lists.w3.org/Archives/Public/www-tag/2002Apr/0207.html
> >
> > The above e-mail seems to imply that a message in HTTP is "safe" if it 
> > causes no loss of property for the user (with a loose definition of 
> > property here).
> 
> No, what Roy cited there was just a subject-independent definition of 
> safety.  When applied to the server implementation, it would mean what 
> you describe.  But as HTTP is a protocol, that's not what it means.

Ok, so what does "safe" mean in this context? Can you define it for us?


> > Note, though, that it seems that the danger is not in doing "safe" 
> > things with "unsafe" methods, but with doing "unsafe" things with 
> > "safe" methods. That is, doing "unsafe" work (work which can cause 
> > loss of property) is bad when you're using GET or HEAD -- but doing 
> > "safe" work as part of a message with "POST" is harmless. As far as I 
> > can tell.
> 
> None of what you describe there is "bad".  If a server wants to change 
> something when it receives a GET, it is completely free to do so. What 
> it cannot do though, is blame the user for that change because they 
> didn't ask for it (as the message the user agent sent was safe).

I agree. I'm not proposing using GET, though. The question is why is POST 
not possible, not why is GET possible. I understand that you believe GET 
is possible (though, as I have noted, I believe it has problems that you 
have not addressed).


> "idempotent", for a message, means that any series of identical messages 
> means the same as just one.

Indeed. And the whole point here is that multiple ping="" messages do not 
mean the same as just one.
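
A minimal sketch of that point, assuming a hypothetical in-memory click 
counter (PingCounter and its methods are invented here, not taken from 
any spec or real tracker): each ping changes the recorded total, so two 
identical pings do not mean the same as one, whereas repeating the 
read-only lookup changes nothing.

```python
class PingCounter:
    """Hypothetical click-through counter, for illustration only."""

    def __init__(self):
        self.counts = {}

    def handle_ping(self, target_url):
        # Each ping records one more click-through for target_url.
        self.counts[target_url] = self.counts.get(target_url, 0) + 1
        return self.counts[target_url]

    def handle_lookup(self, target_url):
        # Read-only: repeating this call leaves the state unchanged.
        return self.counts.get(target_url, 0)


counter = PingCounter()
after_one = counter.handle_ping("http://example.com/a")
after_two = counter.handle_ping("http://example.com/a")  # identical message

first_read = counter.handle_lookup("http://example.com/a")
second_read = counter.handle_lookup("http://example.com/a")

print(after_one, after_two)     # the totals differ after each ping
print(first_read, second_read)  # repeated lookups agree
```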


On Tue, 6 Nov 2007, Boris Zbarsky wrote:
> 
> Can we agree that for ping a series of identical messages is NOT in fact 
> the same as just one?  And that therefore ping is not idempotent?

On Wed, 7 Nov 2007, Mark Baker wrote:
>
> Hmm, no.  I agree that the server won't behave idempotently, but the 
> ping messages themselves have to be idempotent and safe because they 
> came to be as a result of a user clicking a link.

It might be a requirement that the messages be idempotent... but that 
doesn't make them idempotent. It might be a requirement we can't satisfy.


On Wed, 7 Nov 2007, Boris Zbarsky wrote:
>
> OK.  Either I seriously overestimated our level of common ground or 
> you're disagreeing just for the sake of disagreeing.  I don't see what 
> the "clicking a link" part has to do with the question I posed.  I'm 
> just looking at the messages on the protocol level.  As defined and 
> meant to be used, the ping messages are not idempotent.  Most trivially, 
> the number of pings matters; in fact that's the whole purpose of ping.
> 
> Now if your claim is that clicking on a link must never result in 
> non-idempotent messages being sent to the server, that's a separate 
> issue. We'll get to discussing that once we've established some sort of 
> common ground.  For now, can we agree that as currently defined the ping 
> messages are not idempotent?  And leave what they "have" to be out of 
> the discussion?
> 
> If you still disagree, I'd like to know why, in detail, without words 
> like "should", "have to be", "ought to be" thrown in.  Let's stick to 
> "are".

On Wed, 7 Nov 2007, Mark Baker wrote:
> 
> No, it's not a separate issue at all.  In fact it's the only concern 
> that matters.  Messages sent by user agents must reflect the intent of 
> the user.
> 
> Since we're looking for common ground, as I mentioned before, I agree 
> that a pinged server will probably behave non-idempotently.  But that's 
> orthogonal to the meaning of the message it receives.

It is? What's the point of the message having meaning if it is just 
theoretical and doesn't reflect what actually happens?

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Wednesday, 7 November 2007 16:49:01 UTC