Re: Feedback on the ping="" attribute (ISSUE-1) from Ian Hickson on 2007-11-03 (public-html@w3.org from November 2007)

From: Ian Hickson <ian@hixie.ch>
Date: Sat, 3 Nov 2007 06:11:08 +0000 (UTC)
To: Julian Reschke <julian.reschke@gmx.de>
Cc: HTML WG List <public-html@w3.org>
Message-ID: <Pine.LNX.4.62.0711030013460.27205@hixie.dreamhostps.com>

(Julian: I've only sent this to the public-html list, since that seems to 
be where you are mostly posting these days.)

I have reordered this discussion and cut down the post to reduce the 
repetition. Hopefully I haven't skipped any points you considered key.

On Sat, 3 Nov 2007, Julian Reschke wrote:
>
> Could you please clarify why the ping attribute wouldn't work equally 
> well with a safe method?

It's not a "safe" vs "unsafe" discussion. There's nothing intrinsically 
unsafe (in an HTTP sense) about the POST in question here. The question is 
one of "idempotent" vs "non-idempotent". The problems with using an 
idempotent request are that idempotent methods aren't supposed to have 
side-effects, whereas the whole _point_ of this request is a side-effect; 
and that idempotent methods can get cached, whereas caching a ping would 
misrepresent the user's actions and defeat the point of the ping.

Repeating the ping would cause side-effects. Thus it is not an idempotent 
action. Thus it should use a non-idempotent method to follow HTTP 
semantics.
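To make the safe/idempotent distinction concrete, here is a small sketch of the method properties listed in RFC 2616, section 9.1 (GET and HEAD are "safe"; GET, HEAD, PUT and DELETE are idempotent, and OPTIONS and TRACE are treated as inherently idempotent; POST is neither). The function names are hypothetical and not part of any library; `suitable_for_ping` simply encodes the argument above that a ping needs a non-idempotent method.

```python
# Method properties as described in RFC 2616, section 9.1.
SAFE = {"GET", "HEAD"}
IDEMPOTENT = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"}


def is_safe(method: str) -> bool:
    return method.upper() in SAFE


def is_idempotent(method: str) -> bool:
    return method.upper() in IDEMPOTENT


def suitable_for_ping(method: str) -> bool:
    """Hypothetical check: a ping exists only for its side-effect, and each
    repetition records one more navigation, so it needs a method that
    caches and user agents will not treat as replayable."""
    return not is_idempotent(method)


for m in ["GET", "HEAD", "PUT", "POST"]:
    print(m, is_safe(m), is_idempotent(m), suitable_for_ping(m))
```

Note that PUT comes out as unsafe but idempotent, which is why "unsafe" alone is not the property being argued about here.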


> > The HTTP specification just says that a user can never be held 
> > accountable for GET side-effects. It says nothing about the user being 
> > held accountable for anything else, including automatic POST requests.
> 
> If the UA decides to invoke an unsafe method *without* the user's 
> consent, that *may* be a problem. With a safe method, it's guaranteed 
> not to be. Thus, there's a clear advantage in using a safe method.

There are definite problems with using an idempotent method. Using a 
non-idempotent method has only theoretical dangers ("that *may* be a 
problem"). As an editor I have a responsibility to treat actual problems 
as more important than theoretical ones.


> So you are willing to state that ping-initiated HTTP method invocations 
> must not cause an action the user can be made accountable for. I agree 
> with that.
> 
> But then, why don't you use a safe method in the first place?

Because the "safe" methods are idempotent, and the semantic we are trying 
to convey here has one goal and one goal only, and that goal is 
specifically _not_ idempotent.

I think you are reading too much into the terms "safe" and "unsafe" in the 
HTTP specification insofar as their application to specific methods. The 
whole point of unsafe methods is that they are about things that are done 
on the user's behalf that could affect the user in some undesirable way, 
whereas the entire purpose of the ping="" attribute is to not affect the 
user directly. Thus, _whatever_ method is used here, the net result is 
"safe" in an HTTP sense.


> > That's one scenario; there are other, possibly more important ones, 
> > for example: tracking results in search, so that more popular entries 
> > can have subsequent rankings boosted, or usability studies tracking 
> > which links users prefer on a site.
> 
> Understood. I didn't mention them here, because it seems you were mainly 
> concerned about the ad issue. In *this* case, there's even less reason 
> to use an unsafe method.

Actually I'd have thought the opposite was true -- in the two cases that 
don't involve advertising, the tracking _does_ affect the user, and 
presumably if one considers the safe vs unsafe distinction relevant here, 
one would thus find it more important to use an unsafe method (and would 
also find that such a method works more reliably than GET requests can).


> > The hyperlink does stay safe; however, the ping is not idempotent, and 
> > should not use an idempotent method.
> 
> I still do not understand why it needs to be unsafe.

It doesn't need to be unsafe, it needs to be non-idempotent. That is, it 
needs to be a method that caches and user agents will not consider 
replayable, reusable, etc.
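A toy model of why "replayable" matters: below, a deliberately simplified shared cache stores and reuses responses to GET only and always forwards POST. It ignores Cache-Control and every other real caching rule, and the `/track?link=7` URL and both classes are hypothetical. The point is only that a cached GET ping under-reports clicks, while a POST ping reaches the server every time.

```python
from collections import Counter


class Origin:
    """Hypothetical tracking server: counts each request it actually receives."""

    def __init__(self) -> None:
        self.hits = Counter()

    def handle(self, method: str, url: str) -> int:
        self.hits[(method, url)] += 1
        return 200  # empty-bodied 200 OK


class ToyCache:
    """Simplified shared cache: reuses stored responses for GET,
    always forwards any other method (e.g. POST) to the origin."""

    def __init__(self, origin: Origin) -> None:
        self.origin = origin
        self.store = {}

    def request(self, method: str, url: str) -> int:
        if method == "GET":
            if url not in self.store:
                self.store[url] = self.origin.handle(method, url)
            return self.store[url]
        return self.origin.handle(method, url)


origin = Origin()
cache = ToyCache(origin)
for _ in range(3):  # the user follows the same audited link three times
    cache.request("GET", "/track?link=7")
    cache.request("POST", "/track?link=7")

print(origin.hits[("GET", "/track?link=7")])   # 1: two clicks lost
print(origin.hits[("POST", "/track?link=7")])  # 3: every click recorded
```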


> You seem to be concerned about the ping being executed when the user 
> *didn't* navigate -- but what does this have to do with safe vs unsafe?

Nothing. It has everything to do with idempotent vs non-idempotent.


> > > (emphasis on the last paragraph!)
> > 
> > The last paragraph actually doesn't apply -- it gives reasons not to 
> > use GET, or to be careful with GET, and doesn't actually give advice 
> > on other methods.
> 
> I do not understand how it "does not" apply, and I also disagree that it 
> gives reasons not to use GET. On the contrary, it explicitly allows servers 
> to do something with GET that has side-effects -- as long as the user is 
> not made accountable for it. Exactly this case, it seems to me.

It applies to GET, sure. Not to POST.


> And again: it depends on who is made accountable for the side effect. 
> The user following the link shouldn't be.

POST doesn't mean the user is accountable.


(On why we can't use HEAD:)
> > Unfortunately HEAD is typically implemented in servers (e.g. Apache) 
> > without running the relevant CGI scripts, which makes them hard to 
> > implement at all. I also disagree that this would be a correct 
> > application of the HEAD method's semantics.
> 
> HEAD and GET have the same semantics - the only difference being that 
> for HEAD the response body is not transmitted. Servers that implement 
> HEAD differently technically are not compliant.

Sadly, the realities of actual implementations are more important than the 
theory of what is conforming.


> For link tracking, my understanding was that there is no response body 
> expected. Thus, for a server that implements a "link auditing resource", 
> both GET and HEAD actually will do the same -- invoke some kind of 
> tracking (minimally dumping the URI into a log file), and just return 
> with an HTTP 2xx status and no body.
> 
> Thus, I would expect that GET and HEAD can be used interchangeably.

Using either GET or HEAD while explicitly hoping for them to have 
side-effects seems very much counter to the intent of those methods. I also disagree 
that the two are equivalent. One is asking for a copy of a resource, the 
other is asking for the metadata about the resource. Using HEAD for 
something other than just getting the headers seems especially against the 
spirit of the HTTP specification.
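For reference, the "link auditing resource" described in the quoted text is small. Here is a minimal WSGI sketch, assuming a WSGI server and a hypothetical `audit_app`; unlike the GET/HEAD version proposed above, it only counts POST (the method the ping="" draft uses), logs the requested URI, and answers 204 No Content with an empty body.

```python
import logging
from wsgiref.util import request_uri, setup_testing_defaults

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("link-audit")


def audit_app(environ, start_response):
    """Hypothetical link auditing resource: record the ping, return no body."""
    if environ["REQUEST_METHOD"] != "POST":
        # Refuse other methods so that stray GETs (prefetched, retried,
        # or replayed) are never counted as navigations.
        start_response("405 Method Not Allowed", [("Allow", "POST")])
        return [b""]
    log.info("ping: %s", request_uri(environ))
    start_response("204 No Content", [])
    return []


if __name__ == "__main__":
    # Exercise the app without a real server.
    env = {"REQUEST_METHOD": "POST", "PATH_INFO": "/track", "QUERY_STRING": "link=7"}
    setup_testing_defaults(env)
    audit_app(env, lambda status, headers: print(status))  # prints 204 No Content
```

To serve it for real, `wsgiref.simple_server.make_server("", 8000, audit_app).serve_forever()` is enough for local experiments.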


> Just state in the spec that the GET/HEAD operation on the ping target 
> MUST happen at most one time per user-initiated navigation to the href'd 
> URI.

That wouldn't help with caches and the like, though. (And sadly, in the 
wild, no-cache isn't always reliable.)
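A sketch of the quoted "at most one time per user-initiated navigation" rule as a user-agent component shows where it stops. The class and its methods are hypothetical; `send` stands in for whatever does the network I/O, and the default method is GET, as proposed above.

```python
import itertools


class PingDispatcher:
    """Hypothetical UA component: at most one ping per URL per navigation."""

    def __init__(self, send):
        self._send = send             # callable(method, url) doing the network I/O
        self._sent = set()            # (navigation_id, url) pairs already pinged
        self._ids = itertools.count(1)

    def new_navigation(self) -> int:
        """Called once each time the user follows an audited link."""
        return next(self._ids)

    def ping(self, navigation_id: int, url: str, method: str = "GET") -> bool:
        key = (navigation_id, url)
        if key in self._sent:
            return False  # duplicate within this navigation: suppressed
        self._sent.add(key)
        self._send(method, url)
        return True
```

The guard only constrains what the user agent emits. Any cache between the user agent and the server can still answer a GET from storage, so the server may see fewer pings than there were navigations.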


> > > And, fortunately this is not the case here. The only party for which 
> > > the side effect is relevant is the site owner (B), and potentially 
> > > the party (C) the link points to.
> > 
> > And sometimes the user, e.g. when the tracking is used to improve 
> > search results in future searches, or to personalise a site to the 
> > user's habits by promoting areas of a site that the user uses the 
> > most.
> 
> All of these are cases where HTTP experts will tell you that GET or HEAD 
> is just fine.

With all due respect, appeal to authority is not very convincing.


(On user interface:)
> > I agree that ping="" should be made visible to users. Indeed, the spec 
> > explicitly makes that a SHOULD, going far outside its usual boundary 
> > of not specifying user interface requirements.
> 
> Currently, the standard way in HTML UAs to distinguish safe (GET) from 
> unsafe (POST) is a link vs a button.
> 
> So yes, if all "audited" links turn into buttons, that concern would be 
> dealt with. Somehow however I feel this is not what people have in mind.

There are ways to make this visible that do not involve changing the 
appearance of the link.


> > Indeed; and in fact part of the goal here is to make the possibly 
> > unsafe action (user tracking and conversion tracking, with the 
> > potential effect on future performance or the potential material 
> > financial effect) be one that can be explicitly brought to the user's 
> > attention if he so desires, something that is not possible in legacy 
> > tracking techniques. (For example, using redirects makes the whole 
> > process very opaque.)
> 
> Following that, the spec should make any UA that makes an audited link 
> indistinguishable from a regular link non-conforming.

That is already the case (unless the implementor knows of "valid reasons 
in particular circumstances when the particular behavior is acceptable or 
even useful", noting that "the full implications should be understood and 
the case carefully weighed before implementing").


> > We don't want to do it without the user's consent. The whole point of 
> > making ping="" explicit is to allow the user to have the final 
> > decision.
> 
> Once in the configuration, or on each navigation event? Per site?

Presumably, such user interface details would (like all user interface 
details) be left up to the user agent.
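Whatever form that user interface takes, the decision point can be sketched as follows. The function is hypothetical and assumes ping="" holds a space-separated list of URLs resolved against the document's URL, that each ping is a POST, and that `pings_allowed` stands in for whatever preference the user agent offers (global, per site, or per click). Request headers and bodies are not modelled.

```python
from urllib.parse import urljoin


def requests_for_click(base_url, href, ping_attr, pings_allowed):
    """Return the (method, url) requests a UA might issue when the user
    follows <a href=... ping=...>; the navigation happens regardless."""
    requests = [("GET", urljoin(base_url, href))]  # the navigation itself
    if pings_allowed:
        for token in ping_attr.split():
            requests.append(("POST", urljoin(base_url, token)))
    return requests


print(requests_for_click("http://example.com/results", "http://example.org/page",
                         "/track?r=1 http://ads.example.net/c", True))
```

With `pings_allowed` false, only the navigation request remains, which is the "final decision" left to the user.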

Cheers,
-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Saturday, 3 November 2007 06:11:22 UTC