Re: Feedback on the ping="" attribute (ISSUE-1) from Julian Reschke on 2007-11-03 (public-html@w3.org from November 2007)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Sat, 03 Nov 2007 10:38:19 +0100
To: Ian Hickson <ian@hixie.ch>
CC: HTML WG List <public-html@w3.org>
Message-ID: <472C418B.3000200@gmx.de>
Ian Hickson wrote:
> 
> (Julian: I've only sent this to the public-html list, since that seems to 
> be where you are mostly posting these days.)

Side note: I'm not sure what the point is of posting to two mailing 
lists but wanting responses on only one of them.

> I have reordered this discussion and cut down the post to reduce the 
> repetition. Hopefully I haven't skipped any points you considered key.
> 
> On Sat, 3 Nov 2007, Julian Reschke wrote:
>> Could you please clarify why the ping attribute wouldn't work equally 
>> well with a safe method?
> 
> It's not a "safe" vs "unsafe" discussion. There's nothing intrinsically 
> unsafe (in an HTTP sense) about the POST in question here. The question is 

Whether a specific invocation of POST is safe or not does not depend on 
the HTML5 spec, but on the server that implements that resource.

If a commercial site exposes a resource that, when POSTed to (with an 
empty body), completes a shopping transaction, that will certainly not 
be "safe".

> one of "idempotent" vs "non-idempotent". The problems with using a 
> non-idempotent request are that idempotent methods aren't supposed to have 
> side-effects, whereas the whole _point_ of this request is a side-effect; 

...a side effect the user shouldn't be made accountable for...

> and that idempotent methods can get cached, whereas caching a ping would 
> misrepresent the user's actions and defeat the point of the ping.

Are you talking about caching *requests* or *responses*?

I have stated several times that you can ensure that GET/HEAD bypasses 
caches by adding the proper Cache-Control headers, so in that respect 
you can make them work just like POST.
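
As a rough sketch of what I mean by "proper Cache-Control headers" (the 
function names are made up for illustration; nothing here comes from the 
HTML5 draft, and the exact directives are my assumption of a reasonable 
choice):

```python
def ping_request_headers():
    """Headers a UA could send with a GET/HEAD ping (hypothetical helper)."""
    # Request directive: caches must not answer from a stored response
    # without going back to the origin server. Pragma is included only
    # for HTTP/1.0 caches.
    return {"Cache-Control": "no-cache", "Pragma": "no-cache"}


def ping_response_headers():
    """Headers a ping resource could return (hypothetical helper)."""
    # Response directives: do not store the response, and do not reuse
    # it for a later request without revalidation.
    return {"Cache-Control": "no-store, no-cache"}
```

With both sides sending these, a conforming HTTP/1.1 cache has no 
grounds to answer a ping on the server's behalf.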

> Repeating the ping would cause side-effects. Thus it is not an idempotent 
> action. Thus it should use a non-idempotent method to follow HTTP 
> semantics.

I'm not sure who's supposed to do the repeating. Are you thinking of 
HTTP stacks that repeat idempotent requests if they didn't get a response?

That concern could be addressed by making sure that accessing the same 
ping URI again could be detected, for example by adding a sequence or 
random number to it.
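
A minimal sketch of that idea, assuming the page author generates the 
ping URIs and the same site receives them. The names make_ping_uri and 
PingRecorder, and the "to", "seq" and "nonce" parameters, are invented 
for this example:

```python
import itertools
import secrets
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical per-site counter; a real site would persist this somewhere.
_sequence = itertools.count(1)


def make_ping_uri(base, target):
    """Build a ping URI that never repeats (sequence number plus random token)."""
    query = urlencode({
        "to": target,
        "seq": next(_sequence),
        "nonce": secrets.token_hex(8),
    })
    return f"{base}?{query}"


class PingRecorder:
    """Server-side log that ignores a ping URI it has already seen."""

    def __init__(self):
        self._seen = set()
        self.accepted = []

    def record(self, uri):
        """Return True if the ping was logged, False if it was a repeat."""
        nonce = parse_qs(urlsplit(uri).query).get("nonce", [None])[0]
        if nonce is None or nonce in self._seen:
            return False
        self._seen.add(nonce)
        self.accepted.append(uri)
        return True
```

A request that gets retried by an HTTP stack arrives with the same 
nonce and is simply dropped, so the retry has no additional side effect.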

>>> The HTTP specification just says that a user can never be held 
>>> accountable for GET side-effects. It says nothing about the user being 
>>> held accountable for anything else, including automatic POST requests.
>> If the UA decides to invoke an unsafe method *without* the user's 
>> consent, that *may* be a problem. With a safe method, it's guaranteed 
>> not to be. Thus, there's a clear advantage in using a safe method.
> 
> There are definite problems with using an idempotent method. Using a 
> non-idempotent method has only theoretical dangers ("that *may* be a 
> problem"). As an editor I have a responsibility to treat actual problems 
> as more important than theoretical ones.

So please clarify what that actual problem is.

And I'm not sure what you're getting at with the statement about the 
editor's role -- it's up to the working group to decide, isn't it?

>> So you are willing to state that ping-initiated HTTP method invocations 
>> must not cause an action the user can be made accountable for. I agree 
>> with that.
>>
>> But then, why don't you use a safe method in the first place?
> 
> Because the "safe" methods are idempotent, and the semantic we are trying 
> to convey here has one goal and one goal only, and that goal is 
> specifically _not_ idempotent.

Safe methods are idempotent. I'm saying that the action of following a 
link MUST be safe from the user's point of view (unless, well, it gets 
a different UI).

So claiming it's safe for the user while requiring the use of a 
non-idempotent method doesn't compute for me.

> I think you are reading too much into the terms "safe" and "unsafe" in the 
> HTTP specification insofar as their application to specific methods. The 
> whole point of unsafe methods is that they are about things that are done 
> on the user's behalf that could affect the user in some undesirable way, 
> whereas the entire purpose of the ping="" attribute is to not affect the 
> user directly. Thus, _whatever_ method is used here, the net result is 
> "safe" in an HTTP sense.

If it is "safe" in the HTTP sense, you should use a safe method.

>>> That's one scenario; there are other, possibly more important ones, 
>>> for example: tracking results in search, so that more popular entries 
>>> can have subsequent rankings boosted, or usability studies tracking 
>>> which links users prefer on a site.
>> Understood. I didn't mention them here, because it seems you were mainly 
>> concerned about the ad issue. In *this* case, there's even less reason 
>> to use an unsafe method.
> 
> Actually I'd have thought the opposite was true -- in the two cases that 
> don't involve advertising, the tracking _does_ affect the user, and 
> presumably if one considers the safe vs unsafe distinction relevant here, 
> one would thus find it more important to use an unsafe method (and also 
> that it works more reliably than can be achieved with GET requests).

OK, so now it's unsafe after all?

A site can already easily track how I navigate *inside* it by using the 
Referer header. That's completely OK and has nothing to do with being 
"unsafe".

The link auditing only becomes useful for

- navigation events that do not roundtrip to the server (following 
internal links),
- links to other sites,
- cases where the user has decided to block Referer headers.

In the last case I would expect that ping would be disabled as well.

So I'm really not sure what the use case for intra-site link auditing is 
that isn't already covered by existing features. Please elaborate.


>>> The hyperlink does stay safe; however, the ping is not idempotent, and 
>>> should not use an idempotent method.
>> I still do not understand why it needs to be unsafe.
> 
> It doesn't need to be unsafe, it needs to be non-idempotent. That is, it 
> needs to be a method that caches and user agents will not consider 
> replayable, reusable, etc.

I still do not understand why a UA would consider something replayable 
that was found in the ping attribute, if the spec states it isn't.

If the concern is the lower parts of the stack, this can be solved by 
careful URI design.

If the concern is malicious replaying of pings -- you can't avoid that 
by using POST instead of GET/HEAD.

>>>> (emphasis on the last paragraph!)
>>> The last paragraph actually doesn't apply -- it gives reasons not to 
>>> use GET, or to be careful with GET, and doesn't actually give advice 
>>> on other methods.
>> I do not understand how it "does not" apply, and I also disagree that it 
>> gives reasons not to use GET. On the contrary, it explicitly allows servers 
>> to do something with GET that has side-effects -- as long as the user is 
>> not made accountable for it. Exactly this case, it seems to me.
> 
> It applies to GET, sure. Not to POST.

It gives advice on safe methods (GET), stating you can use them instead 
of unsafe methods (such as POST). So of course it applies to this 
discussion.

>> And again: it depends on who is made accountable for the side effect. 
>> The user following the link shouldn't be.
> 
> POST doesn't mean the user is accountable.

Oh yes, it does in general.

"Implementors should be aware that the software represents the user in 
their interactions over the Internet, and should be careful to allow the 
user to be aware of any actions they might take which may have an 
unexpected significance to themselves or others." -- 
<http://tools.ietf.org/html/rfc2616#section-9.1.1>

A server that receives a POST request has no way to decide how that 
method invocation was initiated; the fact that a/@ping caused it is not 
visible in the request.

> (On why we can't use HEAD:)
>>> Unfortunately HEAD is typically implemented in servers (e.g. Apache) 
>>> without running the relevant CGI scripts, which makes them hard to 
>>> implement at all. I also disagree that this would be a correct 
>>> application of the HEAD method's semantics.
>> HEAD and GET have the same semantics - the only difference being that 
>> for HEAD the response body is not transmitted. Servers that implement 
>> HEAD differently technically are not compliant.
> 
> Sadly, the realities of actual implementations are more important than the 
> theory of what is conforming.

Ian, I have stated several times that the distinction does not matter.

>> For link tracking, my understanding was that there is no response body 
>> expected. Thus, for a server that implements a "link auditing resource", 
>> both GET and HEAD actually will do the same -- invoke some kind of 
>> tracking (minimally dumping the URI into a log file), and just return 
>> with an HTTP 2xx status and no body.
>>
>> Thus, I would expect that GET and HEAD can be used interchangeably.
> 
> Using either GET or HEAD explicitly hoping for them to have side-effects 
> seems very much counter to the intent of those methods. I also disagree 
> that the two are equivalent. One is asking for a copy of a resource, the 
> other is asking for the metadata about the resource. Using HEAD for 

No, one is asking for the representation plus metadata, the other one 
just for the metadata. Exactly what I said:

"The HEAD method is identical to GET except that the server MUST NOT 
return a message-body in the response. The metainformation contained in 
the HTTP headers in response to a HEAD request SHOULD be identical to 
the information sent in response to a GET request." -- 
<http://tools.ietf.org/html/rfc2616#section-9.4>

So again, if the representation for a "ping URI" is always empty, GET 
and HEAD will behave exactly the same.
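
To make that concrete, here is a sketch of a "link auditing resource" 
that treats both methods identically. The handle_ping function, its 
signature and the choice of 204 are assumptions for illustration only, 
not a description of any real server:

```python
def handle_ping(method, uri, log):
    """Log the ping and answer with an empty 2xx response.

    Returns a (status, headers, body) tuple. The tracking side effect is
    the same for GET and HEAD, and since the representation is always
    empty there is no body to omit for HEAD.
    """
    if method not in ("GET", "HEAD"):
        return 405, {"Allow": "GET, HEAD"}, b""
    log.append(uri)  # minimal tracking: dump the URI into a log
    return 204, {"Cache-Control": "no-store"}, b""
```

Whichever of the two methods the UA picks, the log entry and the 
response are the same.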

> something other than just getting the headers seems especially against the 
> spirit of the HTTP specification.

Logging GET requests would be against the same "spirit", and yet it is 
being done all the time *and* explicitly allowed by section 9.1.1.


>> Just state in the spec that the GET/HEAD operation on the ping target 
>> MUST happen at most one time per user-initiated navigation to the href'd 
>> URI.
> 
> That wouldn't help with caches and the like, though. (And sadly, in the 
> wild, no-cache isn't always reliable.)

Well, Cache-Control: no-cache is part of HTTP/1.1. If it doesn't work in 
practice, please raise it as an HTTP issue.

If somebody is concerned about intermediaries getting this wrong, it's 
trivial to work around by making sure URIs do not repeat.

> ...

Best regards, Julian
Received on Saturday, 3 November 2007 09:38:39 UTC