[whatwg] <a href="" ping=""> from Ian Hickson on 2005-10-23 (public-whatwg-archive@w3.org from October 2005)

From: Ian Hickson <ian@hixie.ch>
Date: Sun, 23 Oct 2005 23:26:55 +0000 (UTC)
Message-ID: <Pine.LNX.4.62.0510231912460.23945@dhalsim.dreamhost.com>

On Fri, 21 Oct 2005, S. Mike Dierken wrote:
> > 
> > Bearing the above in mind, I've added a section to the <a> element 
> > that describes a ping="" attribute. The URIs given in this attribute 
> > would be followed when the user clicks the link, thus getting around 
> > the problems listed above.
>
> The term 'ping' in terms of RSS/blogs often means to POST some data, but 
> in this case, it would be a GET request - that may get confusing.

It would be a POST request, so I think that's ok.


> Perhaps 'also-get' or 'snoop-href' something like that.

"ping" seems clearer than those two, especially since it is a POST 
request. But I'm not tied to the name if people prefer another one.


> Ideally, the data within the tags would be able to have more than one anchor
> and each would have different roles, but I don't think HTML supports that
> (except for the <link> elements in the <head> section since they apply to
> the containing document). For example: 
>  <a href='...' rel='default'><a href='...' rel='snoop'>this uses nested
> anchors - which are illegal</a></a>
> (nested anchors are illegal I know...
> http://www.w3.org/TR/REC-html40/struct/links.html#h-12.2.2)

Yeah this seems like over-engineering the solution. :-)


On Sat, 22 Oct 2005, dolphinling wrote:
> > 
> > In my experience, "they" are ok with it being separate, as it conveys 
> > a number of benefits to the user. (I would consider my source on this 
> > matter reasonably authoritative.)
> 
> Hmm... perhaps your source could explain his reasoning here? :) It's 
> extremely easy to make non-circumventable tracking, and I assumed that 
> most times it _was_ circumventable were due to ignorance rather than an 
> informed decision. To me, it seems, the benefits to an advertising 
> company of doing so outweigh the benefits of not.

The reasoning is simple. Screwing the customer for a short-term gain is 
never to the benefit of the advertising company.


On Sat, 22 Oct 2005, dolphinling wrote:
> 
> | User agents should allow the user to disable this behaviour, for
> | example controlled by a setting that also disables the sending of HTTP
> | Referrer headers. If the behaviour has been disabled, UAs may either
> | ignore the ping attribute altogether, or selectively ignore URIs in
> | the list based on the user's preference (e.g. ignoring any third-party
> | URIs).
> 
> This sounds slightly wrong to me. "If it's disabled, UAs may choose to 
> in fact only disable part of it."

Yeah I should reword that. Fixed.
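
A minimal sketch of the kind of user-preference filtering the quoted draft 
text allows, offered as an illustration only. The function name, the three 
preference values, and the rule that "third-party" means "different 
hostname from the document" are all assumptions, not spec text.

```javascript
// Hypothetical helper: decides which ping URIs a UA would still contact,
// given a user preference. Preference values are invented for this sketch.
function filterPingUris(pingUris, documentUrl, preference) {
  if (preference === "none") return [];            // pinging fully disabled
  if (preference === "all") return pingUris.slice(); // pinging fully enabled

  // "first-party": assume third-party means a different hostname.
  const documentHost = new URL(documentUrl).hostname;
  return pingUris.filter(function (uri) {
    try {
      return new URL(uri, documentUrl).hostname === documentHost;
    } catch (e) {
      return false; // unparsable URIs are simply skipped in this sketch
    }
  });
}
```

For example, with a document at http://example.org/page.html and the 
"first-party" preference, a ping to http://example.org/track would be kept 
and one to http://ads.example.net/click would be dropped.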


> | For URIs that are HTTP URIs, the requests must be performed using the
> | POST method. User agents must ignore the entity body returned, but
> | must honour the HTTP headers -- in particular, HTTP cookie headers.
> 
> Should be changed to "must ignore any entity bodies returned", to deal 
> with redirects.

Fixed.


> What happens if I do ping="ftp://ftp.example.org/file"? Would the UA download
> that file? Perhaps something like:
> 
> | Authors should use only HTTP URIs. User agents may ignore any non-HTTP
> | URIs.
> 
> is in order?

Well the remote server needs to be contacted, for sure. Since the entity 
body is ignored it's up to the UA whether the whole download happens or not 
(from a user's point of view, you can't really tell the difference, and 
from a server's point of view, it doesn't really matter).


> | When the ping attribute is present, user agents should clearly
> | indicate to the user that following the hyperlink will also cause
> | secondary requests to be sent in the background, possibly including
> | listing the actual target URIs.
> 
> Is this necessary? I would leave it entirely up to the UAs rather than saying
> they SHOULD do it.

It's a privacy concern, so I think a SHOULD is in order, no? Exactly how 
it is done is up to the UA, but security and privacy concerns should IMHO 
be taken into account unless there are very good reasons not to (which is 
what SHOULD means).


> | Note: ... but authors are urged to use the ping attribute so that the
> | user agent can .
> 
> Perhaps this sentence should be ended? :)

Oops! Fixed.


On Fri, 21 Oct 2005, S. Mike Dierken wrote:
> 
> Will the user-agent represent these links in a special way, so the user 
> is made aware of the fact that a possibly unsafe action is being 
> requested?

I'm not sure what you mean by "potentially unsafe", but yes, the whole 
point is that the UA would be able to take this ping="" attribute and 
provide better UI.

BTW, bear in mind that it is quite possible for scripts to send arbitrary 
POST requests to arbitrary servers with arbitrary data today, silently, 
without the user knowing or having any visible UI (this has in fact been 
possible for years). This isn't a particularly high security risk (sites 
are expected to check things like Referer headers and use unpredictable 
confirmation tokens in user confirmation pages).


> > I would say it's OK to send a POST as a side effect because it's going 
> > to an URL where the developer expects a POST.
>
> But that's not what the user clicking the link expects.

I doubt the user clicking the link expects much of anything except getting 
Free Downloads Now! or whatever the advert says.


On Fri, 21 Oct 2005, S. Mike Dierken wrote:
> > 
> > GET means that you can do it again without affecting anything. In the 
> > case of tracking, you can't -- the very act of contacting that 
> > tracking URI can cost someone money. Hence POST. (This is another 
> > advantage of ping over redirects, come to think of it.)
>
> Since it isn't costing the /user/ any money, aren't those server 
> side-effects immaterial?

I don't think "immaterial" is the word I would use. It would fail to serve 
the full purpose of the pinging if the pinging were to be cached by 
intranet proxies!


On Fri, 21 Oct 2005, S. Mike Dierken wrote:
> 
> Since this is effectively capturing where the user's attention is being
> spent (the click event I mean), should you also define the other set of
> events of interest as well?
>  <a href="..." on-click-notify="myattention.org/dierken"
> on-hover-notify="myattention.org/dierken"
> on-copy-notify="myattention.org/dierken">Wicked Cool Stuff Here</a>

Well we could, but demand seems to be primarily for just knowing about 
clicks, not anything else.


> What is the request method for these notifications (the wording "the URIs
> would be followed" imply retrieval)?

POST for HTTP, as described in a later paragraph.


> If POST, what is the content body?

Empty. (Fixed so that spec is explicit.)
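
To make the answers above concrete, here is a rough sketch of how a UA 
might turn a ping value into request descriptions: POST with an empty body 
for HTTP URIs. It assumes the attribute holds a whitespace-separated list 
of URIs and that https: counts as HTTP; the function name and the shape of 
the returned objects are invented for illustration.

```javascript
// Hypothetical helper: maps a ping attribute value to request descriptors.
// Assumes a whitespace-separated URI list (an assumption of this sketch).
function buildPingRequests(pingValue) {
  return pingValue
    .split(/\s+/)
    .filter(Boolean) // drop empty strings from leading/trailing whitespace
    .map(function (uri) {
      const isHttp = /^https?:/i.test(uri);
      if (isHttp) {
        // HTTP URIs: POST method, empty entity body.
        return { uri: uri, method: "POST", body: "" };
      }
      // Non-HTTP URIs: the server is contacted, but how is protocol-specific
      // and left to the UA, so this sketch records no method or body.
      return { uri: uri, method: null, body: null };
    });
}
```

Any entity body that comes back from these requests would be ignored, as 
discussed earlier in the thread.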


> Should the Referer request header also be sent (except for documents
> retrieved via secure protocols)?

Sure, but that's out of scope for this spec.


> Should the notification event occur before, during or after the retrieval of
> the href="..." resource?

Undefined, up to the UA. Clarified.


> Should the notification event occur for only successful retrievals? Or 
> should the notification contain the response status of incomplete 
> retrievals of the href="..." resource?

Independent. Clarified.


> Should the notification URIs be restricted to the same host/domain as a) the
> source document b) the href="..." resource or c) unlimited?

Unlimited.


On Sat, 22 Oct 2005, ROBO Design wrote:
> >
> > [script]
>
> I wouldn't encourage 'ugly' hacks even before finishing the 
> specification :). For this to work people also have to rely on JS.

It's possible to do it backwards as well (though the script would be a bit 
longer!). In practice, there will always be hacks to make new things work 
in older browsers, until those older browsers are not important enough. 
The key is ensuring that it is possible to do that, as opposed to having 
the new features cause the older browsers to break completely.


On Sat, 22 Oct 2005, ROBO Design wrote:
> > 
> > Can you list some? Other than redirects, I couldn't actually think of 
> > any reliable ones. XMLHttpRequests dispatched in tandem with the 
> > original request are unreliable since you can't guarantee which order 
> > the requests are sent in (and thus some clicks might get lost). What 
> > else is there?
> 
> XMLHttpRequest in synchronized mode should do the trick. Just wait until 
> the request is done, then go to the URL. It's easy, it can be done 
> today.

Sure, but then that's no better than redirects, from a user perspective.


On Sat, 22 Oct 2005, Lachlan Hunt wrote:
> > 
> >  1. Improving sites, by getting data regarding how users use the site.
> >  ...
> >  3. Improving services, e.g. by offering a number of options, checking
> > which the user picked, and making that one be the first on the list the
> > next time the user uses the service.
>
> How are those two any different from each other, except that #3 gave a 
> specific example of how the data could be used?  Anyway, why would that 
> require an extra ping?  Couldn't that be determined just from which 
> pages they access in the site/service?

I suppose they're not that different. I meant one to be human-controlled 
and the other to be computer-controlled.

You can't necessarily get that data from logs because the idea may be to 
have multiple links to the same place and see which one the user uses, or 
to have links to other sites and see which one the user uses.


> It could be defined in reverse, where the ping attribute (probably given 
> a more suitable name, but I'll use ping for now) could be advisory 
> information about the final destination and the href attribute defines 
> the ping destination, such that following the href attribute would 
> perform a redirect, but WA1 UAs could use the URI in the ping attribute 
> to notify the user of the final destination (such as displaying it in 
> the status bar).

In practice this kind of thing would just be unreliably used. It's a kind 
of metadata. Metadata is notorious for being wrong.


On Sat, 22 Oct 2005, ROBO Design wrote:
> > 
> > The suggestion for this originally came to me from Web advertisers, so 
> > I'm not sure this is necessarily true.
> 
> I wouldn't have expected *this* to be suggested by web advertisers. 
> Maybe they have some twisted ideas that I don't think of now :).

Or they're just trying to make the user experience better. :-)


> True. Users who want to avoid advertisements and tracking methods just 
> use specialized software for this (like ad blocker, ad muncher and many 
> more). Yet, these are very few users.

Right.


> I still see it as a 'security' or privacy issue to allow ping URLs to 
> various third-party servers because it makes it easier for other sneaky 
> developers to add their own ping URLs to other servers. I'm talking about 
> developers who will be doing UserJSs of the future, even proxy servers 
> with content filtering, etc. I know what you are thinking of: how many 
> will write UserJSs and how many will use them? Expect in the future 
> some really interesting web-viruses, thanks to the advent of web 
> applications (not only this specification, I'm talking in general).

I don't understand the problem here.


> They will be able to do this tracking cross-domain, very easily and 
> simply. Not like now, because now cross-domain tracking of user 
> behaviour is harder.

It's actually easier to do that today than with ping="". You just put a 
capture listener on the root element, and for every click targeted at a 
link, you dispatch a sync XMLHttpRequest before letting the event's 
default action take place. That's a dozen lines of code, maybe.

If they _did_ use ping="", the user would at least know what was going on!
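
For reference, the script-based approach described above might look roughly 
like the following. This is a sketch, not code from the thread: the function 
names, the tracker URL, and the choice to send the link's href as the POST 
body are all hypothetical. The XMLHttpRequest constructor is passed in as a 
parameter so the sketch does not depend on a browser environment.

```javascript
// Hypothetical helper: walk up from the click target to the nearest <a href>.
function findLink(node) {
  while (node && !(node.tagName === "A" && node.href)) {
    node = node.parentNode;
  }
  return node || null;
}

// Installs a capture-phase click listener on `root` that sends a synchronous
// POST to `trackerUrl` before the link's default action takes place.
function installClickTracker(root, trackerUrl, XHR) {
  root.addEventListener("click", function (event) {
    const link = findLink(event.target);
    if (!link) return;                   // click was not on a link
    const xhr = new XHR();
    xhr.open("POST", trackerUrl, false); // third argument false = synchronous
    xhr.send(link.href);                 // body choice is an assumption
  }, true);                              // true = capture phase
}
```

In a browser this would be called as something like 
installClickTracker(document.documentElement, trackerUrl, XMLHttpRequest). 
Nothing in the page's markup reveals that the tracking is happening, which 
is the contrast with ping="" being drawn here. Note that browsers today 
treat synchronous XMLHttpRequest on the main thread as deprecated.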


On Sat, 22 Oct 2005, Mike Dierken wrote:
> 
> How about not putting this notification URI in the anchors at all - what 
> about putting some metadata in the <head> element that indicates that 
> /all/ links clicked should send a notification to the indicated service?
> 
>  <link href="http://myattention.org/dierken" rev="attention-tracker" />

There are three main reasons; one is that the tracking might have to be 
done by multiple companies on a per-link basis, another is that the 
tracking might only apply to certain parts of the page, and the third is 
that for compatibility with existing systems each link needs its own 
tracking URI.


Thanks everyone for your input on this thread so far!

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Sunday, 23 October 2005 16:26:55 UTC