Re: a/@ping discussion (ISSUE-1 and ISSUE-2), was: An HTML language specification vs. a browser specification from Julian Reschke on 2008-11-24 (public-html@w3.org from November 2008)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Mon, 24 Nov 2008 12:56:19 +0100
To: Ian Hickson <ian@hixie.ch>
CC: "Roy T. Fielding" <fielding@gbiv.com>, HTML WG <public-html@w3.org>
Message-ID: <492A9663.80301@gmx.de>
Ian Hickson wrote:
>> No, that wouldn't help, as it's supposed to *replace* those, not extend 
>> them.
> 
> Then I don't understand what you want.

Roy stated that ping should be in a separate proposal:

RF> I don't care how long ping has been under consideration by WHATWG
RF> mailing lists, nor do I care how many fanboys have thought in the past
RF> that it is worth implementing. It represents a change to HTML (a
RF> harmful one at that). Place it on the block and let it fight for
RF> itself in terms of implementation. It should be a separate proposal
RF> until it has been successfully implemented by two independent
RF> implementations. Likewise for all of the other new additions.

I think it's clear that what he meant was "separate from the HTML5 spec as 
of today".

>> (I note that there are browsers that do not display the link target 
>> itself in the status bar)
> 
> Indeed, different browsers have different user interfaces, and so 
> different solutions make sense for different browsers. This is why the 
> specification doesn't say anything about UI.

Not displaying the target of a link is considered a security issue 
nowadays. I hope that at some point some of the specs we're working on 
will have a Security Considerations section pointing that out.

The point I was trying to make was that one of the "four big" UAs 
doesn't even get *this* right, so I have my doubts that the situation 
will be any better with a/@ping.

>> So, what has been the feedback on these proposals? Where did the 
>> discussion occur?
> 
> Feedback has been generally positive. These discussions mostly occurred in 
> person and in private e-mail with individual browser vendor engineers.

Too bad.

>> So let's assume for a second that national regulations (in a country 
>> with significant population) would force a vendor to ship with this 
>> feature disabled. Would it still be used for link tracking in practice?
> 
> Assuming that Web pages intend to follow local laws, then sure.

I wasn't talking about *pages*, but browsers.

>>>> img/@src causes GET requests, while a/@ping causes POST requests.
>>> Ok, then use <form>. ping="" is as easy to trigger as a form 
>>> submission.
>> Not with scripting disabled, right? (yes, I use the FF noscript 
>> extension).
> 
> With forms today it is as easy to trigger a POST as it will be to trigger 
> a ping="", with the additional problem that with the POST you can include 
> any arbitrary payload. Both can be done without scripting. For example, 
> look at demo 2 here:
> 
>    http://damowmow.com/playground/demos/http/
> 
> I've already shown this to you:
> 
>    http://lists.w3.org/Archives/Public/public-html/2007Nov/0086.html

Yes, we had this discussion already.

Just because something bad can be done already doesn't mean it's a good 
idea to add more ways to do it.

>>> It's a common use case. The ad publisher, the ad provider (and click 
>>> tracker) and the ad target are commonly three different companies.
>> Why can't the site hosting the document do the link notification?
> 
> I don't understand the question.
> 
> If bloomberg.com buys ad inventory from google.com, and the ad Google 
> provides is a link to the economist.com, then when the ad is clicked, 
> Google needs to be notified (so it can charge the economist.com and pay 
> bloomberg.com), and the user needs to be redirected to the economist.com. 
> Currently, this is done with redirects, akin to:
> 
>    <a href="http://example.google.com/ads/12345678901234567890">...</a>
> 
> The user clicks the link, Google records the click, and redirects the user 
> to the economist.com. However, in this scenario, the user has no way to 
> opt out of the tracking, or even to know that tracking will occur, and no 
> way to see where the link will really lead.
> 
> To solve this, with ping="", we could have:
> 
>    <a href="http://economist.com/ad1"
>       ping="http://example.google.com/ads/12345678901234567890">...</a>
> 
> I don't see how bloomberg.com could do the link notification. It certainly 
> doesn't seem like something that a publisher would be interested in doing.

Of course the site hosting the ad also has interest in tracking the 
link, for instance, to keep a record of clicks for comparison.

>> Nice strategy :-) By saying nothing has consensus, and consensus isn't 
>> relevant, it's of course simple to argue that controversial stuff should 
>> stay in.
> 
> Whether something is controversial or not certainly shouldn't be a concern 
> as to whether it stays in or not. (It's not really clear to me how you 
> would objectively decide that something is controversial, either.)

Well, I think it should be a concern.

>> So, avoiding the term "consensus"... This feature is much more 
>> controversial than many other new features.
> 
> Oh my, no, not at all. We've had far, far more complaints about, say, 
> headers="", or <img alt="">, or the video codecs issue. (The latter in 
> particular has triggered orders of magnitude more feedback than ping="" 
> ever has. We probably got more new subscribers out of the codecs debacle 
> than we've ever received e-mails total on ping="".)

"...than many other new features...".

I didn't say there are other things that are more controversial.

Also, there's a difference between a new feature being added as opposed 
to an HTML4 feature being changed (you couldn't take out the HTML4 
feature, all you could do is take out the change from HTML4).


>> HEAD/GET would work when used with the proper Cache headers (and yes, 
>> this was discussed before).
> 
> And as discussed before, HEAD/GET don't satisfy the requirements that we 
> have (which include, basically, "don't use GET"). We need a method that 

That's not a requirement.

> proxies aren't going to replay, that has no caching problems (setting 
> cache headers often doesn't work, sadly), and that isn't the default, so 
> that people visiting the URL in a browser won't trigger the logging 
> behaviour that pinging would.

Do you have evidence of proxies not respecting Cache-Control: no-cache? 
And, if yes, would it affect the accuracy sufficiently?

Furthermore: preventing fraud for click tracking is a hard problem. 
Using POST instead of GET won't help anyway with respect to this.

The problem of people visiting the target URL can be dealt with in 
several ways; for instance by having an additional request header (which 
you happen to have already).

>> This also was discussed before; you *could* use a new method, and you do 
>> not need the HTTPbis working group for that. You should try to get IETF 
>> approval though.
> 
> Since you are the one who believes that using POST is a problem, and since 
> you are apparently better versed in these matters than me, could you 
> please do the honours? If we had a new method, I would be happy to use it. 

I do believe that we shouldn't spend our time working on this, so, no, I 
won't do that.

If you want to do it, I'll be happy to assist with the process. 
Basically, the best way is to describe the method (syntax + semantics) 
in a separate document, and get IESG approval for it.

> ...
>>>> - the way it's implemented over HTTP is problematic
>>> Suggestions on improving it are welcome. I'm trying to do the best I 
>>> can given HTTP's limitations.
>> Lots of suggestions have been made already, such as
>>
>> - not doing it at all,
> 
> Hardly an improvement, since it doesn't address any of the use cases.

There is no agreement that this use case is something HTML5 needs to solve.

> ...
>> - when using POST, at least make the message self-descriptive by using a 
>> body + well-defined MIME type, or
> 
> I've changed the spec to include an entity body with a new MIME type.

That's a step in the right direction. It would be better if the 
payload actually was in the body instead of new headers.

> ...
>> I would recommend putting these proposals as examples of potential 
>> implementations into the spec, so people can review them and comment on 
>> them.
> 
> As noted above, the specification is not an appropriate place for user 
> interface discussions.

In general, yes. In this case however, this affects privacy, and the 
spec already makes a requirement on the UI.

> ...
>> Where does that leave us with respect to future extensibility?
> 
> I would assume that all extensions to HTML would be done in future 
> revisions of the HTML spec (HTML6, HTML7, etc) just as it has been since 
> the start. Experimentation with doing things in a more modular fashion, in 
> particular Ruby, RDFa, and ARIA, have proved somewhat problematic. (Which 
> isn't to say that those features aren't good ideas, just that having them 
> be defined in separate documents has resulted in far more trouble than we 
> would have had if they had been defined as part of the core language from 
> the start.)
> ...

I disagree with this, but that's a separate discussion.


>> Furthermore, while we're at it:
>>
>> "For URLs that are HTTP URLs, the requests must be performed by fetching 
>> the specified URLs using the POST method, with an empty entity body in 
>> the request...."
>>
>> This text is very misleading. You don't "fetch" URLs.
> 
> The word "fetch" is a term defined by HTML5 (hence why it is hyperlinked 
> in that sentence).

I understand that, but that doesn't change the fact that it's misleading.

>> Also, POST isn't a retrieval method, it operates on the specified URI.
>> The result *can* be a response (which doesn't need to represent the 
>> resource in any way), plus, optionally, a pointer to a separate resource 
>> from which a representation can then be fetched (Location header + 3xx).
> 
> How does this affect the text quoted above?

I was trying to explain why the term "fetch" in the context of POST is 
misleading.

BR, Julian
Received on Monday, 24 November 2008 11:57:00 UTC