Re: img issue: should we restrict the URI from Laurens Holst on 2009-01-09 (public-html@w3.org from January 2009)

From: Laurens Holst <lholst@students.cs.uu.nl>
Date: Fri, 09 Jan 2009 11:20:09 +0100
To: Ian Hickson <ian@hixie.ch>
CC: Boris Zbarsky <bzbarsky@MIT.EDU>, Christian Schmidt <w3.org@chsc.dk>, HTML WG <public-html@w3.org>
Message-ID: <496724D9.9090000@students.cs.uu.nl>
Ian Hickson wrote:
>>> I don't understand why we would define things this way though. If the server
>>> wants to return different files each time, and return an image once and a
>>> document another time and a style sheet a third time, why not?
>>>       
>> Basically because:
>>
>> 1)  Doing that seems like an abuse of the HTTP content-negotiation
>>     feature (possibly not conforming to HTTP on the part of the server,
>>     though this is debatable).
>>     

I disagree with this. Imagine a URI such as http://example.com/photos/12345
that returns an image in the requested format, but that, when requested
with text/html, returns an HTML page containing the metadata embedded in
the JPEG file (author, aperture, etc.) and referencing the actual image
with an <img src=""> tag. The URL still means the same thing and still
returns the same data, just in a different format that is more suitable
for the requester (browsers generally don’t show this metadata by
default). This seems perfectly valid to me.

>> 2)  In practice no one does that.
>> 3)  In practice sites somewhat commonly have <img src="">.  We (Gecko)
>>     have had 28 independent bug reports filed (with people bothering to
>>     create an account in the bug database, etc) about the behavior
>>     difference from IE here.  That's a much larger number of bug
>>     

Let me just make this clear before this turns into a misconception about 
legacy IE compatibility:

The currently described behaviour is NOT the same as IE.

IE treats <img src=""> as <img src=".">.
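To illustrate the difference, here is a small Python sketch using the standard library's urllib.parse.urljoin, which follows RFC 3986 reference resolution. The document address is a hypothetical example.

```python
from urllib.parse import urljoin

document = "http://example.com/photos/12345"  # hypothetical document address

# An empty reference resolves to the document itself, so <img src="">
# resolved this way re-requests the page.
same_document = urljoin(document, "")
# "." resolves to the containing directory, which is what gets requested
# if <img src=""> is treated as <img src=".">.
directory = urljoin(document, ".")

print(same_document)  # http://example.com/photos/12345
print(directory)      # http://example.com/photos/
```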

>>     reports than we usually get about a given issue.  I can't tell you
>>     why this pattern is so common (e.g. whether some authoring
>>     frameworks produce it in some cases), but it seems that a number
>>     of web developers not only produce markup like this but notice
>>     the requests in their HTTP logs and file bugs about it.
>> 4)  The performance implications on high-latency networks (e.g.
>>     cell-phone networks) of dealing with this sort of markup are
>>     not that pretty, at least in Gecko.
>>     

So the other reasons are ‘practical’ ones: dealing with arguably
incorrect content that authors have created. They are not constraints
forced upon browser vendors by legacy IE compatibility.

>> I should note that we did _not_ make a similar change for 
>> |background-image: url()| in CSS, at least in part because we've had 
>> many fewer reports about it (3 or 4).  I do see the whole thing as a 
>> hack, and would have been more strongly opposed to doing anything 
>> special here (and was for a long time) if not for point 3 above and the 
>> combination of 1 and 2...  Point 4 was just the impetus for someone 
>> actually writing a patch.
>>     
>
> On Tue, 2 Dec 2008, Philip Taylor wrote:
>   
>> Out of 104879 pages with at least one <img src>, from my collection of 
>> pages from dmoz.org, there are 529 (0.5%) with at least one empty <img 
>> src="">.
>>
>> I don't see any obvious pattern in those pages - there's a mixture of 
>> old and new pages, dynamic and static pages, hand-written and various 
>> generators, etc. So it doesn't appear to be the result of a single tool.
>>     
>
> Based primarily on #2 above and on Philip's research, I've made the spec 
> say to ignore <img src=""> if the base URI of the element is the same as 
> the document's address.
>   

That’s unfortunate, as it seems like yet another unnecessary rule that
adds to the complexity.

It also rules out something that could be useful if IE fixed its
behaviour. I tried to use this a couple of years ago, but it didn’t
work in IE, because IE tends to send Accept: */* for both image and
page requests. Oddly, IE also sends a different Accept header when
reloading a page. IE team, if you read this: it would be nice if you
could fix this and give the Accept header some love (add text/html,
send image/* for images, etc.) :).

If you choose to specify this, may I suggest defining it differently:
make the spec say to ignore images whose (resolved) URI matches the
document URI? That rule seems simpler, and the implementation doesn’t
need to perform two checks:
(value == "" && this.baseURI == document.documentURI)
The explicit baseURI check seems a bit odd and unrelated, and it works
on the raw attribute value (presuming knowledge about the resolving
function) instead of on the result of resolving the URI.
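A rough Python sketch contrasting the two checks. The function names are hypothetical, urljoin stands in for the browser's URL resolution, and URIs are compared as plain strings with no fragment stripping or other normalization (an assumption).

```python
from urllib.parse import urljoin

# Hypothetical helpers; urljoin stands in for the browser's URL resolution.


def ignored_by_current_rule(src: str, base_uri: str, document_uri: str) -> bool:
    # Two checks on raw values: the attribute is empty AND the element's
    # base URI equals the document's address.
    return src == "" and base_uri == document_uri


def ignored_by_resolved_rule(src: str, base_uri: str, document_uri: str) -> bool:
    # One check on the resolved result: does src point back at the document?
    return urljoin(base_uri, src) == document_uri


doc = "http://example.com/photos/12345"  # hypothetical document address
for src, base in [("", doc), ("", "http://example.com/other/"), ("12345", doc)]:
    print(repr(src), base,
          ignored_by_current_rule(src, base, doc),
          ignored_by_resolved_rule(src, base, doc))
```

For an empty src the two checks agree, since an empty reference resolves to the base URI itself; the resolved-URI check additionally catches a non-empty reference such as src="12345" that points back at the document.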

For reference, this is the Bugzilla bug that recently changed Mozilla’s
behaviour:

https://bugzilla.mozilla.org/show_bug.cgi?id=444931

~Laurens

-- 
Note: New email address! Please update your address book.

~~ Ushiko-san! Why are you Ushiko-san!! ~~
Laurens Holst, student, Utrecht University, the Netherlands
Website: www.grauw.nl. Backbase employee; www.backbase.com
Received on Friday, 9 January 2009 10:20:55 UTC