Re: img issue: should we restrict the URI from Ian Hickson on 2008-12-24 (public-html@w3.org from December 2008)

From: Ian Hickson <ian@hixie.ch>
Date: Wed, 24 Dec 2008 11:04:57 +0000 (UTC)
To: Boris Zbarsky <bzbarsky@MIT.EDU>
Cc: Christian Schmidt <w3.org@chsc.dk>, HTML WG <public-html@w3.org>
Message-ID: <Pine.LNX.4.62.0812241044550.24109@hixie.dreamhostps.com>

On Tue, 2 Dec 2008, Boris Zbarsky wrote:
> Ian Hickson wrote:
> > On Wed, 13 Aug 2008, Christian Schmidt wrote:
> > > Christian Schmidt wrote:
> > > > It may be an idea to disallow the URL consisting of the empty string,
> > > > i.e. <img src="">.
> > > FWIW Firefox now ignores <img src=...> when src is a reference to the
> > > containing document: https://bugzilla.mozilla.org/show_bug.cgi?id=444931
> > 
> > On Wed, 13 Aug 2008, Boris Zbarsky wrote:
> > > No, it ignores <img src=""> when the base URI for the image node is the
> > > document URI (which isn't quite the same thing as what you said).
> > 
> > What Christian said appears to be more accurate:
> > 
> > http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C!DOCTYPE%20html%3E%3Cbase%20href%3D%22image%22%3E%3Cimg%20src%3D%22%22%3E
> 
> Gecko doesn't allow relative URIs in @href on <html:base> (per HTML4, though
> HTML5 changes this), so this test isn't testing what it thinks it's testing,
> as far as I can see.

Ah, I see. Ok.


> > I don't understand why we would define things this way though. If the
> > server wants to return different files each time, and return an image
> > once and a document another time and a style sheet a third time, why
> > not?
> 
> Basically because:
> 
> 1)  Doing that seems like an abuse of the HTTP content-negotiation
>     feature (possibly not conforming to HTTP on the part of the server,
>     though this is debatable).
> 2)  In practice no one does that.
> 3)  In practice sites somewhat commonly have <img src="">.  We (Gecko)
>     have had 28 independent bug reports filed (with people bothering to
>     create an account in the bug database, etc) about the behavior
>     difference from IE here.  That's a much larger number of bug
>     reports than we usually get about a given issue.  I can't tell you
>     why this pattern is so common (e.g. whether some authoring
>     frameworks produce it in some cases), but it seems that a number
>     of web developers not only produce markup like this but notice
>     the requests in their HTTP logs and file bugs about it.
> 4)  The performance implications on high-latency networks (e.g.
>     cell-phone networks) of dealing with this sort of markup are
>     not that pretty, at least in Gecko.
> 
> I should note that we did _not_ make a similar change for 
> |background-image: url()| in CSS, at least in part because we've had 
> many fewer reports about it (3 or 4).  I do see the whole thing as a 
> hack, and would have been more strongly opposed to doing anything 
> special here (and was for a long time) if not for point 3 above and the 
> combination of 1 and 2...  Point 4 was just the impetus for someone 
> actually writing a patch.

On Tue, 2 Dec 2008, Philip Taylor wrote:
> 
> Out of 104879 pages with at least one <img src>, from my collection of 
> pages from dmoz.org, there are 529 (0.5%) with at least one empty <img 
> src="">.
> 
> I don't see any obvious pattern in those pages - there's a mixture of 
> old and new pages, dynamic and static pages, hand-written and various 
> generators, etc. So it doesn't appear to be the result of a single tool.

Based primarily on #2 above and on Philip's research, I've made the spec 
say to ignore <img src=""> if the base URI of the element is the same as 
the document's address.
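
As a rough sketch of that rule (this is not spec text; the function and 
parameter names are hypothetical, and Python is used only for 
illustration), the check amounts to comparing the element's base URI with 
the document's address whenever src is the empty string. An empty 
reference resolves to the base URI itself, which is why the comparison is 
against the base rather than against the raw attribute value:

```python
from urllib.parse import urljoin


def should_ignore_img_src(src: str, base_uri: str, document_address: str) -> bool:
    """Hypothetical helper: True when the image fetch should be skipped.

    Assumes both URIs are already absolute and normalized the same way,
    so a plain string comparison is enough for this illustration.
    """
    # Only the empty string is special-cased; any other src is fetched as usual.
    if src != "":
        return False
    # An empty reference resolves to the base URI, so <img src=""> would
    # re-request the document only when the base is the document's address.
    resolved = urljoin(base_uri, src)
    return resolved == document_address
```

With no <base> element the base URI is the document's address, so 
should_ignore_img_src("", doc, doc) is true and the image is ignored. With 
a <base href> pointing elsewhere, the empty src resolves to that other 
resource and is fetched normally.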

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Wednesday, 24 December 2008 11:05:35 UTC