- From: Ian Hickson <ian@hixie.ch>
- Date: Wed, 5 Aug 2009 01:03:57 +0000 (UTC)
On Thu, 30 Jul 2009, Kartikaya Gupta wrote:
>
> It seems that most browsers do some sort of newline and tab removal from
> URI attributes. For example, if you have
>
> <img src="foo
> bar.jpg">
>
> browsers will still render the image called "foobar.jpg" despite the
> CRLF pair in the middle of the src attribute. The behavior actually
> seems a bit more complex; quote from one of my co-workers who
> investigated this:
>
> > <img id='bar' width="288" height="48" foo="abc
> > def" src="http://m.theglobeandmail.com/image-
> > server/img//rO0ABXQAS2Z7aHR0cDovL2JldGEuaW1hZ2VzLnRoZWdsb2JlYW5kbWFpbC5jb20vaW1hZ2VzL21v
> > YmlsZS9nYW1fZmxhZy5wbmd9dDBmMjg4dA==.png" alt="img" />
> >
> > <script type="text/javascript">
> > alert( document.getElementById('bar').getAttribute('src').indexOf('\n') );
> > alert( document.getElementById('bar').src.indexOf('\n') );
> > </script>
> >
> > Firefox and Safari both generate two alerts, 36 and -1.
> >
> > It seems mozilla ignores 0x09, 0x0a, 0x0d in the URI
> > Whereas webkit seems to ignore 0x09, 0x0a, 0x0d in the path.
> >
> > Try putting a CRLF inside the authority and
> > alert( document.getElementById('bar').src.indexOf('\n') );
> >
> > will return non -1 in safari. But will still fetch the image. Firefox
> > seems to return -1 all the time.
> >
> > Opera is like firefox.
>
> This behavior doesn't seem to be specced anywhere as far as I can tell.
> Assuming the WEBADDRESSES spec referred to in HTML5 is the one at
> http://www.w3.org/html/wg/href/draft.html that only says to trim
> leading/trailing whitespace and url-encode the rest. This doesn't seem
> to match existing behavior, so it should probably be updated.

I'll forward this e-mail to Larry, who is working on the relevant spec
now.

> On a related note, I was wondering if all these "spin-off" specs could
> be listed somewhere easy to find; it took me a while to locate the web
> addresses one and I had to use google to find it. Putting a list at,
> say, http://www.whatwg.org/specs/ would be handy; or even better, the
> references section in the HTML5 spec could list them.

The references section will in due course; in the meantime, please feel
free to construct such a list on the wiki if that would be of help.

On Thu, 30 Jul 2009, Anne van Kesteren wrote:
>
> Any chance you could also check whether this applies to CSS,
> XMLHttpRequest, HTTP Location, etc.? So far I've found that browsers use
> the same URL processor everywhere (though sometimes the URL character
> encoding flag is set to UTF-8 and cannot be changed). As such it would
> be nice to know if that is still true here or whether this is a
> pre-processing step specific to HTML attribute values.

Looks like yes, at least for CSS:

   <!DOCTYPE html><style>body { background: url("ima\Age"); }</style>X

...results in a background.

On Thu, 30 Jul 2009, Philip Taylor wrote:
>
> We should attempt to maintain compatibility with existing content, and
> whitespace in URI attributes seems very common in existing content,
> e.g.:
>
> http://www.topdogphotos.com/photo-gallery/gallery11.html (newlines in
> <a href>, <img src>)
>
> http://www.sprig.com/coyuchi_george_or_thor_hooded_baby_towel (tabs
> and
>  in <img src>)
>
> and loads more.

Thanks for looking into this.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
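
As a rough illustration of the behaviour reported in this thread for Firefox and Opera (0x09, 0x0a and 0x0d ignored anywhere in the URI, on top of the leading/trailing whitespace trimming the WEBADDRESSES draft is said to describe), here is a minimal JavaScript sketch. The function name stripUrlWhitespace is hypothetical and is not a browser API. The sketch does not model the WebKit variation mentioned above, where the characters appear to be ignored only in the path.

```javascript
// Hypothetical helper (an assumption for illustration, not a browser API):
// approximates the Firefox/Opera behaviour described in the thread, where
// tab (0x09), LF (0x0a) and CR (0x0d) are dropped anywhere in the URI.
function stripUrlWhitespace(value) {
  // Remove embedded tabs and newlines first, then trim what is left at
  // the ends, mirroring the "trim leading/trailing whitespace" step.
  return value.replace(/[\t\n\r]/g, "").trim();
}

// The src value from the first example in the thread, with a CRLF pair.
const raw = "foo\r\nbar.jpg";

console.log(raw.indexOf("\n"));                     // 4: newline still in the attribute value
console.log(stripUrlWhitespace(raw));               // foobar.jpg
console.log(stripUrlWhitespace(raw).indexOf("\n")); // -1, as the resolved .src reported
```

This mirrors the difference between getAttribute('src'), which returns the attribute value as written, and the resolved .src property, which showed no newline in the tests quoted above.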
Received on Tuesday, 4 August 2009 18:03:57 UTC