[whatwg] Stripping newlines from URI attributes

It seems that most browsers do some sort of newline and tab removal from URI attributes. For example, if you have

<img src="foo
bar.jpg">

browsers will still render the image called "foobar.jpg" despite the CRLF pair in the middle of the src attribute. The behavior actually seems a bit more complex; quote from one of my co-workers who investigated this:

> <img id='bar' width="288" height="48" foo="abc
> def" src="http://m.theglobeandmail.com/image-
> server/img//rO0ABXQAS2Z7aHR0cDovL2JldGEuaW1hZ2VzLnRoZWdsb2JlYW5kbWFpbC5jb20vaW1hZ2VzL21v
> YmlsZS9nYW1fZmxhZy5wbmd9dDBmMjg4dA==.png" alt="img" />
>  
> <script type="text/javascript"> 
> alert( document.getElementById('bar').getAttribute('src').indexOf('\n') );
> alert( document.getElementById('bar').src.indexOf('\n') );
> </script>
>  
> Firefox and Safari both generate two alerts, 36 and -1.
> 
> It seems mozilla ignores 0x09, 0x0a, 0x0d in the URI
> Whereas webkit seems to ignore 0x09, 0x0a, 0x0d in the path.
> 
> Try putting a CRLF inside the authority and
> alert( document.getElementById('bar').src.indexOf('\n') );
> 
> will return a value other than -1 in Safari, but Safari will still fetch the image. Firefox seems to return -1 all the time.
> 
> Opera is like firefox. 

This behavior doesn't seem to be specced anywhere as far as I can tell. Assuming the WEBADDRESSES spec referred to in HTML5 is the one at http://www.w3.org/html/wg/href/draft.html, it only says to trim leading/trailing whitespace and URL-encode the rest. That doesn't match the existing behavior described above, where tabs and newlines inside the value are dropped rather than encoded, so the spec should probably be updated.
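
To make the observed behavior concrete, here is a rough sketch of what Firefox and Opera appear to do, going only by the tests quoted above. The function name stripUriWhitespace is made up for illustration, and both the character set (0x09, 0x0a, 0x0d) and the trim-first ordering are assumptions drawn from those observations, not from any spec or browser source. It does not model the WebKit difference, where stripping seems to apply only to the path.

```javascript
// Hypothetical helper: approximates the stripping that Firefox and Opera
// appear to apply to URI attributes. Not taken from any browser or spec.
function stripUriWhitespace(value) {
  // Trim leading/trailing whitespace, as the web addresses draft describes.
  const trimmed = value.replace(/^[\t\n\f\r ]+|[\t\n\f\r ]+$/g, "");
  // Assumption: tab (0x09), LF (0x0a) and CR (0x0d) are removed everywhere
  // else in the value instead of being URL-encoded.
  return trimmed.replace(/[\t\n\r]/g, "");
}

console.log(stripUriWhitespace("foo\r\nbar.jpg")); // prints "foobar.jpg"
```

With this, the value from the first example resolves to "foobar.jpg", which matches what the browsers fetch, while getAttribute('src') would presumably still return the raw string with the newline intact.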

On a related note, I was wondering if all these "spin-off" specs could be listed somewhere easy to find; it took me a while to locate the web addresses one, and I had to use Google to find it. Putting a list at, say, http://www.whatwg.org/specs/ would be handy; or, even better, the references section in the HTML5 spec could link to them.

Thanks,
kats

Received on Wednesday, 29 July 2009 17:49:01 UTC