URI tokenization and tabs

CSS3's tokenization seems to differ from CSS2.1's in one regard that has me
puzzled, and browsers seem to implement the CSS2.1 behavior.

The character sequence

   'u' 'r' 'l' '(' '\t' 'x' '\t' ')'

seems to be a valid URI token in both, but in CSS2.1 that character
sequence is semantically equivalent to url("x"), while in CSS3 it is
equivalent to url("x\t"), because the grammar at
http://www.w3.org/TR/css3-syntax/#tokenization admits TAB in urlchar:

    urlchar::=[#x9#x21#x23-#x26#x27-#x7E] | nonascii | escape
    ...
    URI::="url(" w (string | urlchar* ) w ")"

This differs from CSS2.1's macro, which does not admit TAB:

    url ([!#$%&*-~]|{nonascii}|{escape})*

Since urlchar admits a tab, the tab branch of the second w (the one after
urlchar*) can never be taken under the PEG-like semantics specified in the
prologue:

    In case of multiple matches, the longest match determines the token.
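
To make the difference concrete, here is a minimal sketch that matches
"url(" w urlchar* w ")" with greedy regular expressions. Assumptions: w is
[ \t\r\n\f]* as in CSS2.1; strings, nonascii and escapes are left out; and
the draft's class is modelled as CSS2.1's class plus TAB, which isolates the
tab question rather than transcribing every range the draft lists. The names
W, CSS21_URLCHAR, DRAFT_URLCHAR and urlValue are hypothetical.

```javascript
// w: optional whitespace, as in CSS2.1's {w} macro.
const W = '[ \\t\\r\\n\\f]*';

// CSS2.1 url characters: [!#$%&*-~] (no TAB).
const CSS21_URLCHAR = '[!#$%&*-~]';

// Simplified model of the css3-syntax draft: CSS2.1's class plus TAB (#x9).
// Assumption: only the tab difference is modelled here.
const DRAFT_URLCHAR = '[\\t!#$%&*-~]';

// Build a matcher for: "url(" w urlchar* w ")".
// Greedy quantifiers stand in for the grammar's matching semantics.
function uriPattern(urlchar) {
  return new RegExp('^url\\(' + W + '(' + urlchar + '*)' + W + '\\)');
}

// Return the unquoted URL value at the start of `input`, or null if the
// input is not a URI token under the given urlchar class.
function urlValue(input, urlchar) {
  const m = uriPattern(urlchar).exec(input);
  return m ? m[1] : null;
}

const input = 'url(\tx\t)';
console.log(JSON.stringify(urlValue(input, CSS21_URLCHAR))); // "x"
console.log(JSON.stringify(urlValue(input, DRAFT_URLCHAR))); // "x\t"
```

In the draft model the greedy urlchar* swallows the trailing tab before the
second w gets a chance, whereas the leading tab is always consumed by the
first w, so url(\tx) still yields "x". An embedded tab, as in url(\tx\ty),
is accepted by the draft model (value "x\ty") but rejected by the CSS2.1
class.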

What was the reason for allowing tabs inside unquoted URLs (but not at the
start, where the leading w consumes them)?
Might moving #x9 from urlchar to stringchar be preferable?

http://lists.w3.org/Archives/Public/www-style/2008Mar/0000.html raises
a related question, but I can't find any resolution.

cheers,
mike



Incidentally, I ran the page below in the few browsers I had at hand, and
they don't seem to preserve the whitespace at the end of the URL.

<!doctype html>
<body>
<pre style="background-image:url( x )" id="tabs">
Hello
</pre>
</body>
<script>(function () {
  // Courtesy quirksmode.org/dom/getstyles.html
  function getStyle(el,styleProp) {
    if (el.currentStyle) {
      return el.currentStyle[styleProp];
    } else if (window.getComputedStyle) {
      return document.defaultView.getComputedStyle(el,null)
        .getPropertyValue(styleProp);
    }
  }

  // Bracket the computed value so any leading or trailing whitespace shows.
  alert('[' + getStyle(document.getElementById('tabs'), 'background-image') +
        ']');
}());
</script>

Received on Friday, 22 March 2013 22:08:49 UTC