Re: Whitespace

On Tuesday 2008-06-10 18:54 +0000, Ian Hickson wrote:
> For consistency in the Web platform I would like us to make the whitespace 
> definitions for HTML5 and CSS match. Right now, HTML5 defines the 
> following characters to be syntactic whitespace:
> 
>    U+0020 SPACE, U+0009 CHARACTER TABULATION (tab), U+000A LINE FEED (LF), 
>    U+000B LINE TABULATION, U+000C FORM FEED (FF), and U+000D CARRIAGE 
>    RETURN (CR)
>    http://www.whatwg.org/specs/web-apps/current-work/#space
> 
> CSS2.1 defines the following characters to be syntactic whitespace:
> 
>    "space" (U+0020), "tab" (U+0009), "line feed" (U+000A), "carriage 
>    return" (U+000D), and "form feed" (U+000C) 
> 
> The only difference appears to be the inclusion of U+000B in the 
> definition for HTML5.

So, I was going to propose a change yesterday, but not a change this
big.  I was just going to propose changing the definition of
whitespace for ~= selectors (and class selectors) to match HTML5,
since those selectors are intended to match HTML.
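
To make the narrower proposal concrete, here is a minimal Python sketch
of how the choice of whitespace set changes what a ~= selector matches.
The names split_words and matches_includes are made up for illustration
and are not taken from any browser; the two character sets are copied
from the lists quoted above.

```python
# Whitespace sets as quoted above (CSS2.1 vs. HTML5).
CSS21_WS = "\u0020\u0009\u000A\u000D\u000C"
HTML5_WS = CSS21_WS + "\u000B"  # HTML5 additionally lists U+000B


def split_words(value, whitespace):
    """Split an attribute value into words on the given whitespace set."""
    words, current = [], []
    for ch in value:
        if ch in whitespace:
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


def matches_includes(attr_value, word, whitespace):
    """Hypothetical sketch of [attr~=word]: word must equal one list item."""
    return word in split_words(attr_value, whitespace)


# class="foo<U+000B>bar": one word under CSS2.1, two words under HTML5.
value = "foo\u000Bbar"
print(matches_includes(value, "foo", CSS21_WS))
print(matches_includes(value, "foo", HTML5_WS))
```

Under the CSS2.1 set the value is a single word, so .foo would not match;
under the HTML5 set it splits into "foo" and "bar".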

But now I've reconsidered.  There's a *lot* of data in:
  https://bugzilla.mozilla.org/show_bug.cgi?id=437915

I'm strongly opposed to changing the CSS definition of whitespace
that's been stable for ten years and is reliably implemented across
browsers.  See Gecko and WebKit behavior on:
https://bugzilla.mozilla.org/attachment.cgi?id=324389
https://bugzilla.mozilla.org/attachment.cgi?id=324515

> HTML5's definition has a couple of minor advantages: it seems to be 
> closer to what IE7 does (at least for HTML), and it allows spaces to be 
> defined as the range of characters from U+0009 to U+000D plus U+0020, 
> rather than having it be five separate codepoints, which may allow for 
> some subtle optimisations.
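
For reference, the range-check idea in the quoted paragraph can be
sketched in Python as below.  This is only an illustration under the
definitions quoted earlier in this message, not code from any
implementation; the function names are hypothetical.

```python
# Sets copied from the two definitions quoted in this thread.
HTML5_WHITESPACE = frozenset("\u0009\u000A\u000B\u000C\u000D\u0020")
CSS21_WHITESPACE = frozenset("\u0009\u000A\u000C\u000D\u0020")


def is_ws_range(ch):
    """HTML5-style test: one contiguous range plus U+0020."""
    cp = ord(ch)
    return 0x09 <= cp <= 0x0D or cp == 0x20


def is_ws_css21(ch):
    """CSS2.1-style test: five separate code points."""
    return ch in CSS21_WHITESPACE


def disagreements(limit=0x100):
    """Code points below limit where the two tests give different answers."""
    return [
        f"U+{cp:04X}"
        for cp in range(limit)
        if is_ws_range(chr(cp)) != is_ws_css21(chr(cp))
    ]


print(disagreements())
```

The range test agrees with the HTML5 set exactly, and the only code point
on which it differs from the CSS2.1 test is U+000B.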

IE7's behavior is so wacky that it's nearly impossible to tell what
it does in CSS, since its CSS parser recovers from errors very
aggressively.

> Would adding U+000B to the CSS white space definition be acceptable to the 
> CSSWG, or are there good reasons to exclude U+000B that should cause me to 
> remove it from the HTML5 definition?

I think it should just be removed from the HTML5 definition.

On Tuesday 2008-06-10 20:30 +0000, Linss, Peter wrote:
> FWIW Gecko accepts U+000B as whitespace (and likely has since the
> beginning).

No, it doesn't.  See results on:
https://bugzilla.mozilla.org/attachment.cgi?id=324389
https://bugzilla.mozilla.org/attachment.cgi?id=324515

-David

-- 
L. David Baron                                 http://dbaron.org/
Mozilla Corporation                       http://www.mozilla.com/

Received on Tuesday, 10 June 2008 21:03:54 UTC