Re: Treating carriage return as white space in layout

On 05/11/2010 06:14 AM, Henri Sivonen wrote:
> Context:
> https://bugzilla.mozilla.org/show_bug.cgi?id=557197
> https://bugzilla.mozilla.org/show_bug.cgi?id=534071
>
> An XML Processor [...] Thus, a carriage return may participate in layout.
>
> The CSS 2.1 'white-space' processing model is oddly inconsistent about the
> treatment of carriage return.
>
> First, http://www.w3.org/TR/CSS21/text.html#white-space-prop says:
> "Newlines in the source can be represented by a carriage return (U+000D),
> a linefeed (U+000A) or both (U+000D U+000A) or by some other mechanism
> that identifies the beginning and end of document segments, such as the
> SGML RECORD-START and RECORD-END tokens. The CSS 'white-space' processing
> model assumes all newlines have been normalized to line feeds."
>
> It's not exactly clear to me if the last sentence is an informative
> statement or a normative statement. As an informative statement it's
> misleading, since both XML and HTML5 parsers can introduce carriage
> returns into the document tree even though such carriage returns weren't
> any kind of line breaks in the source.[...]
>
> CSS3 Text is more vague: http://dev.w3.org/csswg/css3-text/#white-space-processing

CSS3 Text seems pretty clear to me:
   # In the context of CSS, the document white space set is defined to be any
   # space characters (Unicode value U+0020), tab characters (U+0009), or line
   # break characters (defined by the document format: typically line feed,
   # U+000A). Control characters besides the white space characters and the
   # bidi formatting characters (U+202x) are treated as normal characters and
   # rendered according to the same rules.
   #
   # The document parser must normalize line break character sequences according
   # to its own format rules before CSS processing takes effect. However, in
   # generated content strings the line feed character (U+000A) and only the line
   # feed character is considered a line break sequence. For CSS white space
   # processing all line breaks must be normalized to a single character
   # representation—usually the line feed character (U+000A)—here called a
   # "line break".

According to CSS3 Text, carriage returns are not white space characters.
They therefore do not get any special treatment during the white space
collapsing process and are treated the same as any other non-whitespace
control character.
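
A minimal sketch of that reading, assuming the document format designates
only the line feed as its line break character. The names below are
hypothetical, and the collapsing is deliberately simplified (it ignores the
removal of spaces around line breaks and at the start or end of a line):

```python
# Sketch of the CSS3 Text draft wording quoted above, not any browser's
# actual implementation. All names here are hypothetical.

# Assumption: the document format's only line break character is U+000A.
DOCUMENT_WHITE_SPACE = {"\u0020", "\u0009", "\u000A"}  # space, tab, line feed


def is_document_white_space(ch):
    """Return True if ch belongs to the document white space set."""
    return ch in DOCUMENT_WHITE_SPACE


def collapse_white_space(text):
    """Replace each run of document white space with a single space.

    Simplified stand-in for 'white-space: normal' collapsing. Characters
    outside the set, including U+000D, pass through untouched.
    """
    out = []
    in_run = False
    for ch in text:
        if is_document_white_space(ch):
            if not in_run:
                out.append(" ")
            in_run = True
        else:
            out.append(ch)
            in_run = False
    return "".join(out)
```

Under these assumptions a carriage return neither collapses nor merges the
white space runs on either side of it; it simply stays in the text as an
ordinary character to be rendered.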

Both CSS3 Text (quoted above) and CSS2.1 (section 16.6.3) say that carriage
returns are rendered according to the same rules as normal characters; they
do not behave as control characters. I assume this means that if the font
has a glyph for the character, it is rendered as that glyph; otherwise the
same substitution process is triggered as for missing glyphs of any other
character. If that's not what we want for control characters, and we want
the character to always disappear or always fall back to nothing, then
we'll need to adjust both specs to say so.

The only thing I see missing in CSS3 Text is a statement that characters
designated as line breaks cause forced line breaks, which is pretty obvious,
but should be stated clearly somewhere. :)
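
To illustrate what such a statement would imply when white space is
preserved, here is a rough sketch. It assumes line breaks have already been
normalized to U+000A, and the function name is hypothetical:

```python
def split_forced_lines(text):
    """Split text at forced line breaks, assuming only U+000A is one.

    str.split("\n") is used on purpose: str.splitlines() would also break
    at U+000D, which is the behavior this sketch is meant to avoid.
    """
    return text.split("\n")
```

With this reading, "a\nb\rc" yields two lines, and the carriage return
remains inside the second one.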

Is the behavior specced in CSS3 Text what you want, and would backporting
some changes to CSS2.1 to create the same effect solve the problem, or is
there something else you needed here?

~fantasai

Received on Wednesday, 7 July 2010 21:28:03 UTC