Re: [CSS21][CSS3 Text] Re: Treating carriage return as white space in layout

On Sep 15, 2010, at 10:58, fantasai wrote:

> Part I: Add newline normalization to CSS rules.
> 
>  In this paragraph:
> 
>    # Newlines in the source can be represented by a carriage
>    # return (U+000D), a linefeed (U+000A) or both (U+000D U+000A),
>    # or by some other mechanism that identifies the beginning
>    # and end of document segments, such as the SGML RECORD-START
>    # and RECORD-END tokens.

The sentence is a bit too specific to be immediately recognizable as non-normative. How about:
"CSS doesn't restrict how document languages represent line breaks in the source or in the document tree."

> The CSS 'white-space' processing
>    # model assumes all newlines have been normalized to line feeds.
> 
>  Append to the last sentence
> 
>    | UAs that recognize other newline representations must apply
>    | the white space processing rules as if this normalization
>    | has taken place. If no newline rules are specified for the
>    | document language, each carriage return (U+000D) and CRLF
>    | sequence (U+000D U+000A) in the document text is treated as
>    | single line feed character.

I think it would be clearer to say:

"CSS 'white-space' processing must be applied as if each line break in the document tree were represented by a single line feed character. When the document language doesn't define what characters represent line breaks in the document tree, the CRLF pair (U+000D U+000A), a carriage return (U+000D) that is not part of a CRLF pair, and a line feed (U+000A) that is not part of a CRLF pair are each considered to represent a line break."

(Theoretical point for completeness: This doesn't cover the case where a document language puts LF in the document tree but doesn't want it to be a line break.)
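
To make the fallback rule concrete, here is a minimal Python sketch of it. The function name normalize_newlines is hypothetical and is not taken from any spec or UA. It assumes the document language defines no line break rules of its own, so the CRLF pair, a lone carriage return and a lone line feed each count as one line break.

```python
import re

# Assumption: no document-language rules apply, so the fallback is used.
# The CRLF alternative is listed first so that a CR LF pair is consumed
# as one line break instead of being matched as CR and then LF.
_LINE_BREAK = re.compile(r"\r\n|\r|\n")


def normalize_newlines(text: str) -> str:
    """Represent every line break as a single line feed (U+000A)."""
    return _LINE_BREAK.sub("\n", text)
```

Matching the CRLF pair before the single characters is what keeps a pair from turning into two line feeds.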

>  In the white space processing rules step 1:
>    # Each tab (U+0009), carriage return (U+000D), or space (U+0020)
>    # character surrounding a linefeed (U+000A) character is removed
>    # if 'white-space' is set to 'normal', 'nowrap', or 'pre-line'.
>  remove the mention of carriage returns.

I'm OK with this, assuming that something close to the wording I suggested above is adopted.
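
As a rough illustration of step 1 with the carriage return mention removed, the following Python sketch strips only tabs and spaces around line feeds. The names strip_around_line_feeds and COLLAPSING_VALUES are hypothetical, and the sketch assumes line breaks have already been normalized to line feeds before it runs.

```python
import re

# 'white-space' values for which step 1 removes tabs and spaces
# surrounding a line feed.
COLLAPSING_VALUES = {"normal", "nowrap", "pre-line"}

# Any run of tabs (U+0009) and spaces (U+0020) on either side of a
# line feed (U+000A). Carriage returns are deliberately not listed.
_AROUND_LINE_FEED = re.compile(r"[\t ]*\n[\t ]*")


def strip_around_line_feeds(text: str, white_space: str) -> str:
    """Remove tabs and spaces surrounding each line feed, if applicable."""
    if white_space not in COLLAPSING_VALUES:
        return text
    return _AROUND_LINE_FEED.sub("\n", text)
```

If a stray carriage return reached this step without being normalized, it would no longer be removed, which is why the change depends on the normalization wording above.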

> Part II: Specify handling of carriage returns in generated content.
> 
>  Append to the new paragraph text above:
> 
>  Option A: "Generated content is not affected by newline normalization."
>  Option B: "This default normalization rule also applies to generated content."

My guess is B, but I don't actually know if the same whitespace processing code runs in Gecko for generated and non-generated content, so I defer to someone who knows.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Friday, 17 September 2010 14:29:55 UTC