Re: [CSS21][CSS3 Text] Re: Treating carriage return as white space in layout

On 08/03/2010 06:51 AM, Henri Sivonen wrote:
> On Aug 2, 2010, at 23:23, fantasai wrote:
>
>> "treated as whitespace" is vague. Different kinds of whitespace are treated
>> differently.
>
> My exact requirements are vague. I know that I need CR to behave in a
> whitespace-ish way, but I'm not confident about the details.
>  ...
>>> For white-space-collapse: preserve-breaks;, I'm not totally confident what's
>>> best, but I've been persuaded that Opera 10.60's behavior (CR is a break but
>>> it coalesces with LF when appearing in a CRLF pair) is the thing I should
>>> be wanting.
>>
>> So you want CRLF normalization to happen at the CSS level in addition to the
>> source markup level for text appearing in the DOM,
>
> Yes.
>
>> but not for text in generated content?
>
> I didn't intend to express an opinion about generated content. I don't know
> how exactly code reuse works in implementation between DOM-appearing content
> and generated content, but I don't want to introduce any difficulties in
> that area.

Below are two different ways of treating CR as a line breaking
character in the white space processing rules. This is Take I.

Approach 1: Normative normalization to "line feed character".

   In this paragraph:

     # Newlines in the source can be represented by a carriage
     # return (U+000D), a linefeed (U+000A) or both (U+000D U+000A),
     # or by some other mechanism that identifies the beginning
     # and end of document segments, such as the SGML RECORD-START
     # and RECORD-END tokens. The CSS 'white-space' processing
     # model assumes all newlines have been normalized to line feeds.

   Append after the last sentence:

     | UAs that recognize other newline representations must apply
     | the white space processing rules as if this normalization
     | has taken place.
     |
     | In the absence of specific document language rules to the contrary,
     | each carriage return (U+000D) and CRLF sequence (U+000D U+000A) in the
     | source text is treated as a single line feed character.

   In the white space processing rules step 1:
     # Each tab (U+0009), carriage return (U+000D), or space (U+0020)
     # character surrounding a linefeed (U+000A) character is removed
     # if 'white-space' is set to 'normal', 'nowrap', or 'pre-line'.
   remove the mention of carriage returns.
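
   To make Approach 1 concrete, here is a rough Python sketch of what I
   have in mind. The function names are made up for illustration and are
   not from any spec or implementation; it assumes the document language
   imposes no newline rules of its own.

```python
import re

def normalize_newlines(text):
    """Treat each CRLF pair and each lone CR as a single line feed.

    Hypothetical helper: models the proposed default normalization.
    """
    # "\r\n?" matches CRLF as one unit, so a pair yields one LF, not two.
    return re.sub(r"\r\n?", "\n", text)

def strip_around_linefeeds(text):
    """Step 1 for 'normal', 'nowrap', and 'pre-line', after normalization.

    Removes tabs and spaces surrounding each line feed. No CR handling is
    needed here because normalize_newlines() has already removed every CR.
    """
    return re.sub(r"[\t ]*\n[\t ]*", "\n", text)

source = "one \r\n two\rthree\t\nfour"
print(repr(strip_around_linefeeds(normalize_newlines(source))))
```

   The point of the sketch is that once normalization has happened, step 1
   never sees a carriage return, which is why the mention of it can go.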

Approach 2: Generalizing to "line break character".

   In this paragraph:

     # Newlines in the source can be represented by a carriage
     # return (U+000D), a linefeed (U+000A) or both (U+000D U+000A),
     # or by some other mechanism that identifies the beginning
     # and end of document segments, such as the SGML RECORD-START
     # and RECORD-END tokens. The CSS 'white-space' processing
     # model assumes all newlines have been normalized to line feeds.

   Drop the last sentence. Add

     | Any such newline representation is considered to be a <dfn>line
     | break character</dfn> in the CSS white space processing rules.
     |
     | CSS does not define how newlines are represented in the source.
     | In the absence of specific document language rules to the contrary,
     | all linefeeds (U+000A), carriage returns (U+000D), and CRLF sequences
     | (U+000D U+000A) in the source text are considered line break
     | characters.
     | In CSS generated content, only line feeds (U+000A) are considered
     | line break characters.

   In the descriptions of 'pre', 'pre-wrap', and 'pre-line':
   Replace
     # Lines are only broken at newlines in the source, or at
     # occurrences of "\A" in generated content.
   with
     | Lines are only broken at `line break characters`_

   In the white space processing rules step 1:
     # Each tab (U+0009), carriage return (U+000D), or space (U+0020)
     # character surrounding a linefeed (U+000A) character is removed
     # if 'white-space' is set to 'normal', 'nowrap', or 'pre-line'.
   remove the mention of carriage returns and replace "linefeed character"
   with "line break character".
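
   As a rough illustration of Approach 2, the Python sketch below breaks
   text at "line break characters" without normalizing it first. The names
   (SOURCE_BREAK, GENERATED_BREAK, split_lines) are hypothetical, and it
   assumes the default source text rules apply.

```python
import re

# Source text defaults: CRLF counts as one line break character,
# so it must be tried before a lone CR or LF.
SOURCE_BREAK = re.compile(r"\r\n|\r|\n")

# Generated content: only the line feed is a line break character.
GENERATED_BREAK = re.compile(r"\n")

def split_lines(text, generated=False):
    """Split text at line break characters, as 'pre' would break lines."""
    pattern = GENERATED_BREAK if generated else SOURCE_BREAK
    return pattern.split(text)

print(split_lines("a\r\nb\rc\nd"))             # source text
print(split_lines("a\r\nb", generated=True))   # generated content
```

   Note that under this approach a CR in generated content is not a break,
   so it stays in the text as an ordinary character.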

   Some possible tweaks to this follow.

   Option A:
     Add the Unicode line separator (LS, U+2028) and paragraph separator
     (PS, U+2029) to the list of line break characters in generated content
     and in the source text defaults.

   Option B:
     Make generated content match the source text defaults.

   Option C:
     Add the full list of UAX14 class BK characters to the source text
     defaults and the generated content lists.

   Option D: Option A + Option B
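
   To compare the options side by side, here is a small Python sketch that
   returns the two sets of line break characters each option would give.
   The function name and the set representation are invented for this
   example. The class BK membership used for Option C (U+000B, U+000C,
   U+2028, U+2029) is my reading of UAX14 and should be checked against
   the current Unicode data before anyone relies on it.

```python
LF, CR, CRLF = "\n", "\r", "\r\n"
LS, PS = "\u2028", "\u2029"

# Assumption: UAX14 class BK as I read it; verify against LineBreak.txt.
BK = {"\u000b", "\u000c", LS, PS}

def break_chars(option=None):
    """Return (source_defaults, generated_content) for a given option.

    With option=None this is plain Approach 2.
    """
    source = {LF, CR, CRLF}
    generated = {LF}
    if option in ("A", "D"):
        # Option A: add LS and PS to both lists.
        source |= {LS, PS}
        generated |= {LS, PS}
    if option in ("B", "D"):
        # Option B: generated content matches the source text defaults.
        generated = set(source)
    if option == "C":
        # Option C: add every class BK character to both lists.
        source |= BK
        generated |= BK
    return source, generated

for opt in (None, "A", "B", "C", "D"):
    src, gen = break_chars(opt)
    print(opt, sorted(map(repr, src)), sorted(map(repr, gen)))
```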

UAX14: http://unicode.org/reports/tr14/

~fantasai

Received on Wednesday, 8 September 2010 07:23:11 UTC