- From: fantasai <fantasai.lists@inkedblade.net>
- Date: Wed, 08 Sep 2010 00:22:35 -0700
- To: Henri Sivonen <hsivonen@iki.fi>
- CC: www-style@w3.org
On 08/03/2010 06:51 AM, Henri Sivonen wrote:
> On Aug 2, 2010, at 23:23, fantasai wrote:
>
>> "treated as whitespace" is vague. Different kinds of whitespace are treated
>> differently.
>
> My exact requirements are vague. I know that I need CR to behave in a
> whitespace-ish way, but I'm not confident about the details.
> ...
>>> For white-space-collapse: preserve-breaks;, I'm not totally confident what's
>>> best, but I've been persuaded that Opera 10.60's behavior (CR is a break but
>>> it coalesces with LF when appearing in a CRLF pair) is the thing I should
>>> be wanting.
>>
>> So you want CRLF normalization to happen at the CSS level in addition to the
>> source markup level for text appearing in the DOM,
>
> Yes.
>
>> but not for text in generated content?
>
> I didn't intend to express an opinion about generated content. I don't know
> how exactly code reuse works in implementation between DOM-appearing content
> and generated content, but I don't want to introduce any difficulties in
> that area.

Below are two different ways of treating CR as a line breaking character
in the white space processing rules. This is Take I.

Approach 1: Normative normalization to "line feed character".

In this paragraph:

  # Newlines in the source can be represented by a carriage
  # return (U+000D), a linefeed (U+000A) or both (U+000D U+000A),
  # or by some other mechanism that identifies the beginning
  # and end of document segments, such as the SGML RECORD-START
  # and RECORD-END tokens. The CSS 'white-space' processing
  # model assumes all newlines have been normalized to line feeds.

Append to the last sentence

  | UAs that recognize other newline representations must apply
  | the white space processing rules as if this normalization
  | has taken place.
  |
  | In the absence of specific document language rules to the contrary,
  | each carriage return (U+000D) and CRLF sequence (U+000D U+000A) in the
  | source text is treated as a single line feed character.
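To make the normalization in Approach 1 concrete, here is a minimal sketch
in Python. It is only an illustration of the proposed wording, not anything
a UA is required to do internally, and the function name
normalize_newlines is made up for this example.

```python
import re

def normalize_newlines(text):
    """Treat each CRLF pair and each lone CR as a single line feed.

    Hypothetical helper: models the proposed default for source text
    when the document language says nothing to the contrary.
    """
    # "\r\n?" matches a CR optionally followed by LF, so a CRLF pair
    # collapses to one LF rather than two.
    return re.sub(r"\r\n?", "\n", text)
```

After this step, the existing white space processing rules only ever see
U+000A, which is why step 1 no longer needs to mention carriage returns.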
In the white space processing rules step 1:

  # Each tab (U+0009), carriage return (U+000D), or space (U+0020)
  # character surrounding a linefeed (U+000A) character is removed
  # if 'white-space' is set to 'normal', 'nowrap', or 'pre-line'.

remove the mention of carriage returns.

Approach 2: Generalizing to "line break character".

In this paragraph:

  # Newlines in the source can be represented by a carriage
  # return (U+000D), a linefeed (U+000A) or both (U+000D U+000A),
  # or by some other mechanism that identifies the beginning
  # and end of document segments, such as the SGML RECORD-START
  # and RECORD-END tokens. The CSS 'white-space' processing
  # model assumes all newlines have been normalized to line feeds.

Drop the last sentence. Add

  | Any such newline representation is considered to be a <dfn>line
  | break character</dfn> in the CSS white space processing rules.
  |
  | CSS does not define how newlines are represented in the source.
  | In the absence of specific document language rules to the contrary,
  | all linefeeds (U+000A), carriage returns (U+000D), and CRLF sequences
  | (U+000D U+000A) in the source text are considered line break
  | characters.
  | In CSS generated content, only line feeds (U+000A) are considered
  | line break characters.

In the descriptions of 'pre', 'pre-wrap', and 'pre-line':

Replace

  # Lines are only broken at newlines in the source, or at
  # occurrences of "\A" in generated content.

with

  | Lines are only broken at `line break characters`_

In the white space processing rules step 1:

  # Each tab (U+0009), carriage return (U+000D), or space (U+0020)
  # character surrounding a linefeed (U+000A) character is removed
  # if 'white-space' is set to 'normal', 'nowrap', or 'pre-line'.

remove the mention of carriage returns and replace "linefeed character"
with "line break character".

Some possible tweaks to this follow.
Option A: Add the Unicode line separator (LS, U+2028) and paragraph
          separator (PS, U+2029) to the list of line break characters
          in generated content and in the source text defaults.

Option B: Make generated content match the source text defaults.

Option C: Add the full list of UAX14 class BK characters to the source
          text defaults and the generated content lists.

Option D: Option A + Option B

UAX14: http://unicode.org/reports/tr14/

~fantasai
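
As an illustration of Approach 2 with Options A, B and D, the sketch
below (Python) shows which sequences would count as a single line break
character in source text versus generated content. The names
break_pattern and split_lines, and the "option" parameter, are
hypothetical and exist only for this example. Option C is not modeled.

```python
import re

# U+2028 and U+2029, the two Unicode separators named in Option A.
SEPARATORS = ["\u2028", "\u2029"]

def break_pattern(generated=False, option=""):
    """Build a regex matching one line break character (hypothetical helper)."""
    # Defaults from Approach 2. CRLF is listed first so that a CRLF
    # pair is matched as one break, not two.
    source_alts = [r"\r\n", r"\r", r"\n"]
    generated_alts = [r"\n"]
    if option in ("A", "D"):
        # Option A: add the separators to both lists.
        source_alts = source_alts + SEPARATORS
        generated_alts = generated_alts + SEPARATORS
    if option in ("B", "D"):
        # Option B: generated content uses the source text list.
        generated_alts = list(source_alts)
    alts = generated_alts if generated else source_alts
    return re.compile("|".join(alts))

def split_lines(text, generated=False, option=""):
    """Split text at line break characters, as 'pre' would break lines."""
    return break_pattern(generated, option).split(text)
```

With the defaults, split_lines("a\r\nb\rc\nd") gives four lines for source
text, while the same string passed with generated=True breaks only at the
U+000A and leaves the carriage returns in the line content.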
Received on Wednesday, 8 September 2010 07:23:11 UTC