- From: fantasai <fantasai.lists@inkedblade.net>
- Date: Wed, 08 Sep 2010 00:22:35 -0700
- To: Henri Sivonen <hsivonen@iki.fi>
- CC: www-style@w3.org
On 08/03/2010 06:51 AM, Henri Sivonen wrote:
> On Aug 2, 2010, at 23:23, fantasai wrote:
>
>> "treated as whitespace" is vague. Different kinds of whitespace are treated
>> differently.
>
> My exact requirements are vague. I know that I need CR to behave in a
> whitespace-ish way, but I'm not confident about the details.
> ...
>>> For white-space-collapse: preserve-breaks;, I'm not totally confident what's
>>> best, but I've been persuaded that Opera 10.60's behavior (CR is a break but
>>> it coalesces with LF when appearing in a CRLF pair) is the thing I should
>>> be wanting.
>>
>> So you want CRLF normalization to happen at the CSS level in addition to the
>> source markup level for text appearing in the DOM,
>
> Yes.
>
>> but not for text in generated content?
>
> I didn't intend to express an opinion about generated content. I don't know
> how exactly code reuse works in implementation between DOM-appearing content
> and generated content, but I don't want to introduce any difficulties in
> that area.
Below are two different ways of treating CR as a line breaking
character in the white space processing rules. This is Take I.
Approach 1: Normative normalization to "line feed character".
In this paragraph:
# Newlines in the source can be represented by a carriage
# return (U+000D), a linefeed (U+000A) or both (U+000D U+000A),
# or by some other mechanism that identifies the beginning
# and end of document segments, such as the SGML RECORD-START
# and RECORD-END tokens. The CSS 'white-space' processing
# model assumes all newlines have been normalized to line feeds.
Append to the last sentence
| UAs that recognize other newline representations must apply
| the white space processing rules as if this normalization
| has taken place.
|
| In the absence of specific document language rules to the contrary,
| each carriage return (U+000D) and CRLF sequence (U+000D U+000A) in the
| source text is treated as a single line feed character.
In the white space processing rules step 1:
# Each tab (U+0009), carriage return (U+000D), or space (U+0020)
# character surrounding a linefeed (U+000A) character is removed
# if 'white-space' is set to 'normal', 'nowrap', or 'pre-line'.
remove the mention of carriage returns.
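A minimal sketch of how Approach 1 might look in practice, assuming the text is normalized once up front and step 1 then only has to deal with line feeds. The function names normalize_newlines and strip_around_linefeeds are hypothetical and come from neither the spec nor any implementation.

```python
import re


def normalize_newlines(text: str) -> str:
    """Treat each CRLF pair and each lone CR as a single line feed."""
    # Replace CRLF first so the pair collapses to one LF, not two.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def strip_around_linefeeds(text: str) -> str:
    """Step 1 for 'normal', 'nowrap', and 'pre-line', after normalization.

    Only tabs and spaces need to be listed here, because no carriage
    returns are left once normalize_newlines() has run.
    """
    return re.sub(r"[ \t]*\n[ \t]*", "\n", text)


if __name__ == "__main__":
    raw = "one \r\n two\t\r three"
    print(repr(strip_around_linefeeds(normalize_newlines(raw))))
```

With normalization done first, the step 1 rule no longer needs to mention carriage returns at all.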
Approach 2: Generalizing to "line break character".
In this paragraph:
# Newlines in the source can be represented by a carriage
# return (U+000D), a linefeed (U+000A) or both (U+000D U+000A),
# or by some other mechanism that identifies the beginning
# and end of document segments, such as the SGML RECORD-START
# and RECORD-END tokens. The CSS 'white-space' processing
# model assumes all newlines have been normalized to line feeds.
Drop the last sentence. Add
| Any such newline representation is considered to be a <dfn>line
| break character</dfn> in the CSS white space processing rules.
|
| CSS does not define how newlines are represented in the source.
| In the absence of specific document language rules to the contrary,
| all linefeeds (U+000A), carriage returns (U+000D), and CRLF sequences
| (U+000D U+000A) in the source text are considered line break
| characters.
| In CSS generated content, only line feeds (U+000A) are considered
| line break characters.
In the descriptions of 'pre', 'pre-wrap', and 'pre-line':
Replace
# Lines are only broken at newlines in the source, or at
# occurrences of "\A" in generated content.
with
| Lines are only broken at `line break characters`_
In the white space processing rules step 1:
# Each tab (U+0009), carriage return (U+000D), or space (U+0020)
# character surrounding a linefeed (U+000A) character is removed
# if 'white-space' is set to 'normal', 'nowrap', or 'pre-line'.
remove the mention of carriage returns and replace "linefeed character"
with "line break character".
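A rough sketch of Approach 2, assuming a line breaker that recognizes a different set of line break characters depending on whether the text comes from the source or from generated content. The name split_at_line_breaks and the generated flag are hypothetical, for illustration only.

```python
import re

# Source text defaults: LF, CR, and CRLF each count as one line break
# character. CRLF is listed first so the pair is matched as a unit.
SOURCE_BREAK = re.compile(r"\r\n|\r|\n")

# Generated content: only LF counts as a line break character.
GENERATED_BREAK = re.compile(r"\n")


def split_at_line_breaks(text: str, generated: bool = False) -> list[str]:
    """Split text into lines at the applicable line break characters."""
    pattern = GENERATED_BREAK if generated else SOURCE_BREAK
    return pattern.split(text)


if __name__ == "__main__":
    print(split_at_line_breaks("a\r\nb\rc\nd"))
    print(split_at_line_breaks("a\rb\nc", generated=True))
```

Unlike Approach 1, nothing is rewritten here: the CR stays in the text, and only the set of characters that count as breaks changes.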
Some possible tweaks to this follow.
Option A:
Add the Unicode paragraph separator (PS, U+2029) and line separator
(LS, U+2028) to the list of line break characters in generated content
and in the source text defaults.
Option B:
Make generated content match the source text defaults.
Option C:
Add the full list of UAX14 class BK characters to the source text
defaults and the generated content lists.
Option D: Option A + Option B
UAX14: http://unicode.org/reports/tr14/
~fantasai
Received on Wednesday, 8 September 2010 07:23:11 UTC