- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Wed, 8 Sep 2010 15:10:25 +0300
- To: fantasai <fantasai.lists@inkedblade.net>
- Cc: www-style@w3.org
On Sep 8, 2010, at 10:22, fantasai wrote:

> # Newlines in the source can be represented by a carriage
> # return (U+000D), a linefeed (U+000A) or both (U+000D U+000A),
> # or by some other mechanism that identifies the beginning
> # and end of document segments, such as the SGML RECORD-START
> # and RECORD-END tokens. The CSS 'white-space' processing
> # model assumes all newlines have been normalized to line feeds.
>
> Drop the last sentence. Add
>
> | Any such newline representation is considered to be a <dfn>line
> | break character</dfn> in the CSS white space processing rules.
> |
> | CSS does not define how newlines are represented in the source.
> | In the absence of specific document language rules to the contrary,
> | all linefeeds (U+000A), carriage returns (U+000D), and CRLF sequences
> | (U+000D U+000A) in the source text are considered line break
> | characters.

Is 'source text' a special term in CSS that means text in the document tree? I think 'source text' is confusing, because the whole point of my concern is that a carriage return doesn't appear literally in HTML (or XML) source but does appear in the resulting DOM. Can this be changed to talk about text in the document tree?

> Option A:
> Add the Unicode paragraphs separator (PS, U+2028) and line separator
> (LS, U+2029) to the list of line break characters in generated content
> and in the source text defaults.

> Option C:
> Add the full list of UAX14 class BK characters to the source text
> defaults and the generated content lists.

These options look rather XML 1.1-ish to me. To address the site compat concern I presented, it is unnecessary to go beyond CR. Furthermore, I think adding non-ASCII characters to white space operations shouldn't be done lightly, since performance and implementation complexity issues may arise when the internal representation of text is UTF-8. (I didn't check if there are astral BK characters that'd cause problems even when the internal representation is UTF-16.)
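
Both points can be illustrated with a minimal Python sketch. It assumes the standard library's expat-based `xml.etree.ElementTree` parser as a stand-in for an XML front end; the helper name `tree_text` and the `CANDIDATES` table are made up for this illustration and are not part of any CSS or browser API. The first half shows a carriage return reaching the document tree only via a character reference; the second half shows that CR and LF are single bytes in UTF-8 while U+2028 and U+2029 are multi-byte sequences, which is roughly where the extra matching work would come from.

```python
import xml.etree.ElementTree as ET


def tree_text(xml_source: str) -> str:
    """Hypothetical helper: parse a one-element document, return its text."""
    return ET.fromstring(xml_source).text


# A literal CRLF in the source is normalized to a single LF by the XML parser,
# so no carriage return survives into the tree.
literal = tree_text("<p>a\r\nb</p>")

# A numeric character reference bypasses that normalization and puts a real
# U+000D into the tree, even though no CR byte appears in the source.
escaped = tree_text("<p>a&#13;b</p>")

# UTF-8 encoded lengths of some candidate line break characters.
# Code points follow the Unicode names: LS is U+2028, PS is U+2029.
CANDIDATES = {"LF": "\u000A", "CR": "\u000D", "LS": "\u2028", "PS": "\u2029"}
utf8_lengths = {name: len(ch.encode("utf-8")) for name, ch in CANDIDATES.items()}

print(repr(literal), repr(escaped), utf8_lengths)
```

With a UTF-8 internal representation, a scanner that only cares about CR and LF can compare one byte at a time, whereas recognizing LS or PS means matching a three-byte sequence.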

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Wednesday, 8 September 2010 12:11:01 UTC