- From: Coises <Randy@Coises.com>
- Date: Tue, 27 Aug 2002 02:21:22 -0700
- To: www-style@w3.org
[The "\A" in question below appears in the value of the content property.]

[Mon, 26 Aug 2002 18:56:50 +0200] Bert Bos:
>No, the 'white-space' property has no effect on '\A', since the '\A'
>is not inserted into the *input* of the CSS renderer, but into the
>*output*. Whitespace in the input is a form of mark-up and is thus
>interpreted by the HTML (or XML) parser and further undergoes
>transformations by the CSS renderer. But the '\A' is simply part of
>the rendered output. You can regard it as a glyph or as a control
>code, but the term "whitespace" doesn't apply to it.

Are spaces (" " and/or "\20") in the content property "whitespace"?

Referring to:

http://www.w3.org/TR/REC-CSS2/intro.html#processing-model

I gather the "input" is the document tree (created by the document language parser in step 1) and the "output" is the formatting structure (delivered to the user agent's rendering engine in step 6).

The document language parser must recognize whitespace in order to read its input and build the document tree; however, at this stage it must still preserve the individual whitespace characters used in element content --- since the "white-space" property hasn't yet been determined, it is unknown if and how this whitespace will be transformed.

Does the document tree at this point (in the conceptual model) contain some sort of "tokenized" version of each element's content, so that the original parsing of whitespace is available to later stages? or is that original parsing thrown away within elements? That is, does input like this:

<P>This is
	just a short paragraph.
</P>

get passed along as something like this:

* element: P
  content: text("This"), wsp(" "), text("is"), wsp("\00000A\000009"),
           text("just"), wsp(" "), text("a"), wsp(" "), text("short"),
           wsp(" "), text("paragraph.")

that's already parsed for whitespace? or just like this:

* element: P
  content: "This is\00000A\000009just a short paragraph."
with the content represented as a pure Unicode string (with "early" processing such as removing line breaks immediately after opening and before closing tags and resolving entity references already applied)?

Note that it must be one or the other: either we pass along something more complicated than a Unicode string to represent the content of each element, or we retain whitespace characters in content verbatim (losing the "parsing" of this whitespace that occurred before creating the document tree).

For the formatting structure, CSS could supply "tokenized" content to the renderer, or it could supply Unicode strings. The renderer would have to perform word-wrapping when appropriate; but the collapse of whitespace to a single blank according to the white-space property could (in some models) be done before the formatting structure is delivered to the renderer.

So, we have four possible cases:

* document tree and formatting structure are both Unicode
* document tree and formatting structure are both tokenized
* document tree is Unicode, formatting structure is tokenized
* document tree is tokenized, formatting structure is Unicode

* Document tree and formatting structure are both Unicode

In this case, it is difficult to see how generated content could be treated any differently from document content, unless the CSS processor arbitrarily tags all :before and :after pseudo-elements with "white-space: pre" --- which raises the question, "Why not honor the white-space property?"

* Document tree and formatting structure are both tokenized

In this case, it makes sense that no generated content would contain "whitespace" of any kind, since the CSS processor would just be passing document language whitespace unchanged, and would need to do no whitespace processing itself.
There are some oddities in this model: it would imply that word wrapping cannot occur within generated content (since there can be no "whitespace" in it); and *probably* --- depending on how the rendering engine works --- spaces in generated content would not expand when "text-align: justify" is in effect, since the rendering engine would not see them as whitespace. I also have to wonder whether any practical implementation would actually follow this model.

* Document tree is Unicode, formatting structure is tokenized

In this case, CSS (not the document language) would define "whitespace" in document content: is this in fact how it works? Since the CSS processor would be managing whitespace in this model, it is unclear why generated content should not be subject to the white-space property.

* Document tree is tokenized, formatting structure is Unicode

In this case, the CSS processor would presumably be condensing whitespace (assuming the rendering engine doesn't re-parse whitespace; otherwise there would be little sense in using this instead of the Unicode/Unicode model). The rendering engine would recognize spaces as whitespace, and otherwise need only to know whether or not word wrapping is in effect.

Using this model, we would expect that in generated content, "\A" would always be a newline, multiple blanks would not be condensed, and blanks would be recognized as whitespace for purposes of justification and line wrapping by the rendering engine. The document language would define "whitespace" within the document itself, but only blanks would have the effect of whitespace in generated content (and would not be condensed).

I don't get the sense that this is how current browsers actually work; but it sounds like what Bert and the CSS 2 specification may have intended.

The "bottom line" here is that I think a bit more clarification is needed.

--
Randall Joseph Fellmy aka Randy@Coises.com
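
[Illustration: the two candidate document-tree representations discussed above, and the "tokenized tree, Unicode formatting structure" case, can be sketched in a few lines of Python. This is only a rough model under stated assumptions, not how any user agent or the CSS 2 specification actually does it: the function names are hypothetical, and the set of characters treated as whitespace (space, tab, LF, CR, FF) is an assumption made for the example.]

```python
import re
from typing import List, Tuple

# Assumption for this sketch: these characters count as whitespace.
WSP = " \t\n\r\f"
Token = Tuple[str, str]  # ("text" or "wsp", the characters in the run)


def tokenize(content: str) -> List[Token]:
    """Tokenized model: split content into text runs and whitespace runs."""
    pattern = "[" + WSP + "]+|[^" + WSP + "]+"
    tokens = []
    for match in re.finditer(pattern, content):
        run = match.group(0)
        kind = "wsp" if run[0] in WSP else "text"
        tokens.append((kind, run))
    return tokens


def collapse_tokens(tokens: List[Token]) -> str:
    """Collapse each whitespace token to one blank (white-space: normal)."""
    return "".join(" " if kind == "wsp" else run for kind, run in tokens)


def collapse_string(content: str) -> str:
    """Unicode-string model: whitespace must be re-parsed before collapsing."""
    return re.sub("[" + WSP + "]+", " ", content)


def format_tokenized_to_unicode(tokens: List[Token], before: str = "") -> str:
    """Fourth case: document whitespace is condensed by the CSS processor,
    while generated content (here a hypothetical :before string) is passed
    to the renderer verbatim, so its newline and blanks survive."""
    return before + collapse_tokens(tokens)


content = "This is\n\tjust a short paragraph."
tokens = tokenize(content)
print(tokens)
print(repr(collapse_string(content)))
print(repr(format_tokenized_to_unicode(tokens, before="Note:\n  ")))
```

[In this sketch both representations give the same result for document content; the difference only shows up in who has to recognize whitespace, and in whether generated content passes through that step at all.]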
Received on Tuesday, 27 August 2002 05:21:53 UTC