Re: Comments on section 7.2 of css3-text from Christopher Hoess on 2002-05-29 (www-style@w3.org from May 2002)

From: Christopher Hoess <choess@stwing.upenn.edu>
Date: Wed, 29 May 2002 14:16:30 -0400 (EDT)
To: Ian Hickson <ian@hixie.ch>
Cc: www-style@w3.org
Message-ID: <20020529141552.A19475@force.stwing.upenn.edu>

As requested, my revisions in full, with some minor changes.

Original

White-space processing in the context of CSS is the mechanism by which all 
white-space characters are interpreted for rendering purpose. The 
white-space set is determined by the XML [XML1.0] specification as being a 
combination of one or more space characters (Unicode value U+0020), 
carriage returns (U+000D), line feeds (U+000A), or tabs (U+0009).

Note: [HTML401] also defines the form feed character (U+000C) as a white 
space character, but that character is not part of any XHTML versions as 
they are all based on XML.

The amount of white space processing that can be achieved by a user agent 
that supports CSS is directly related to the CSS processing model, 
especially the document parsing and validation. After parsing and possible 
validation, the document tree may already have been processed in a way 
that white space characters have been collapsed and partially removed 
(white space normalization).

In that respect, the CSS properties related to white space processing can 
only be effective if the CSS processor has access to the whitespace 
characters that were originally encoded in the document. However, 
end-of-line characters are typically handled (like by XML processors) in 
such a way that any arbitrary combination of end-of-line characters is 
replaced by a single line feed character (U+000A).

[I did not revise the next two paragraphs except to replace the phrase 
"white-space processing" with "white-space rendering" when it appeared, so 
I have not reproduced them.]

Revised

White-space processing in the context of CSS is the mechanism by which all 
white-space characters are interpreted for rendering purposes. White-space 
characters are those which may be used to separate markup without any 
structural effect on the document. Different languages may define 
different characters as white-space.

Note: The white-space set of the XML [XML1.0] specification is a 
combination of one or more space characters (Unicode value U+0020), 
carriage returns (U+000D), line feeds (U+000A), or tabs (U+0009). The 
corresponding set of characters in SGML consists of the SPACE, RE, RS, 
and SEPCHAR function characters in the SGML declaration.

Definitions of white-space usually encompass end-of-line characters, which 
are generally represented as carriage returns (U+000D), line feeds 
(U+000A), or carriage-return--line-feed pairs. While this document treats 
end-of-line characters as if they were a single line feed (as is the case 
in parsed XML documents), the operation of these properties should be 
independent of the representation of end-of-line characters.

Note: Languages such as XML and SGML mandate the removal of certain 
white-space characters through processes such as record handling and 
attribute value normalization. This white space is outside the scope of 
these properties, as it is eliminated during the parsing that creates the 
document tree.

[As said above, also replace the two occurrences of "white-space 
processing" with "white-space rendering".]
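
To illustrate the end-of-line treatment and the XML white-space set 
described in the revised text, here is a minimal Python sketch. It is an 
illustration only, not proposed spec text: the names normalize_eol, 
is_xml_white_space, and XML_WHITE_SPACE are hypothetical, and the sketch 
assumes that each carriage return, line feed, or carriage-return--line-feed 
pair maps to exactly one line feed.

```python
import re

# White-space set of XML 1.0: space, tab, carriage return, line feed.
XML_WHITE_SPACE = frozenset("\u0020\u0009\u000D\u000A")

# CRLF is listed first so that a pair becomes one line feed, not two.
_EOL = re.compile(r"\r\n|\r|\n")


def normalize_eol(text: str) -> str:
    """Replace each CR, LF, or CRLF with a single line feed (U+000A)."""
    return _EOL.sub("\n", text)


def is_xml_white_space(text: str) -> bool:
    """True if text is non-empty and holds only XML white-space characters."""
    return bool(text) and all(ch in XML_WHITE_SPACE for ch in text)
```

With this, the form feed (U+000C) that [HTML401] counts as white space is 
rejected by is_xml_white_space, and the three end-of-line representations 
all look the same to whatever consumes the normalized text.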

-- 
Chris Hoess
Mozilla QA Flunky

Received on Wednesday, 29 May 2002 15:15:19 UTC