Re: WD-css3-text-20021024 substantive comments

From: Etan Wexler <ewexler@stickdog.com>
Date: Wed, 25 Dec 2002 04:33:32 -0500
Message-ID: <1129858518.20021225043332@stickdog.com>
To: www-style@w3.org, Ian Hickson <ian@hixie.ch>

Ian Hickson wrote to <www-style@w3.org> on 23 December 2002 in
"WD-css3-text-20021024 substantive comments"
(<mid:Pine.LNX.4.21.0212162038130.17087-100000@dhalsim.dreamhost.com>):  

> Both XML and SGML normalise newlines to single U+000A characters.

Your statement is incorrect regarding SGML, defined by ISO 8879.  (My
"SGML Handbook" is across the country at the moment, so I will not be
able to cite clauses from ISO 8879.  The following comes from memory.)
In general, an SGML system will, after parsing, use a carriage return
character (U+000D) to represent a line break from the input stream
that constitutes an SGML document.

I elaborate for those interested.  In SGML, each line of input text is
called a record, to distinguish it from lines of output (for example,
a formatted line on a printed page).  Records come with delimiting
characters: a record start character noted as RS, and a record end
character noted as RE.  In SGML, RS and RE are delimiter roles that,
in the concrete syntax used by a document, need to be assigned
codepoints.  The Reference Concrete Syntax, a recommended part of ISO
8879 that is in wide use in SGML documents and systems, reflects a
common convention and assigns U+000A to RS and U+000D to RE.  SGML
parsing rules typically discard RS characters and retain RE characters
(there are a few minor exceptions, most of which I cannot recall).
This behavior, working with the Reference Concrete Syntax, leaves us
with carriage return (U+000D) for line breaks.
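The record handling just described can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the function names are hypothetical, and the sketch ignores the exceptional cases in which an SGML parser also discards RE characters. The delimiter codepoints are parameters, so the same code models any concrete syntax.

```python
# Simplified, hypothetical model of SGML record boundaries.
# Assumption: every RE is retained (real SGML ignores RE in a few cases).

REFERENCE_RS = "\u000A"  # record start (line feed) in the Reference Concrete Syntax
REFERENCE_RE = "\u000D"  # record end (carriage return) in the Reference Concrete Syntax


def to_records(lines, rs_char=REFERENCE_RS, re_char=REFERENCE_RE):
    """Wrap each input line (an SGML record) in its RS and RE delimiters."""
    return "".join(rs_char + line + re_char for line in lines)


def parse_records(stream, rs_char=REFERENCE_RS):
    """Discard every RS character; RE characters pass through untouched."""
    return stream.replace(rs_char, "")


if __name__ == "__main__":
    parsed = parse_records(to_records(["alpha", "beta"]))
    print(repr(parsed))  # 'alpha\rbeta\r': carriage returns mark the line breaks
```

With an unconventional concrete syntax, say RS = U+231B and RE = U+0C6C, calling `to_records(lines, "\u231B", "\u0C6C")` and then `parse_records(stream, "\u231B")` leaves Telugu digit six wherever a line break was.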

Then again, somebody could write and use a concrete syntax that
assigns U+000D to RS and U+000A to RE.  For that matter, somebody
could assign U+231B (hourglass) to RS and U+0C6C (Telugu digit six) to
RE.  That is the beauty (and terror) of SGML: it is a framework most
versatile, and nothing in ISO 8879 prohibits users from breaking
convention or dashing expectations.

> So in CSS, [U+000A (line feed) is] the newline character.

The point, though, is that CSS could be used with any ordered
hierarchy of content objects, if we did not overly confine CSS
processors.  XML documents are prime examples of ordered hierarchies
of content objects, but not the only ones.  Whether one chooses XML,
SGML, or some other framework, CSS should be able to do a good job.
One requirement of a good job is using the line breaks native to the
framework.
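To make the requirement concrete, here is a hedged sketch of a CSS processor that breaks preformatted text (as for `white-space: pre`) at the framework's native line break instead of a hard-coded U+000A. The framework table and function name are hypothetical illustrations, not part of any CSS specification; the SGML entry assumes the Reference Concrete Syntax, where RE (U+000D) typically survives parsing.

```python
from typing import Dict, List

# Hypothetical table of the line-break character each framework leaves in
# parsed content. XML normalises line ends to U+000A; SGML with the
# Reference Concrete Syntax typically retains RE, which is U+000D.
NATIVE_LINE_BREAK: Dict[str, str] = {
    "xml": "\u000A",
    "sgml-reference-concrete-syntax": "\u000D",
}


def preformatted_lines(text: str, framework: str) -> List[str]:
    """Split text into rendered lines using the framework's own line break."""
    line_break = NATIVE_LINE_BREAK[framework]
    return text.split(line_break)


if __name__ == "__main__":
    print(preformatted_lines("alpha\rbeta", "sgml-reference-concrete-syntax"))
    print(preformatted_lines("alpha\nbeta", "xml"))
```

Both calls yield two lines. A processor that always split on U+000A would instead render the SGML text `"alpha\rbeta"` as a single line.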

-- 
Etan Wexler <mailto:ewexler@stickdog.com>
Every time you touch me I feel like I'm being bored.
Received on Wednesday, 25 December 2002 05:35:11 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Monday, 27 April 2009 13:54:18 GMT