Re: CSS21 Syndata.html Section 4 from Tex Texin on 2003-10-15 (www-style@w3.org from October 2003)

From: Tex Texin <tex@i18nguy.com>
Date: Wed, 15 Oct 2003 19:47:15 -0400
To: "L. David Baron" <dbaron@dbaron.org>
Cc: www-style@w3.org, W3c I18n Group <w3c-i18n-ig@w3.org>
Message-ID: <3F8DDC83.8FE77A0B@i18nguy.com>
Thanks for the quick reply.

"L. David Baron" wrote:

I have removed what we already agreed on.

> > 2) Section 4.4 on CSS document representation
> > Mention should be made of the Unicode BOM and its relationship to the encoding
> > of the file.
> 
> Porting the additional text in [2] back to CSS 2.1 could be a start,
> although it might be worth adding some additional text beyond that.
> (For example, a BOM could indicate that a stylesheet is UTF-16, as
> having priority immediately lower than @charset.)

The text would be a good start, although, as the questions in the CSS3 document
indicate, it needs more.

I am not sure about the BOM precedence. The W3C allows documents to be
transcoded, which invalidates the @charset rule and presumes that HTTP will
provide the new encoding information. I don't know whether these transcoders
add BOMs.
However, if the transcoded file is then saved, it could be useful to have the
BOM provided by a transcoder override the @charset rule, since files on disk
don't have the benefit of the HTTP charset.
I am not really sure whether this is a realistic scenario. I am sure others
will comment.
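
To make the precedence question concrete, here is a minimal Python sketch. It
is not proposed spec text. The function names (sniff_bom, sniff_charset_rule,
pick_encoding), the "order" parameter and the UTF-8 fallback are all
hypothetical. Only the three sources of encoding information (HTTP charset,
@charset, BOM) come from the discussion above.

```python
import codecs
import re

# Byte order marks recognised by this sketch (UTF-32 omitted for brevity).
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_bom(data):
    """Return the encoding implied by a leading BOM, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def sniff_charset_rule(data):
    """Return the name in a leading @charset rule, or None.

    Assumption: only ASCII-compatible encodings are handled, so a
    UTF-16 stylesheet will simply not match here.
    """
    m = re.match(rb'(?:\xef\xbb\xbf)?@charset "([A-Za-z0-9_.:-]+)";', data)
    return m.group(1).decode("ascii").lower() if m else None


def pick_encoding(data, http_charset=None,
                  order=("http", "charset_rule", "bom")):
    """Pick an encoding by trying each source in the given order.

    The default order puts the BOM immediately below @charset.
    Moving "bom" ahead of "charset_rule" models the saved-after-transcoding
    case, where the BOM would override a stale @charset rule.
    """
    sources = {
        "http": http_charset,
        "charset_rule": sniff_charset_rule(data),
        "bom": sniff_bom(data),
    }
    for key in order:
        if sources[key]:
            return sources[key]
    return "utf-8"  # hypothetical fallback, not taken from any draft
```

With a file that was transcoded to UTF-8 but still carries an old
@charset "ISO-8859-1" rule, the default order answers iso-8859-1, while
order=("http", "bom", "charset_rule") answers utf-8. That difference is the
whole question.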

 
> > 5) The section 4.3.7 on strings introduces \A for newline and points to an
> > example, so I assume there isn't a section describing other backslash codes
> > (e.g. \t etc.). However, the section doesn't define what the user should do if
> > they actually want a linefeed.
> >
> > Is \0A (not \A) supposed to generate a linefeed or a newline? In other words,
> > is that string "\A" a special string, or is the character code U+000A mapped to
> > linefeed in css? The parenthetical remark seems to indicate that CSS redefines
> > the Unicode character.
> >
> > (Which seems like a very odd and dangerous thing to do.)
> 
> XML normalizes CR, LF, and CRLF to LF [1], so it seems reasonable to
> treat LF as the new line character within the CSS processing model.  It
> only matters when 'white-space' is 'pre', though.
> 
> Do you have a better alternative in mind?

In the context of parsing text that conforms to a particular grammar, it is a
perfectly reasonable thing to do.
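
For reference, the XML line-end rule in [1] amounts to something like this
Python sketch. The function name is mine, purely for illustration.

```python
def normalize_line_ends(text):
    """Map CRLF and lone CR to LF, as XML does on input."""
    # Replace CRLF first so the pair becomes one LF rather than two.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

This is a rule for reading input. It says nothing about which character
should be emitted on output, which is the case I am worried about below.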

However, the example referenced under "content" has to do with generating
text. The output format generally does not have those grammatical rules, and
the output may in fact be destined for a medium in which there is a significant
difference between LF, NL, etc. (for example, a specific printer).

I was just looking for a way to specify U+000A, if \A is given this behavior.
We just need an alternative to express U+000A, or a way to escape the \A.

However, the questions about whether \a, \0A and \0a are equivalent to \A
should also be addressed.

Unless the WG feels there are a lot of existing stylesheets using \0A to mean
newline, I could see treating \A and \a as special values meaning newline and
\0A...\00000A, and the lower case versions, meaning U+000A.
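
A rough Python sketch of the two readings follows. decode_current assumes the
general hex-escape rule (a backslash plus one to six hex digits, with one
optional trailing whitespace character) is applied uniformly, so all the
spellings collapse to U+000A. decode_proposed is only my suggestion above.
The function names and the "<NL>" marker are hypothetical, and backslash
followed by a real line break is not given any special treatment here.

```python
import re

# Backslash + 1-6 hex digits (+ one optional whitespace), or backslash + any char.
ESCAPE = re.compile(r"\\([0-9A-Fa-f]{1,6})(?:\r\n|[ \t\r\n\f])?|\\(.)", re.DOTALL)

NEWLINE = "<NL>"  # hypothetical stand-in for an abstract "new line"


def _from_hex(digits):
    """Turn hex digits into a character, guarding against out-of-range values."""
    code = int(digits, 16)
    return chr(code) if code <= 0x10FFFF else "\ufffd"


def decode_current(s):
    """All of \\A, \\a, \\0A ... \\00000A decode to the same U+000A."""
    def repl(m):
        return _from_hex(m.group(1)) if m.group(1) else m.group(2)
    return ESCAPE.sub(repl, s)


def decode_proposed(s):
    """Only the bare \\A and \\a mean 'new line'; padded forms mean U+000A."""
    def repl(m):
        digits = m.group(1)
        if digits in ("A", "a"):
            return NEWLINE
        return _from_hex(digits) if digits else m.group(2)
    return ESCAPE.sub(repl, s)
```

Under the first reading there is no way to tell the forms apart after
decoding. Under the second, r"\A" yields the marker and r"\0A" yields a
literal U+000A, at the cost of breaking any stylesheet that already uses the
padded form to mean a new line.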

tex

>
 
> -David
> 
> [1] http://www.w3.org/TR/2000/REC-xml-20001006#sec-line-ends
> [2] http://www.w3.org/TR/2003/WD-css3-syntax-20030813/#css-style
> 
> --
> L. David Baron                                <URL: http://dbaron.org/ >

-- 
-------------------------------------------------------------
Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
Xen Master                          http://www.i18nGuy.com
                         
XenCraft                            http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------
Received on Wednesday, 15 October 2003 19:48:21 UTC