Re: CSS3 Text and UAX14

Asmus Freytag wrote:
> 
> On 2/20/2007 3:55 AM, Paul Nelson (ATC) wrote:
>> 
>> The only place where I see problems with the SP definition are in the 
>> PRE situation where we are keeping the widths of all spaces 
>> explicitly. In this case are we really tailoring the line breaking 
>> class of the character?   
>
> I think PRE is not an issue. The only time you get an issue is when you 
> use "CRT-style" line breaking at a fixed column so that SP can get 
> wrapped to the head of a new line. CRT-style line breaking is clearly 
> not UAX-14 compliant, and should therefore be labeled as such (an 
> non-UAX14-compliant mode with special behavior).

Well, SP can get wrapped to the head of a new line in the following
case as well:

   <p>Some <code> </code> <code> </code> text with spaces.</p>

where
   code { white-space: pre; }

In this case the spaces in <code> can get wrapped to the front of
the line.

Also, I think UAX14 should allow wrapping within space sequences
generally: it can be useful in some editing contexts to wrap spaces
that don't fit so that the author is aware of them and can delete
excess spaces if necessary. If they disappear off the end of the
line, then it's hard to notice that they're there.
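
To make the difference concrete, here is a rough sketch of the two
behaviors in a toy fixed-width, one-cell-per-character model. This is
not the UAX14 algorithm and not any real layout engine; the function
name `layout` and the `wrap_spaces` flag are made up for illustration.
With `wrap_spaces=False`, spaces hang off the end of the line and are
not measured for fit; with `wrap_spaces=True`, spaces that don't fit
are carried to the head of the next line.

```python
import re


def layout(text, width, wrap_spaces):
    """Greedy line layout over runs of spaces and runs of non-spaces.

    Hypothetical helper for illustration only: one cell per character,
    break opportunities only at space runs.
    """
    lines = []
    current = ""
    for run in re.findall(r" +|[^ ]+", text):
        if run[0] == " ":
            if wrap_spaces:
                # Editing-style behavior: each space is measured, and
                # spaces that do not fit start the next line.
                for ch in run:
                    if len(current) >= width:
                        lines.append(current)
                        current = ""
                    current += ch
            else:
                # Hanging behavior: trailing spaces are not measured
                # for fit and may run past the line width.
                current += run
        else:
            # A word moves to a new line if it does not fit after the
            # content already placed on the current line.
            if current and len(current) + len(run) > width:
                lines.append(current)
                current = ""
            current += run
    if current:
        lines.append(current)
    return lines


# Three letters, five spaces, two letters, laid out in 5 columns.
print(layout("abc     de", 5, wrap_spaces=False))
print(layout("abc     de", 5, wrap_spaces=True))
```

In the first call the five spaces all stay on the first line, past
column 5, where an author would not see them; in the second call the
excess spaces show up at the start of the second line.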

> The descriptions in UAX#14 of how the *width* of SPACE characters is 
> handled are informative material on line-layout, not specifications of 
> line-breaking.

Yes, but, as I said before, there is no clear distinction between
normative requirements and informative statements in UAX14. This
makes me very hesitant to include anything more than normative
references to specific parts of UAX 14 that I can be sure won't
cause problems.

Techniques that would help:
* Don't use RFC2119 terms for non-normative statements, and when you
   use them, keep usage consistent with RFC2119.
     Examples that use RFC2119 terms, but not to set normative
     requirements or allowances:
       "URLs are now so common in regular plain text that they must
       be taken into account when assigning general-purpose line
       breaking properties."
       "This is the preferred character to use where words must be
       hyphenated but may not be broken at the hyphen."
       "This may require additional tailorings beyond those considered
       in this section."
* Use either RFC2119 terms or descriptive assertions for normative
   statements.
     Examples of normative statements using RFC2119 terms:
       "The closing character of any set of paired punctuation must
       be kept with the preceding character, and the same applies
       to all forms of wide comma and full stop."
       "For modern text processing these should be treated as line
       break opportunities by default."
     Example of normative descriptive assertion in UAX14:
       "In bidirectional text, line breaks are determined before
       applying rule L1 of the Unicode Bidirectional Algorithm [Bidi].
       However, line breaking is strictly independent of directional
       properties of the characters or of any auxiliary information
       determined by the application of rules of that algorithm."
* Don't use sentences that sound like normative descriptive assertions
   for informative statements.
     Examples of self-evidently informative statements:
       "In table headings that use Han ideographs, even extreme amounts
       of intercharacter space commonly occur as short texts are spread
       out across the entire available space to distribute the characters
       evenly from end to end."
       "Many currency signs can appear on both sides, or even the middle,
       of a numeric expression."
     Example of an informative statement(?) that sounds normative:
       "spaces at the end of a line are not measured for fit"


Another problem I've noticed: SP is specified as not tailorable, but it
is missing from the list of non-tailorable character classes at the top
of 6.1. What is the intent of the spec? Can membership in SP be
tailored or not?

~fantasai

Received on Wednesday, 21 February 2007 01:12:14 UTC