Re: CSS3 Text - Edit suggestions from Asmus Freytag on 2007-02-19 (www-style@w3.org from February 2007)

From: Asmus Freytag <asmusf@ix.netcom.com>
Date: Mon, 19 Feb 2007 01:27:50 -0800
To: fantasai <fantasai.lists@inkedblade.net>
CC: www-style@w3.org, 'WWW International' <www-international@w3.org>
Message-ID: <45D96D96.9000409@ix.netcom.com>
> Every time UAX 14 comes up, some member of the WG notes that taking UAX
> 14 literally doesn't work well. Therefore I've been careful to reference
> it, but leave that reference non-normative so that implementors can apply
> their own judgement to the information it contains. 
In Unicode 5.0 we've clearly separated those statements in UAX#14 that 
speak about the characters that one could consider "line break 
controls", i.e. that were encoded to provide specific interaction with 
line breaking, from those statements that speak about all other 
characters, the line break behavior of which results from convention.

To enable reliable interchange, the behavior of the control-like 
characters should be as uniform as possible; therefore we've made their 
identity and behavior normative in Unicode. The behavior of all other 
characters is subject to stylistic, orthographic and typographic 
conventions, which in many cases require explicit tailoring. The case 
made about Korean having two accepted modes of line breaking is 
explicitly recognized in the UAX#14 document.

I believe it helps to recognize clearly that a NO BREAK character, for 
example NBHY, is encoded only because it allows users to prevent line 
breaks, and that allowing it to be tailored defeats its purpose. That 
recognition will ultimately help clarify what UAX#14 attempts to do, 
which is to give a precise description of how these special characters 
are to be treated so that they work as expected, in the context of 
providing a baseline implementation for all characters.

The design point for that baseline is that it should be suitable for 
mixed-language, mixed-script text and work reasonably well for simple 
systems (small devices) or simple text solutions on bigger systems. As a 
result, it adopts the treatment of punctuation based on East Asian line 
breaking concepts while keeping runs of 'words' and 'numbers' in other 
scripts together, unless separated by spaces, hyphens and the like. (The 
support of South East Asian scripts requires additional specifications 
not provided in UAX#14 - a known limitation.)

CSS is of course free to support many different modes of line breaking 
for the regular characters, or even approach this subject differently - 
because, again, the conventions for the large majority of characters are 
neither universal nor unique. (We are of course interested in improving 
our baseline implementation - if a better default generic baseline 
behavior exists, I'd like to find out about it - with rules and 
examples, if possible.)

However, for the line break controls, CSS should *not* deviate from 
UAX#14, because doing so effectively redefines characters that were 
encoded for their line break behavior. This does not mean that we think 
UAX#14 is infallible: we just found out that our specification of the 
way that NBHY, NBSP etc. interact with hyphens and soft-hyphens was 
inadvertently made too restrictive. The 5.0 formulation runs counter to 
widespread practice and to the needs of Polish and Portuguese. That is being 
fixed in 5.0.1. Therefore, instead of silently deviating, the CSS 
editors should make sure that the normative part of the UAX#14 
specification is corrected (if necessary) and then follow it - and 
discourage any deviation from that normative part by implementations.
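
As a rough illustration of the distinction drawn above (a sketch only, 
not taken from UAX#14 or any CSS draft), an implementation could accept 
tailoring for ordinary characters while refusing it for the line break 
controls. The set of controls below is an illustrative subset, and the 
names LINE_BREAK_CONTROLS and apply_tailoring, as well as the behavior 
strings, are hypothetical; the authoritative list is the one UAX#14 
marks as normative.

```python
# Illustrative subset of characters encoded specifically for their
# line break behavior. This is NOT the complete normative list.
LINE_BREAK_CONTROLS = {
    "\u00A0": "NBSP",  # NO-BREAK SPACE
    "\u2011": "NBHY",  # NON-BREAKING HYPHEN
    "\u2060": "WJ",    # WORD JOINER
    "\u200B": "ZWSP",  # ZERO WIDTH SPACE
}


def apply_tailoring(overrides):
    """Split requested overrides into accepted and rejected ones.

    `overrides` maps a single character to a hypothetical behavior
    label. Overrides that target a line break control are rejected,
    because tailoring such a character would defeat its purpose.
    """
    accepted, rejected = {}, {}
    for char, behavior in overrides.items():
        if char in LINE_BREAK_CONTROLS:
            rejected[char] = behavior
        else:
            accepted[char] = behavior
    return accepted, rejected


# A style that wants Korean syllables to break anywhere is a
# tailoring of conventional behavior and is accepted; an attempt to
# let NBHY break is refused.
accepted, rejected = apply_tailoring({
    "\uAC00": "break-anywhere",  # HANGUL SYLLABLE GA
    "\u2011": "allow-break",     # NON-BREAKING HYPHEN
})
```

Here the override for U+AC00 ends up in `accepted` and the one for 
U+2011 in `rejected`.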

For the non-normative part, as I already pointed out, we are interested 
in learning about specific improvements, with the goal of making 
something like UAX#14 an attractive baseline implementation in 
situations where tailoring is either not possible or not feasible.

A./
Received on Monday, 19 February 2007 09:28:10 UTC