
Re: CSS3 Text and UAX14

From: Asmus Freytag <asmusf@ix.netcom.com>
Date: Tue, 20 Feb 2007 14:01:29 -0800
Message-ID: <45DB6FB9.8080601@ix.netcom.com>
To: fantasai <fantasai.lists@inkedblade.net>
CC: www-style@w3.org, 'WWW International' <www-international@w3.org>

There have been quite a few messages on this thread. I'll respond in order.

A./

Comments below:

On 2/20/2007 2:22 AM, fantasai wrote:
> Asmus Freytag wrote:
>> fantasai wrote:
>>> Every time UAX 14 comes up, some member of the WG notes that taking
>>> UAX 14 literally doesn't work well. Therefore I've been careful to
>>> reference it, but leave that reference non-normative so that
>>> implementors can apply their own judgement to the information it
>>> contains.
>>
>> In Unicode 5.0 we've clearly separated those statements in UAX#14 
>> that speak about the characters that one could consider "line break 
>> controls", i.e. that were encoded to provide specific interaction 
>> with line breaking, from those statements that speak about all other 
>> characters, the line break behavior of which results from convention.
>>
>> To enable reliable interchange, the behavior of the control-like 
>> characters should be as uniform as possible, therefore we've made 
>> their identity and behavior normative in Unicode. The behavior of all 
>> other characters is subject to stylistic, orthographic and
>> typographic conventions, which in many cases require explicit tailoring.
> [...]
>> However, for the line break controls, CSS should *not* deviate from
>> UAX#14, because doing so effectively redefines characters that were
>> encoded for their line break behavior. This does not mean that we
>> think UAX#14 is infallible: we just found out that our specification
>> of the way that NBHY, NBSP, etc. interact with hyphens and
>> soft hyphens was inadvertently made too restrictive. The 5.0
>> formulation runs counter to widespread practice and to the needs of
>> Polish and Portuguese. That is being fixed in 5.0.1. Therefore, instead
>> of silently deviating, the CSS editors should make sure that the
>> normative part of the UAX#14 specification is corrected (if
>> necessary) and then follow it, and discourage implementations from
>> deviating from that normative part.
>
> Your argument has convinced me that CSS3 Text should be normatively
> requiring the correct implementation of UAX14's normative line breaking
> classes.
Good.
> However, there are several problems with a full normative
> reference to UAX 14. Here are those I've found so far:
>
>   1. Spaces are a non-tailorable line breaking class. The description
>      of its behavior also includes prescriptions on presentation that
>      are not compatible with what CSS prescribes.
First, to make sure we are talking about the same thing here:
In UAX#14, the class SP encompasses *only* the ASCII SPACE. We've not found
evidence that any other space-like characters (such as the EM SPACE) need to
get the same special treatment in *line breaking* as U+0020 SPACE.

Three nontailorable rules mention SP. Rule 7 forbids breaking before a
SPACE, rule 9 states that combining marks following a SPACE are not
treated like a SPACE themselves, and rule 12 has one term mentioning
SPACE. That term is being split off into a new rule 12b and will then be
tailorable.
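To make the effect of those rules concrete, here is a deliberately reduced
Python sketch. It is not the UAX#14 algorithm: the class table is a
hypothetical four-character subset (everything else is treated as AL, where
a real implementation would load LineBreak.txt), only rules 7, 8, 9/10, 12
and 18 are modeled, rule 12 is assumed to have the 5.0 shape
"GL x" plus "[^SP] x GL", and the function name is made up for illustration.

```python
from typing import Dict, List

# Hypothetical, hand-picked subset of Line_Break values; a real
# implementation would load the full LineBreak.txt data file.
LB_CLASS: Dict[str, str] = {
    "\u0020": "SP",  # SPACE: the only member of class SP
    "\u200b": "ZW",  # ZERO WIDTH SPACE
    "\u00a0": "GL",  # NO-BREAK SPACE
    "\u0301": "CM",  # COMBINING ACUTE ACCENT
}

# Classes that a following combining mark may NOT attach to (rule 9).
NO_ATTACH = {"BK", "CR", "LF", "NL", "SP", "ZW"}


def lb_class(ch: str) -> str:
    # Everything outside the tiny table is treated as AL in this sketch.
    return LB_CLASS.get(ch, "AL")


def break_opportunities(text: str) -> List[int]:
    """Return indices i (0 < i < len(text)) where a break is allowed before text[i]."""
    eff: List[str] = []        # effective class after rules 9/10
    attached: List[bool] = []  # True if text[i] is a CM absorbed into its base
    for i, ch in enumerate(text):
        cls = lb_class(ch)
        if cls == "CM":
            if i > 0 and eff[i - 1] not in NO_ATTACH:
                eff.append(eff[i - 1])  # rule 9: X CM* -> X
                attached.append(True)
                continue
            cls = "AL"                  # rule 10: a mark after SPACE is treated as AL
        eff.append(cls)
        attached.append(False)

    breaks: List[int] = []
    for i in range(1, len(text)):
        before, after = eff[i - 1], eff[i]
        if after in ("SP", "ZW"):             # rule 7: no break before SP or ZW
            continue
        if before == "ZW":                    # rule 8: break after ZW
            breaks.append(i)
            continue
        if attached[i]:                       # rule 9: keep base and marks together
            continue
        if before == "GL":                    # rule 12: GL x
            continue
        if after == "GL" and before != "SP":  # rule 12 (assumed 5.0 term): [^SP] x GL
            continue
        if before == "SP":                    # rule 18: break after spaces
            breaks.append(i)
            continue
        if before == "AL" and after == "AL":  # alphabetics stay together
            continue
        breaks.append(i)                      # default: break allowed
    return breaks
```

For example, `break_opportunities(" \u0301b")` allows a break at index 1:
the combining mark does not attach to the SPACE, so it is not protected by
rule 7 the way a second SPACE would be.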

Any other description of line *layout* behavior in UAX#14 is informative
and provided mainly for background. This includes mention of how spaces
are handled in line fitting (other than in determining line break
opportunities).

With that admittedly lengthy preamble out of the way, I would like to find
out where you see specific differences that need to be addressed. An
example or two should clarify the situation.

One more thing: in markup languages, U+0020 is used as a syntax character,
but one that can also represent a space in the data. It would be useful to
clarify that the white space handling rules of markup languages are
permissible higher-level protocol specifications, because UAX#14 applies
to the logical text stream (my term here; if there's a correct term for
that in XML/CSS, please let me know).
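The layering described above can be sketched in Python: the markup/CSS
layer first turns source text into the logical text stream, and only that
result is handed to a UAX#14 line breaker. The helper name is hypothetical,
and the collapsing rule is a rough approximation of CSS
`white-space: normal` (every run of ASCII white space becomes one U+0020);
real CSS processing has further steps, such as trimming at line ends.

```python
import re

# Rough approximation of CSS 'white-space: normal' collapsing: every run of
# ASCII white space in the markup source becomes a single U+0020.
# NO-BREAK SPACE (U+00A0) and other Unicode spaces are left untouched.
_WS_RUN = re.compile(r"[ \t\n\r\f]+")


def logical_text(source_text: str) -> str:
    """Hypothetical helper: map element text from markup source to the
    logical text stream that a UAX#14 line breaker would then see."""
    return _WS_RUN.sub(" ", source_text)


print(logical_text("two  words\n\tand more"))  # two words and more
```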

>   2. CSS has a line breaking mode that forbids all breaks. This needs to
>      override the non-tailorable behavior of the ZW (and SP?) classes.
The simplest thing is to view that as a mode that is fundamentally not
compliant with UAX#14. Unicode has no need to build such modes into its
default algorithm, since their description is independent of Unicode
character properties. Rather than complicated work-arounds inside our
specification, the correct answer is for protocols to have a
UAX#14-compliant mode (or modes, if specific tailorings are supported)
and one or more non-standard modes.
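A minimal Python sketch of that arrangement, assuming a protocol-level mode
switch that sits outside the algorithm: only one mode delegates to a
UAX#14 implementation, and the "forbid all breaks" mode needs no character
properties at all. The mode names and `breaks_for_mode` are hypothetical,
and `toy_uax14` is only a stand-in, not a UAX#14 implementation.

```python
from typing import Callable, List

# A line breaker maps text to the indices i where a break is allowed before text[i].
BreakFn = Callable[[str], List[int]]


def breaks_for_mode(text: str, mode: str, uax14: BreakFn) -> List[int]:
    """Dispatch on a (hypothetical) protocol-level line-breaking mode.

    Only "uax14" claims UAX#14 conformance; "nobreak" is a separate,
    non-standard mode defined without reference to character properties.
    """
    if mode == "uax14":
        return uax14(text)  # the conformant algorithm, possibly tailored
    if mode == "nobreak":
        return []           # forbid every break, including after SP or ZW
    raise ValueError(f"unknown line-breaking mode: {mode!r}")


def toy_uax14(text: str) -> List[int]:
    # Toy stand-in for a real UAX#14 implementation: break after a run of spaces.
    return [i for i in range(1, len(text)) if text[i - 1] == " " and text[i] != " "]


print(breaks_for_mode("ab cd", "uax14", toy_uax14))    # [3]
print(breaks_for_mode("ab cd", "nobreak", toy_uax14))  # []
```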
>   3. CSS3 Text introduces an 'unrestricted' line breaking mode. In this
>      mode, line breaking restrictions are ignored completely, (except for
>      the CM class). I don't see any allowance for this kind of behavior
>      (effectively suspending all line-breaking restrictions, including
>      WJ and GL) in UAX 14.
Again, suspending UAX#14 conformance is what you are doing here. The fact
that you still honor CM makes this mode a bit dependent on character
properties, but it's better handled outside the UAX algorithm. Also, you
may want to consider whether you truly mean CM here (as specified in
UAX#14) or whether you only want NSM (non-spacing marks, which don't
increase the line width).
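The CM/NSM difference can be seen in a small Python sketch of such an
"unrestricted" mode, which breaks anywhere except before a mark. The
function name is hypothetical, and CM is only approximated here by the
General_Category values Mn, Mc and Me; the real CM class comes from
LineBreak.txt and also contains many control and format characters.

```python
import unicodedata
from typing import List


def unrestricted_breaks(text: str, nsm_only: bool = False) -> List[int]:
    """Break anywhere except before a mark (a non-UAX#14 mode).

    nsm_only=True keeps only non-spacing marks (General_Category Mn) with
    their base; nsm_only=False approximates class CM by all marks
    (Mn, Mc, Me), so spacing marks such as U+0903 also stay attached.
    """
    keep = {"Mn"} if nsm_only else {"Mn", "Mc", "Me"}
    return [i for i in range(1, len(text))
            if unicodedata.category(text[i]) not in keep]


# KA + VISARGA (a spacing mark, category Mc) + "x"
print(unrestricted_breaks("\u0915\u0903x"))                 # [2]
print(unrestricted_breaks("\u0915\u0903x", nsm_only=True))  # [1, 2]
```

With `nsm_only=True` the visarga may start a new line, because it is a
spacing mark that occupies width of its own.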
>
> In general, UAX 14 is not worded in a manner that makes it clear which
> statements are normative requirements and which are background information.
> For example,
>   "Finally, most text layout systems will support an emergency mode that
>    handles the case of an unusual line that contains no ordinary line
>    break opportunities."
> This is worded as if it's background information. What it does is provide
> some information about (presumably non-UAX14-compliant) layout systems in
> general. What was probably meant was to allow UAX14-compliant layout
> systems to implement this behavior.
What was meant was to set up the idea that it's not the system as a whole
that is or isn't compliant, but its normal line breaking mode, allowing
the same system and protocol to also support other modes that are not
UAX#14 compliant. This is different from Bidi: for Bidi we *require* that
Unicode-compliant implementations use the bidi algorithm. Unicode
compliance doesn't require that any implementation support UAX#14, but
if it does, it must be compliant.
>
>> For the non-normative part, as I already pointed out, we are interested
>> in learning about specific improvements, with the goal of making
>> something like UAX#14 an attractive baseline implementation in
>> situations where tailoring is either not possible or not feasible.
>
> I don't, at this point, have any specific points to add that have not
> already been brought up. However, I'll report back any problems or
> improvements pointed out by members of the CSS WG at our next F2F.
Great. I think the business about whether a system or just one of its modes
needs to be compliant is something that we can perhaps clarify more; the
same for the distinction between the role of SPACE in markup source
and marked up content.

I'll take these as advance feedback on the proposed update for version
5.0.1 of UAX#14 that we are about to release.

A./
>
> ~fantasai
>
>
Received on Tuesday, 20 February 2007 22:01:46 GMT
