- From: Asmus Freytag <asmusf@ix.netcom.com>
- Date: Mon, 19 Feb 2007 01:27:50 -0800
- To: fantasai <fantasai.lists@inkedblade.net>
- CC: www-style@w3.org, 'WWW International' <www-international@w3.org>
> Every time UAX 14 comes up, some member of the WG notes that taking UAX
> 14 literally doesn't work well. Therefore I've been careful to reference
> it, but leave that reference non-normative so that implementors can apply
> their own judgement to the information it contains.

In Unicode 5.0 we've clearly separated the statements in UAX#14 that speak about the characters one could consider "line break controls", i.e. those that were encoded to provide a specific interaction with line breaking, from the statements that speak about all other characters, whose line break behavior results from convention. To enable reliable interchange, the behavior of the control-like characters should be as uniform as possible; therefore we've made their identity and behavior normative in Unicode. The behavior of all other characters is subject to stylistic, orthographic and typographic conventions, which in many cases require explicit tailoring. The case made about Korean having two accepted modes of line breaking is explicitly recognized in the UAX#14 document.

I believe that clearly recognizing that a NO BREAK character, for example NBHY, is encoded only because it allows users to prevent line breaks, and that allowing it to be tailored defeats its purpose, will ultimately help clarify what UAX#14 attempts to do: give a precise description of how these special characters are to be treated so that they work as expected in the context of a baseline implementation provided for all characters. The design point for that baseline is that it should be suitable for mixed-language, mixed-script text and work reasonably well for simple systems (small devices) or for simple text solutions on bigger systems. As a result, it adopts a treatment of punctuation based on East Asian line breaking concepts, while keeping runs of 'words' and 'numbers' in other scripts together unless they are separated by spaces, hyphens and the like.
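To make the split between line break controls and conventional characters concrete, here is a minimal Python sketch. It is not UAX#14: the class table covers only a handful of characters (everything else is assumed to be AL), the rules are reduced to pairwise checks loosely modeled on LB7, LB8, LB11, LB12/LB12a and LB18, and `break_allowed`, `break_opportunities` and the `tailoring` hook are hypothetical names. What it illustrates is the ordering: the control-character rules are applied first, and only the remaining pairs are handed to a tailoring that encodes local convention.

```python
from typing import Callable, Dict, List

# Hypothetical, heavily simplified table: a few characters mapped to their
# UAX #14 line-break classes. Anything not listed is treated as AL.
CLASS_OF: Dict[str, str] = {
    " ": "SP",       # SPACE
    "-": "HY",       # HYPHEN-MINUS
    "\u00a0": "GL",  # NO-BREAK SPACE
    "\u2011": "GL",  # NON-BREAKING HYPHEN
    "\u200b": "ZW",  # ZERO WIDTH SPACE
    "\u2060": "WJ",  # WORD JOINER
}

# Hypothetical tailoring hook: given the classes on either side of a
# position, it decides by convention whether a break is allowed there.
Tailoring = Callable[[str, str], bool]


def break_allowed(before: str, after: str, tailoring: Tailoring) -> bool:
    """Pairwise break decision. Control-character rules come first and
    cannot be overridden; only what is left reaches the tailoring."""
    if after in ("SP", "ZW"):            # no break before space or ZWSP
        return False
    if before == "ZW":                   # break after ZWSP (no SP* context)
        return True
    if before == "WJ" or after == "WJ":  # word joiner glues both sides
        return False
    if before == "GL":                   # no break after NBSP, NBHY, ...
        return False
    if after == "GL":                    # no break before GL unless SP/BA/HY
        return before in ("SP", "BA", "HY")
    if before == "SP":                   # break after a space (simplified)
        return True
    return tailoring(before, after)      # ordinary characters: convention


def break_opportunities(text: str, tailoring: Tailoring) -> List[int]:
    """Indices i where a line may break between text[i-1] and text[i]."""
    classes = [CLASS_OF.get(ch, "AL") for ch in text]
    return [
        i for i in range(1, len(text))
        if break_allowed(classes[i - 1], classes[i], tailoring)
    ]
```

With a tailoring that allows breaks only after a hyphen (`lambda b, a: b == "HY"`), `"well-known"` yields `[5]`, while the same word spelled with U+2011 NON-BREAKING HYPHEN yields `[]`. Even a tailoring that returns `True` for every pair cannot open a break next to U+00A0 or U+2060, which is the property argued for above.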
(Support for South East Asian scripts requires additional specifications not provided in UAX#14; this is a known limitation.)

CSS is of course free to support many different modes of line breaking for the regular characters, or even to approach this subject differently, because, again, the conventions for the large majority of characters are neither universal nor unique. (We are of course interested in improving our baseline implementation: if a better default generic baseline behavior exists, I'd like to find out about it, with rules and examples if possible.)

However, for the line break controls, CSS should *not* deviate from UAX#14, because doing so effectively redefines characters that were encoded for their line break behavior. This does not mean that we think UAX#14 is infallible: we just found out that our specification of the way that NBHY, NBSP, etc. interact with hyphens and soft hyphens was inadvertently made too restrictive. The 5.0 formulation runs counter to widespread practice and to the needs of Polish and Portuguese. That is being fixed in 5.0.1.

Therefore, instead of silently deviating, the CSS editors should make sure that the normative part of the UAX#14 specification is corrected (if necessary) and then follow it, and should discourage implementations from deviating from that normative part. For the non-normative part, as I already pointed out, we are interested in learning about specific improvements, with the goal of making something like UAX#14 an attractive baseline implementation in situations where tailoring is either not possible or not feasible.

A./
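The hyphen interaction mentioned above can be seen by comparing two versions of the GL ("glue") rule. The sketch below is an illustration under stated assumptions, not the 5.0 or 5.0.1 text: the strict variant never allows a break on either side of a GL character, while the relaxed variant follows the shape of rule LB12a in later UAX#14 revisions ([^SP BA HY] × GL), which stops forbidding a break before GL when it follows a space, a hyphen (HY) or a BA character such as SOFT HYPHEN. The function names and the small class table are hypothetical.

```python
from typing import Dict, List

# Hypothetical subset of UAX #14 line-break classes; others default to AL.
CLASS_OF: Dict[str, str] = {
    " ": "SP",       # SPACE
    "-": "HY",       # HYPHEN-MINUS
    "\u00ad": "BA",  # SOFT HYPHEN
    "\u2010": "BA",  # HYPHEN
    "\u00a0": "GL",  # NO-BREAK SPACE
    "\u2011": "GL",  # NON-BREAKING HYPHEN
}


def gl_forbids_break(before: str, after: str, relaxed: bool) -> bool:
    """True if the GL rule by itself prohibits a break between two classes.

    strict  (relaxed=False): x GL and GL x, never break next to GL.
    relaxed (relaxed=True):  GL x, and [^SP BA HY] x GL, so a break before
                             GL after SP, BA or HY is left to other rules.
    """
    if before == "GL":
        return True
    if after == "GL":
        if relaxed:
            return before not in ("SP", "BA", "HY")
        return True
    return False


def forbidden_positions(text: str, relaxed: bool) -> List[int]:
    """Indices i where the GL rule forbids a break before text[i]."""
    classes = [CLASS_OF.get(ch, "AL") for ch in text]
    return [
        i for i in range(1, len(text))
        if gl_forbids_break(classes[i - 1], classes[i], relaxed)
    ]
```

For `"x-\u2011y"` (HYPHEN-MINUS followed by NON-BREAKING HYPHEN), the strict variant forbids breaks at positions 2 and 3 and the relaxed one only at 3; the same holds with U+00AD SOFT HYPHEN in place of the hyphen-minus and U+00A0 in place of U+2011. Whether position 2 then actually becomes a break opportunity is decided by the remaining rules, not by this check.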
Received on Monday, 19 February 2007 09:28:10 UTC