
Re: [css3-text] New Working Draft

From: fantasai <fantasai.lists@inkedblade.net>
Date: Wed, 20 Apr 2011 20:31:52 -0700
Message-ID: <4DAFA528.2060707@inkedblade.net>
To: verdy_p@wanadoo.fr
CC: CE Whitehead <cewcathar@hotmail.com>, public-i18n-core@w3.org, public-i18n-indic@w3.org, public-i18n-cjk@w3.org, unicode@unicode.org
On 04/20/2011 07:58 PM, Philippe Verdy wrote:
 > I disagree, because it breaks the inherent nature of the script. Joins
 > in Arabic are mandatory, and create "super grapheme clusters".

Joins in Arabic are mandatory, and they are also broken across lines
for hyphenation.

 > When you say that "it does not consider morphemic, syllabic, or other
 > boundaries", this is already wrong because it already considers the
 > default grapheme cluster boundaries. Note that the default grapheme
 > boundaries were designed only to be locale neutral. But here we are
 > speaking about localization where the language and its script will
 > matter, including in its fundamental properties. Joining types in
 > Arabic are key parts of the script.

Which is why the joining behavior is preserved even when the word is
broken across lines.

 > But in the previous part of the specification, nothing speaks about
 > them, and all that is left are the upper levels, where trying to find
 > language-correct boundaries will fail. After this level, there should
 > still be a level related to the script itself (independently of the
 > language), before trying the last-chance "emergency" breaks. This
 > intermediate level can still be prioritized, just as it was in the
 > previous steps.

CSS does not prohibit such steps, but I do not think it should
prescribe them in this case. That's not what this feature is for.

 > And yes, even in that case you could still insert the hyphenation
 > symbol to show that the word was effectively broken (it is common
 > practice to insert it, even in the Latin script and even if this is
 > not the preferred syllabic or morphemic break position, which can only
 > be inferred by language-specific rules and a lookup dictionary for
 > handling many exception cases).

"word-break: break-word" does not insert hyphens. Hyphenation is a
different feature.
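
To make the distinction concrete, here is a minimal Python sketch. It is
not how any CSS implementation works, and the names clusters() and
emergency_break() are hypothetical. It assumes a simplified notion of
cluster (a base character plus any combining marks that follow it),
which only approximates the default grapheme cluster boundaries. With
an empty hyphen argument, the break adds nothing to the text, as with an
emergency break; passing a hyphen shows the contrasting case, although
real hyphenation would also pick language-appropriate break points.

```python
import unicodedata


def clusters(text):
    """Split text into rough clusters: a base character followed by
    its combining marks. A simplification, not full grapheme
    cluster segmentation."""
    out = []
    for ch in text:
        if out and unicodedata.combining(ch):
            out[-1] += ch  # attach the combining mark to its base
        else:
            out.append(ch)
    return out


def emergency_break(word, width, hyphen=""):
    """Break `word` into lines of at most `width` clusters.

    hyphen="" inserts nothing at the break (emergency-break style).
    A non-empty hyphen is appended to every non-final line and
    counts toward that line's width.
    """
    units = clusters(word)
    room = width - len(clusters(hyphen))  # clusters per non-final line
    if room < 1:
        raise ValueError("width too small for the chosen hyphen")
    lines = []
    while len(units) > width:
        lines.append("".join(units[:room]) + hyphen)
        units = units[room:]
    lines.append("".join(units))
    return lines
```

For example, emergency_break("abcdefgh", 3) gives ["abc", "def", "gh"],
while emergency_break("abcdefgh", 3, "-") gives ["ab-", "cd-", "ef-", "gh"].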

 > The hyphenation symbol is generally very narrow, and if needed, it
 > can still overflow a bit in the margin.

Note that overflowing even "a bit" still produces scrollbars.

 > The choice of the hyphenation symbol is also a property of the script.
 > In many East and South-East Asian scripts, there's not even any symbol
 > for that, because breaks can occur between all grapheme clusters.

If you've got a pointer to resources indicating the correct hyphenation
symbol for various scripts or languages, I'd be interested in linking
that from the hyphenation section. :)

 > Note: in Indic scripts, the danda or double-danda punctuation should
 > be treated like the commas and stops in your spec and preferably not
 > left alone on the next line, even if it falls within the margin (you
 > showed cases for East-Asian scripts only: Han, Hiragana, Katakana,
 > Hangul, Bopomofo, Yi, Mongolian...)

Are you talking about the rules for 'hanging-punctuation' or 'line-break'
or something else?

~fantasai
Received on Thursday, 21 April 2011 03:36:07 GMT
