Re: [css3-text] line break opportunities are based on *syllable* boundaries? from Mark Davis ☕ on 2011-01-30 (www-international@w3.org from January to March 2011)

From: Mark Davis ☕ <mark@macchiato.com>
Date: Sun, 30 Jan 2011 14:58:49 -0800
To: fantasai <fantasai.lists@inkedblade.net>
Cc: Ambrose LI <ambrose.li@gmail.com>, John Cowan <cowan@mercury.ccil.org>, CE Whitehead <cewcathar@hotmail.com>, kojiishi@gluesoft.co.jp, addison@lab126.com, kennyluck@w3.org, www-style@w3.org, www-international@w3.org
Message-ID: <AANLkTinhMbR4Bk+JB6J4i_LpxYVsjqT5J9L7m2-MRFCj@mail.gmail.com>

The wording seems odd for these values, because one would expect
word-break:keep-all to not break anywhere. But that option is actually
called "word-break:keep-words". Even better would be "word-break:never". (This
may be moot; I don't know whether the wording of these values has already
been cast in stone or not.)

Even better would be to use more exact terminology, and have the property be
named "line-break" rather than "word-break", since word breaks are a different
thing. But that ship sailed long ago. Here are the word-break boundaries
according to Unicode (UAX #29), with each "|" marking a boundary. Note that
they include any degenerate cases within non-alphabetic sequences:

A| |quick| |(|brown|)| |fox| |jumped|.

And here are the line-break boundaries (UAX #14):

A |quick |(brown) |fox |jumped.
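For illustration, here is a minimal Python sketch that reproduces the two boundary sets for this sentence. It is a deliberately simplified toy, not UAX #29 or UAX #14: the hypothetical helpers `word_boundaries` and `line_breaks` model only a couple of rules each (letters and digits stay together for words; a line break is allowed only after a run of spaces), and the hypothetical `mark` just inserts "|" at each offset. A real implementation, such as ICU's BreakIterator, applies the full rule sets.

```python
from __future__ import annotations


def word_boundaries(text: str) -> list[int]:
    """Internal word-boundary offsets, using a toy subset of UAX #29.

    Hypothetical simplification: only two "no break" rules are modeled.
    Letters/digits stay together (roughly WB5-WB10) and runs of spaces
    stay together (roughly WB3d); every other position is a boundary (WB999).
    """
    bounds = []
    for i in range(1, len(text)):
        before, after = text[i - 1], text[i]
        if before.isalnum() and after.isalnum():
            continue  # inside a word such as "brown"
        if before == " " and after == " ":
            continue  # inside a run of spaces
        bounds.append(i)
    return bounds


def line_breaks(text: str) -> list[int]:
    """Internal line-break opportunities, using a toy subset of UAX #14.

    Hypothetical simplification: a break is allowed only after a run of
    spaces (roughly LB7 plus LB18); hyphens, CJK, quotes, etc. are ignored.
    """
    return [
        i
        for i in range(1, len(text))
        if text[i - 1] == " " and text[i] != " "
    ]


def mark(text: str, bounds: list[int], sep: str = "|") -> str:
    """Insert `sep` at each boundary offset so the boundaries are visible."""
    pieces, prev = [], 0
    for b in bounds:
        pieces.append(text[prev:b])
        prev = b
    pieces.append(text[prev:])
    return sep.join(pieces)


if __name__ == "__main__":
    s = "A quick (brown) fox jumped."
    print(mark(s, word_boundaries(s)))  # A| |quick| |(|brown|)| |fox| |jumped|.
    print(mark(s, line_breaks(s)))      # A |quick |(brown) |fox |jumped.
```

Both outputs have a boundary just before "(", but only the word boundaries also split off "(", ")", ".", and each space on its own, which is why the two notions should not be conflated.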

The wording in
http://lists.w3.org/Archives/Public/www-style/2011Jan/0667.html (copied
below) needs some work. For example, take just the first line:

> For most scripts, in the absence of hyphenation a line break occurs only
> at word boundaries.

"For most scripts" should be "In most writing systems". And the second part
is wrong. Take "A quick (brown) fox jumped."; there is a word break that is
not on a word boundary by most people's understanding. "(" is not part of
the word "brown", but there is a word break before it.

> For most scripts, in the absence of hyphenation
> a line break occurs only at word boundaries.
> Many writing systems use spaces or punctuation
> to explicitly separate words, and line break
> opportunities can be identified by these characters.
> Scripts such as Thai, Lao, and Khmer, however, do
> not use spaces or punctuation to separate words.
> Although the zero width space (U+200B) can be used
> as an explicit word delimiter in these scripts, this
> practice is not common. As a result, a lexical resource
> is needed to correctly identify break points in such texts.
>
> In several other writing systems (including Chinese,
> Japanese, Yi, and sometimes also Korean), line break
> opportunities are based on syllable boundaries, not words.
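The "lexical resource" mentioned in the quoted text can be illustrated with a minimal Python sketch of dictionary-based segmentation. This is an assumption, not the method of any particular engine: the hypothetical `segment` helper does greedy longest-match against a tiny word list, falling back to one character at a time when nothing matches, and the hypothetical `break_opportunities` turns the resulting words into candidate line-break offsets. Real segmenters (ICU, for example, uses dictionaries for Thai, Lao, and Khmer) rely on much larger lexicons and better matching than a greedy scan.

```python
from __future__ import annotations


def segment(text: str, lexicon: set[str]) -> list[str]:
    """Greedy longest-match segmentation of text written without spaces.

    Hypothetical sketch: `lexicon` stands in for the "lexical resource".
    Characters not covered by any entry are emitted one at a time.
    """
    max_len = max((len(w) for w in lexicon), default=1)
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a lexicon hit.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                words.append(text[i:j])
                i = j
                break
        else:
            # No entry matched: fall back to a single character.
            words.append(text[i])
            i += 1
    return words


def break_opportunities(words: list[str]) -> list[int]:
    """Code-point offsets between adjacent words: candidate line breaks."""
    offsets, pos = [], 0
    for w in words[:-1]:
        pos += len(w)
        offsets.append(pos)
    return offsets


if __name__ == "__main__":
    # Toy lexicon: "I", "love", "language", "Thai".
    lexicon = {"ฉัน", "รัก", "ภาษา", "ไทย"}
    text = "ฉันรักภาษาไทย"  # "I love the Thai language", no spaces between words
    words = segment(text, lexicon)
    print("|".join(words))
    print(break_opportunities(words))
```

Greedy matching is the simplest choice for a sketch; it can mis-segment when a long entry swallows the start of the next word, which is one reason production engines use more careful strategies.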

Mark

*— Il meglio è l’inimico del bene (The best is the enemy of the good) —*

On Sun, Jan 30, 2011 at 14:04, fantasai <fantasai.lists@inkedblade.net> wrote:

> On 01/29/2011 02:06 PM, Ambrose LI wrote:
>
>> I probably shouldn’t have called them rules, maybe a typographically
>> desired preference, something you’d want to be able to accomplish if
>> you mind about details. But if we are talking about the kind of
>> Western typography that CSS wants to eventually achieve, we might as
>> well take such things into consideration.
>>
>
> Done.
>  http://dev.w3.org/csswg/css3-text/#word-break
>
> (I expect it to be marked at-risk for CR, though.)
>
> ~fantasai
>
>

Received on Sunday, 30 January 2011 23:00:35 UTC