RE: [css3-text] line break opportunities are based on *syllable* boundaries?

From: CE Whitehead <cewcathar@hotmail.com>
Date: Mon, 31 Jan 2011 02:45:28 -0500
Message-ID: <SNT142-w60B5C006E855E068F05497B3E20@phx.gbl>
To: <mark@macchiato.com>, <fantasai.lists@inkedblade.net>
CC: <ambrose.li@gmail.com>, <cowan@mercury.ccil.org>, <kojiishi@gluesoft.co.jp>, <addison@lab126.com>, <kennyluck@w3.org>, <www-style@w3.org>, <www-international@w3.org>


Date: Sun, 30 Jan 2011 14:58:49 -0800
From: mark@macchiato.com
To: fantasai.lists@inkedblade.net
CC: ambrose.li@gmail.com; cowan@mercury.ccil.org; cewcathar@hotmail.com; kojiishi@gluesoft.co.jp; addison@lab126.com; kennyluck@w3.org; www-style@w3.org; www-international@w3.org
Subject: Re: [css3-text] line break opportunities are based on *syllable* boundaries?

> The wording seems odd for these values, because one would expect word-break:keep-all to not break anywhere. But that option is 
> actually called "word-break:keep-words". Even better would be "word-break:never". (This may be moot; I don't know whether the
> wording of these values has already been cast in stone or not.)

> Even better would be to use more exact terminology, and have the property be named "line-break" and not "word-break", which 
> is different. 
(Hmm, there is also a separate property, line-break, that specifies the types of line breaks. Then word-break lets you choose additional ones or not. Sorry if I make no sense here.)
> But that ship sailed long ago. Here are the word-break boundaries according to Unicode. Note that they include any degenerate cases
> within non-alphabetic sequences: 

> A quick (brown) fox jumped. 
Hmm (I'll have to look at the Unicode report before I talk about the word break boundaries here; normally I would not want to break text after an opening parenthesis or before a closing one, but that's not why I wrote).
> And here are the line-break boundaries: 

> A quick (brown) fox jumped. 

> As to the wording in http://lists.w3.org/Archives/Public/www-style/2011Jan/0667.html (and copied below) it needs some work. For example, 
> take just the first line:

>> For most scripts, in the absence of hyphenation a line break occurs only at word boundaries.

> "For most scripts" should be "In most writing systems". And the second part is wrong. Take "A quick (brown) fox jumped."; there is a 
> word break that is not on a word boundary by most people's understanding. "(" is not part of the word "brown", but there is a word 
> break before it. 
Here is my one comment, a note to Koji: the only thing I would replace "not words" with is "not word boundaries" (for parallelism with "syllable boundaries"). But it's no big deal.
I don't have a good definition of "word", however.
--C. E. Whitehead

>> For most scripts, in the absence of hyphenation
>> a line break occurs only at word boundaries.
>> Many writing systems use spaces or punctuation
>> to explicitly separate words, and line break
>> opportunities can be identified by these characters.
>> Scripts such as Thai, Lao, and Khmer, however, do
>> not use spaces or punctuation to separate words.
>> Although the zero width space (U+200B) can be used
>> as an explicit word delimiter in these scripts, this
>> practice is not common. As a result, a lexical resource
>> is needed to correctly identify break points in such texts. 
>> In several other writing systems, (including Chinese,
>> Japanese, Yi, and sometimes also Korean) a line break
>> opportunities are based on syllable boundaries, not words.
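
To make the space and U+200B cases in the quoted paragraph concrete, here is a minimal Python sketch. It is an illustration only, not the Unicode line breaking algorithm and not anything CSS specifies: the names `break_opportunities` and `segments` are made up for this example, and it assumes a break is allowed only after a run of U+0020 spaces or after a zero width space.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE, the explicit delimiter mentioned above


def break_opportunities(text):
    """Return the indices i at which a line may break before text[i].

    Simplifying assumption (hypothetical, for illustration): a break is
    allowed only after a run of spaces or after a zero width space.
    """
    points = []
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if prev == ZWSP or (prev == " " and cur != " "):
            points.append(i)
    return points


def segments(text):
    """Split text into the unbreakable pieces between break opportunities."""
    cuts = [0] + break_opportunities(text) + [len(text)]
    return [text[a:b] for a, b in zip(cuts, cuts[1:])]


if __name__ == "__main__":
    print(segments("A quick (brown) fox jumped."))
    print(segments("ภาษา" + ZWSP + "ไทย"))  # Thai with an explicit delimiter
    print(segments("ภาษาไทย"))              # Thai without one
```

Under these assumptions, "A quick (brown) fox jumped." splits into "A ", "quick ", "(brown) ", "fox ", "jumped.", so the parentheses stay attached to "brown". The Thai string without U+200B comes back as a single segment, which is the case where a lexical resource would be needed.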

— Il meglio è l’inimico del bene —

On Sun, Jan 30, 2011 at 14:04, fantasai <fantasai.lists@inkedblade.net> wrote:

On 01/29/2011 02:06 PM, Ambrose LI wrote:

I probably shouldn’t have called them rules, maybe a typographically
desired preference, something you’d want to be able to accomplish if
you care about details. But if we are talking about the kind of
Western typography that CSS wants to eventually achieve, we might as
well take such things into consideration.


(I expect it to be marked at-risk for CR, though.)


Received on Monday, 31 January 2011 07:48:25 UTC
