
Re: <NOBR> - Returning to the question....

From: L. David Baron <dbaron@dbaron.org>
Date: Tue, 30 Mar 2004 10:43:45 -0800
To: www-html@w3.org
Message-ID: <20040330184345.GA5015@darby.dbaron.org>

On Tuesday 2004-03-30 20:53 +0300, Jukka K. Korpela wrote:
> On Tue, 30 Mar 2004 olafBuddenhagen@web.de wrote:
> > If Unicode linebreaking rules are any good (I do not know them), the
> > problem is actually a different one: Nobody but professional typesetters
> > do know and respect the five or so different types of dash-like
> > characters, all fulfilling a different purpose, and all having a
> > different character code in Unicode (I guess).

> > However, once you actually start to consider the fact that -1 shouldn't
> > be broken, you'll probably also consider the fact that minus is
> > something different than a dash or a hyphen...
> What makes you think that in "-1", the "-" is inevitably just a surrogate
> for minus? Besides, the Unicode standard actually defines "-" as
> hyphen-minus, as a character with dual (or actually multiple) usage.
> Yet the Unicode line breaking rules play their own game, forgetting
> that duality.

The current version [1] of UAX #14 (Line Breaking Properties) doesn't
forget that duality, as far as I can tell.  The hyphen-minus character
gets its own character class (HY), and a break between an HY and a
following NU (numeric character class) is forbidden.  See rule LB18 in
[2].

In summary, UAX #14 recommends that a line break be allowed before a
hyphen when the hyphen is preceded by a space, and after a hyphen if the
hyphen is followed by something other than a number.  (The pairwise
rules are really much more complex.  I'm ignoring how the hyphen
interacts with the special characters whose rules take priority over
the hyphen's, such as non-breaking spaces, brackets, parentheses,
quotes, etc.)

This seems like a reasonable compromise to me.  It doesn't break
negative numbers ("-1234"), but it allows breaking within hyphenated
words ("pseudo-element") and at many points within chemical names
("1,1,1-trichloro-2,2-bis-(p-chlorophenyl)ethane", at all but the second
hyphen).  It does not, however, allow breaking within US-style telephone
numbers ("123-456-7890").
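
To make the summary above concrete, here is a rough Python sketch of
just the simplified hyphen rule as I described it.  It is not an
implementation of UAX #14: the function name hyphen_break_offsets is
made up for illustration, NU is approximated with str.isdecimal(), and
all of the higher-priority rules (non-breaking spaces, brackets,
quotes, and so on) are ignored.

```python
def hyphen_break_offsets(text):
    """Return the offsets at which the simplified hyphen rule permits a break.

    An offset i means "a line may be broken between text[i-1] and text[i]".
    Only HYPHEN-MINUS is examined; every other rule in UAX #14 is ignored.
    """
    breaks = []
    for i, ch in enumerate(text):
        if ch != "-":
            continue
        # A break is allowed before the hyphen only when a space precedes it.
        if i > 0 and text[i - 1] == " ":
            breaks.append(i)
        # A break is allowed after the hyphen unless a number follows
        # (the HY x NU pair); isdecimal() is only an approximation of NU.
        if i + 1 < len(text) and not text[i + 1].isdecimal():
            breaks.append(i + 1)
    return breaks


if __name__ == "__main__":
    samples = [
        "-1234",
        "pseudo-element",
        "1,1,1-trichloro-2,2-bis-(p-chlorophenyl)ethane",
        "123-456-7890",
    ]
    for sample in samples:
        print(sample, hyphen_break_offsets(sample))
```

Under these assumptions the negative number and the telephone number
yield no break opportunities, while the chemical name yields one after
every hyphen except the second.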


[1] http://www.unicode.org/unicode/reports/tr14/tr14-14.html
[2] http://www.unicode.org/unicode/reports/tr14/tr14-14.html#Algorithm

L. David Baron                                <URL: http://dbaron.org/ >
Received on Tuesday, 30 March 2004 13:44:19 UTC
