- From: L. David Baron <dbaron@dbaron.org>
- Date: Tue, 30 Mar 2004 10:43:45 -0800
- To: www-html@w3.org
On Tuesday 2004-03-30 20:53 +0300, Jukka K. Korpela wrote:
> On Tue, 30 Mar 2004 olafBuddenhagen@web.de wrote:
> > If Unicode linebreaking rules are any good (I do not know them), the
> > problem is actually a different one: Nobody but professional typesetters
> > do know and respect the five or so different types of dash-like
> > characters, all fulfilling a different purpose, and all having a
> > different character code in Unicode (I guess).
> > However, once you actually start to consider the fact that -1 shouldn't
> > be broken, you'll probably also consider the fact that minus is
> > something different than a dash or a hyphen...
>
> What makes you think that in "-1", the "-" is inevitably just a surrogate
> for minus? Besides, the Unicode standard actually defines "-" as
> hyphen-minus, as a character with dual (or actually multiple) usage.
> Yet the Unicode line breaking rules play their own game, forgetting
> that duality.

The current version [1] of UAX #14 (Line Breaking Properties) doesn't
forget that duality, as far as I can tell. The hyphen-minus character
gets its own character class (HY), and breaks between HY followed by NU
(numeric character class) are forbidden. See rule LB18 in [2].

In summary, UAX #14 recommends that a line break be allowed before a
hyphen when the hyphen is preceded by a space, and after a hyphen if the
hyphen is followed by something other than a number. (The pairwise
rules are really much more complex, but I'm ignoring the description of
the interaction with the hyphen of all the special characters whose
rules have higher priority than the hyphen, such as non-breaking spaces,
brackets, parentheses, quotes, etc.)

This seems like a reasonable compromise to me.
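For illustration only, the summarized hyphen rule can be sketched in a few lines of Python. This is not the UAX #14 algorithm: the function name `hyphen_break_offsets` is hypothetical, `str.isdigit()` is only a rough stand-in for the NU class, and every higher-priority rule (non-breaking spaces, brackets, quotes, and so on) is ignored. Skipping a break between a hyphen and a following space is a further assumption of the sketch.

```python
def hyphen_break_offsets(text):
    """Return offsets where a line may break next to a hyphen-minus.

    An offset i means "a break is allowed between text[i-1] and text[i]".
    Simplified sketch of the summary rule only, not the full UAX #14 tables.
    """
    offsets = set()
    for i, ch in enumerate(text):
        if ch != "-":
            continue
        # Break before the hyphen only when a space precedes it.
        if i > 0 and text[i - 1] == " ":
            offsets.add(i)
        # Break after the hyphen unless a digit follows (the HY x NU case).
        # Assumption: also no break directly before a following space.
        nxt = text[i + 1:i + 2]
        if nxt and not nxt.isdigit() and not nxt.isspace():
            offsets.add(i + 1)
    return sorted(offsets)


if __name__ == "__main__":
    for sample in ("-1234", "pseudo-element", "123-456-7890"):
        print(repr(sample), hyphen_break_offsets(sample))
```

Under these assumptions, "-1234" and "123-456-7890" yield no break offsets, while "pseudo-element" yields a single offset just after its hyphen.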
This rule doesn't break negative numbers ("-1234"), but it allows
breaking within hyphenated words ("pseudo-element") and at many points
within chemical names ("1,1,1-trichloro-2,2-bis-(p-chlorophenyl)ethane",
at all but the second hyphen), although not within US-style telephone
numbers ("123-456-7890").

-David

[1] http://www.unicode.org/unicode/reports/tr14/tr14-14.html
[2] http://www.unicode.org/unicode/reports/tr14/tr14-14.html#Algorithm

-- 
L. David Baron <URL: http://dbaron.org/ >
Received on Tuesday, 30 March 2004 13:44:19 UTC