Re: [css-text] feedback on hyphenation

The document I cited is a Unicode Standard Annex, which means it is an
integral part of the Unicode standard, so I don't think this aspect of the
meaning of SHY is up for debate (at least not here).

I would suggest that the reasonable way to handle this problem in CSS would
be to allow CSS to specify hyphenation exceptions (i.e. additional
hyphenation dictionary entries), and ensure that the syntax for hyphenation
exceptions includes a way to specify not only where the hyphenation point
is, but how the text around the hyphenation point changes.  For example,
the hyphenation dictionary entry for mattjuv could be specified as
mat{t-,t,t}juv: the three components within curly brackets specify the text
before the break, after the break, and when there is no break.  A
hyphenation dictionary entry of "foo-bar" would be short for "foo{-,,}bar"
(which could itself be shortened to "foo{-}bar"). (This is like
\discretionary in TeX.)
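
To make the notation concrete, here is a minimal sketch in Python of how
such entries might be parsed and expanded. It is an illustration only,
not a proposal for the actual CSS syntax: the names Discretionary,
parse_entry, unbroken and break_forms are made up for this example, and
it assumes that omitted components inside the curly brackets default to
the empty string and that a bare "-" outside brackets means "{-,,}".

```python
import re
from typing import List, NamedTuple, Tuple, Union


class Discretionary(NamedTuple):
    """One hyphenation point (hypothetical type for this sketch)."""
    pre: str      # text ending the first line when the break is taken
    post: str     # text starting the next line when the break is taken
    nobreak: str  # text used when the break is not taken


Segment = Union[str, Discretionary]

# A brace group, a bare hyphen, or a run of ordinary letters.
_TOKEN = re.compile(r"\{([^{}]*)\}|(-)|([^{}-]+)")


def parse_entry(entry: str) -> List[Segment]:
    """Split an exception entry into literal text and break points."""
    segments: List[Segment] = []
    pos = 0
    for m in _TOKEN.finditer(entry):
        if m.start() != pos:  # finditer skipped something, e.g. a stray brace
            raise ValueError(f"unexpected character at offset {pos}")
        pos = m.end()
        if m.group(1) is not None:
            parts = m.group(1).split(",")
            if len(parts) > 3:
                raise ValueError("at most three components are allowed")
            # Assumption: missing components default to the empty string.
            parts += [""] * (3 - len(parts))
            segments.append(Discretionary(*parts))
        elif m.group(2):
            # Assumption: a bare "-" is short for "{-,,}".
            segments.append(Discretionary("-", "", ""))
        else:
            segments.append(m.group(3))
    if pos != len(entry):
        raise ValueError(f"unexpected character at offset {pos}")
    return segments


def unbroken(segments: List[Segment]) -> str:
    """Spelling of the word when no break is taken."""
    return "".join(
        s.nobreak if isinstance(s, Discretionary) else s for s in segments
    )


def break_forms(segments: List[Segment]) -> List[Tuple[str, str]]:
    """(end of first line, start of next line) for each possible break."""
    forms = []
    for i, seg in enumerate(segments):
        if isinstance(seg, Discretionary):
            before = unbroken(segments[:i]) + seg.pre
            after = seg.post + unbroken(segments[i + 1:])
            forms.append((before, after))
    return forms


entry = parse_entry("mat{t-,t,t}juv")
print(unbroken(entry))     # spelling without a break
print(break_forms(entry))  # spelling on either side of the break
```

With this reading, the entry for mattjuv gives "mattjuv" when unbroken
and "matt-" / "tjuv" when broken, and the three spellings "foo-bar",
"foo{-,,}bar" and "foo{-}bar" all parse to the same thing.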

OpenOffice.org has support for something similar:

https://www.tug.org/TUGboat/tb27-1/tb86nemeth.pdf

James



On Wed, Feb 26, 2014 at 10:24 PM, Håkan Save Hansson <
hakan.hansson@edison.se> wrote:

> Hi James,
>
>
>
> I see your point, but to me the &shy; works as a hint or rather an
> override of the hyphenation system to manually control the hyphenation
> point(s). To me it’s quite logical to also involve it in this special case.
> My suggested CSS property “hyphenate-[Maximum count of same letter before
> and after hyphen]” doesn’t change the behavior of &shy; or the spelling but
> rather works as a hint/rule to the hyphenation system how to hyphenate this
> specific word. The &shy; points out where and my CSS property tells the
> hyphenation system how.
>
>
>
> Maybe the extra letter should be a “pseudo element”, like :after and
> :before. It doesn’t change the text. Copying the text will drop the extra
> letter just as it would the hyphenation from the hyphenation system (when
> generated by hyphens:auto).
>
>
>
> Maybe it would be better with a completely new “control character
> mechanism” for this, let’s call it “&shx;”, in contrast to using &shy;.
> First I thought that it would be more suitable for “progressive
> enhancement” as I assumed that it would degrade gracefully in older
> browsers as they don’t understand this character and thus ignore it. I was
> wrong. It turns out that browsers actually display the text “&shx;”.
>
>
>
> For those interested, I have been experimenting on a simulation that
> relies on client-side JavaScript: http://codepen.io/Hawkun/pen/zxLij. It
> uses JavaScript to detect if a certain soft hyphen is used, then adds an
> extra letter after it according to the data in the HTML markup.
>
>
>
> -- Håkan
>
>
> *From:* James Clark [mailto:jjc@jclark.com]
> *Sent:* 26 February 2014 08:21
> *To:* Håkan Save Hansson
> *Cc:* www-style@w3.org
> *Subject:* Re: [css-text] feedback on hyphenation
>
>
>
> On Wed, Feb 12, 2014 at 9:17 PM, Håkan Save Hansson <
> hakan.hansson@edison.se> wrote:
>
>
>
> There are some language specific special cases with hyphenation. In
> Swedish for instance, if you write the two words “matta” (carpet) and
> “tjuv” (thief) as one you write it as “mattjuv”, with two t letters. This
> should hyphenate into “matt-tjuv”, with three t letters. This is not a
> hyphenation rule, but rather a type rule: when you write two words as one,
> there may never be more than two of the same letter where the two words
> concatenate.
>
>
>
> If you want to use a manual soft hyphen (&shy;) for such a word you're in
> trouble. My suggestion is that when you write “matt&shy;tjuv” in text and
> it is displayed without hyphenation, it respects this rule and suppresses
> one of the three letters t.
>
>
>
> Unicode [1] says to do the opposite:
>
>
>
> When a SHY is used to represent a possible hyphenation location, the
> spelling is that of the word without hyphenation
>
>
>
> So you're supposed to write it as mat&shy;tjuv, and it's up to the
> hyphenation system to know how the spelling changes when the word is
> hyphenated.
>
>
>
> [1] http://www.unicode.org/reports/tr14/#SoftHyphen
>
>
>
> James
>

Received on Thursday, 27 February 2014 05:57:23 UTC