
Re: Ambiguous hyphenation cases with

From: Zack Weinberg <zackw@panix.com>
Date: Tue, 22 Jul 2014 14:16:41 -0400
Message-ID: <CAKCAbMiE-xroyLzgfAJmCU9bzoFDJs9whVLyhHLjf4hQ7KnMEA@mail.gmail.com>
To: Christoph Päper <christoph.paeper@crissov.de>
Cc: "www-style@w3.org" <www-style@w3.org>, Unicode <unicode@unicode.org>
On Tue, Jul 22, 2014 at 12:14 PM, Christoph Päper
<christoph.paeper@crissov.de> wrote:
> fantasai <fantasai.lists@inkedblade.net>:
>
>>> The problem is that the hyphenation system in itself can't decide how
>>> to change the spelling, without any "dictionary" functionality. It
>>> can't know if I meant "mat-tjuv" ("food thief" in Swedish) or "matt-tjuv"
>>> ("carpet thief") when I wrote "mat&shy;tjuv". So there has to be a way
>>> to tell the hyphenation system that.
...
>   “mattjuv, mat&#x34F;tjuv”
>
> Possible Unicode solution with a new combining character that makes the preceding character or grapheme – I’m not sure which – invisible except at the end of a line:
>
>   “mattjuv, matt&#x2065;tjuv”
>
>   U+2065 – Combining Collapse or Reduplicating Soft Hyphen or so

I think I'd prefer new tags to new magic entities.  In TeX this would be

    mat\discretionary{t-}{}{}tjuv

so maybe in HTML

    mat<dbr before="t-">tjuv

which would also accept after= and nobreak= attributes.  It's verbose,
but I think it's easier to remember.
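
To make the three parts concrete, here is a minimal Python sketch of how
such a discretionary break could render.  The function name `render` and
its parameters are hypothetical; they simply mirror the pre-break,
post-break and no-break arguments of TeX's \discretionary and the
before=, after= and nobreak= attributes suggested above.

```python
def render(left, right, before="", after="", nobreak="", broken=False):
    """Render the text around one discretionary break.

    before  -- emitted at the end of the line if the break is taken
    after   -- emitted at the start of the next line if the break is taken
    nobreak -- emitted in place if the break is not taken
    """
    if broken:
        return left + before + "\n" + after + right
    return left + nobreak + right


# Hypothetical markup: mat<dbr before="t-">tjuv
print(render("mat", "tjuv", before="t-"))               # unbroken: mattjuv
print(render("mat", "tjuv", before="t-", broken=True))  # broken: "matt-" then "tjuv"
```

The extra "t" exists only in the before part, so it appears only when
the line is actually broken at that point.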

I'd also support a "hyphenation" CSS property with the same semantics
as TeX's \hyphenation{}, i.e.

    hyphenation: "un-break-able" "mom-ent";

overrides the built-in hyphenation dictionary for the words
"unbreakable" and "moment" (within the selected elements; normally one
would put this on <body>).
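
As a rough illustration of what a user agent might do with such a list,
the sketch below turns each pattern into a table of permitted break
offsets.  The function name `parse_exceptions` and the offset
representation are assumptions made for this example, not part of any
proposal.

```python
def parse_exceptions(patterns):
    """Map each unhyphenated word to the offsets where it may break."""
    table = {}
    for pattern in patterns:
        parts = pattern.split("-")
        offsets, pos = [], 0
        # Every part except the last one ends at a permitted break.
        for part in parts[:-1]:
            pos += len(part)
            offsets.append(pos)
        table["".join(parts)] = offsets
    return table


exceptions = parse_exceptions(["un-break-able", "mom-ent"])
print(exceptions)  # {'unbreakable': [2, 7], 'moment': [3]}
```

A word found in this table would use these offsets instead of whatever
the built-in dictionary says.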

For bonus points,

    hyphenation: "mat[t-//]tjuv"

would handle the spelling-changing case, precise syntax to be bikeshedded.
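
One possible reading of the bracket notation, sketched in Python.  It
assumes the three slash-separated fields are [before/after/nobreak], in
the same order as TeX's \discretionary arguments; that order, the regular
expression and the name `parse_pattern` are all guesses made for
illustration.

```python
import re

# Assumed field order: [before/after/nobreak]
_DISC = re.compile(r"\[([^/\]]*)/([^/\]]*)/([^/\]]*)\]")


def parse_pattern(pattern):
    """Split a pattern into plain strings and (before, after, nobreak) tuples."""
    pieces, pos = [], 0
    for m in _DISC.finditer(pattern):
        if m.start() > pos:
            pieces.append(pattern[pos:m.start()])
        pieces.append(m.groups())
        pos = m.end()
    if pos < len(pattern):
        pieces.append(pattern[pos:])
    return pieces


print(parse_pattern("mat[t-//]tjuv"))  # ['mat', ('t-', '', ''), 'tjuv']
```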

> All solutions require author education.

Yah.

zw
Received on Tuesday, 22 July 2014 18:17:05 UTC
