
Re: [css-text] feedback on hyphenation

From: Peter Moulder <pjrm@mail.internode.on.net>
Date: Wed, 19 Feb 2014 22:27:29 +1100
To: www-style@w3.org
Message-ID: <20140219112728.GA29020@mail.internode.on.net>

On Wed, Feb 12, 2014 at 02:17:20PM +0000, Håkan Save Hansson wrote:

> There are some language-specific special cases with hyphenation. In Swedish, for instance, if you write the two words "matta" (carpet) and "tjuv" (thief) as one, you write it as "mattjuv", with two t letters. This should hyphenate into "matt-tjuv", with three t letters. This is not a hyphenation rule, but rather a type rule: when you write two words as one, there may never be more than two of the same letter in a row where the two words join.
> 
> If you want to use a manual soft hyphen (&shy;) for such a word, you're in trouble. My suggestion is that when you write "matt&shy;tjuv" in text and it is displayed without hyphenation, it respects this rule and suppresses one of the three t letters. I don't know if such a rule could have a negative effect on texts in other languages, but I don't know of any language allowing three consecutive identical letters.

If it's relevant: modern German retains original spelling of components
up to three successive same letters (though this has changed over time,
e.g. Flussschifffahrt was previously written Flußschiffahrt).
(I don't know how this word should be hyphenated if it appears in the middle
of some Swedish text.)
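
To make the language dependence concrete, here is a rough Python sketch of
the behaviour Håkan proposes.  It is only an illustration under stated
assumptions, not a description of any existing user agent: the function
names and the MAX_RUN_UNBROKEN table are made up for this example, and the
"de" entry merely encodes the modern German spelling mentioned above.

```python
SOFT_HYPHEN = "\u00ad"  # what &shy; expands to

# Hypothetical table: longest run of one letter allowed across a joint
# when the word is NOT broken there.  Languages not listed are left alone.
MAX_RUN_UNBROKEN = {"sv": 2, "de": 3}


def render_unbroken(word, lang):
    """Join the parts of `word` at its soft hyphens, dropping letters at a
    joint only when the language's limit on repeated letters is exceeded."""
    parts = word.split(SOFT_HYPHEN)
    limit = MAX_RUN_UNBROKEN.get(lang)
    out = parts[0]
    for part in parts[1:]:
        if limit is not None and out and part and out[-1] == part[0]:
            ch = part[0]
            left = len(out) - len(out.rstrip(ch))     # run ending `out`
            right = len(part) - len(part.lstrip(ch))  # run starting `part`
            excess = left + right - limit
            if excess > 0:
                out = out[:len(out) - min(excess, left)]
        out += part
    return out


def render_broken(word, lang, index=0):
    """Break at the index-th soft hyphen, keeping every letter at the break."""
    parts = word.split(SOFT_HYPHEN)
    head = render_unbroken(SOFT_HYPHEN.join(parts[:index + 1]), lang)
    tail = render_unbroken(SOFT_HYPHEN.join(parts[index + 1:]), lang)
    return head + "-", tail


print(render_unbroken("matt\u00adtjuv", "sv"))     # mattjuv
print(render_broken("matt\u00adtjuv", "sv"))       # ('matt-', 'tjuv')
print(render_unbroken("Schiff\u00adfahrt", "de"))  # Schifffahrt
```

Note that the sketch keys off the content language rather than a style
property, so the result does not depend on a stylesheet having loaded.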

Regarding "hyphens-[Maximum count of same letter before and after hyphen]: 2":
although it's not an absolute rule, we'd usually try to avoid having the
document render with misspellings when the stylesheet fails to load (here,
"matt&shy;tjuv" would then display with three t letters).

One could say that this problem is just a specific instance of faulty
hyphenation software, and should be reported as a bug in that software rather
than a bug in CSS or HTML.

That said, there are very likely other cases where some content uses a
made-up word (such as "fragmentainer"), and the content author wishes to
provide dictionary-like information such as hyphenation (for visual user
agents), pronunciation (for aural user agents), part-of-speech tag (to help
parsing, e.g. to help aural user agents with sentence intonation) and so on.

For the "made up word in particular content" case, I would lean towards
considering the document language (such as HTML) as the more appropriate
place for the author to provide this information.  But if it were to be done in
CSS, then one model would be to use a "data:" URI in dictionary-like properties
such as the former 'hyphenate-resource' property.
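
As a very rough illustration of the "data:" URI idea, the Python sketch below
packs a few pre-hyphenated words into a "data:" URI and wraps it in a
'hyphenate-resource' declaration.  Everything about the payload is an
assumption made for the example: the one-word-per-line format with "-"
marking break points, the text/plain media type, the helper names, and the
break points chosen for "fragmentainer".  The property itself is no longer in
the drafts, so this shows the shape of the idea only.

```python
from urllib.parse import quote


def hyphenation_data_uri(entries):
    """Build a data: URI from pre-hyphenated words such as "frag-men-tain-er".

    The payload format (one word per line) is hypothetical.
    """
    payload = "\n".join(entries)
    # Percent-encode everything outside the unreserved set so the URI can
    # sit safely inside url("...").
    return "data:text/plain;charset=utf-8," + quote(payload, safe="")


def css_declaration(entries):
    """Wrap the URI in a declaration for the former 'hyphenate-resource'."""
    return 'hyphenate-resource: url("%s");' % hyphenation_data_uri(entries)


print(css_declaration(["frag-men-tain-er"]))
# hyphenate-resource: url("data:text/plain;charset=utf-8,frag-men-tain-er");
```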

pjrm.
Received on Wednesday, 19 February 2014 11:27:57 UTC
