[css3-text] Comments on hyphenation

Hi

As the guy behind Hyphenator.js I'm glad to see things happening in the field of hyphenation in browsers. The first implementations have now appeared in Firefox and WebKit, and I think the standard for hyphenation should be something implementors can rely on.

I'm not a computer scientist, nor am I involved in any browser. I'm just a hobbyist. But I know Liang's hyphenation algorithm quite well by now (I've implemented it in JavaScript: http://code.google.com/p/hyphenator/), I'm involved in creating better hyphenation patterns for German (aka Trennmuster: https://groups.google.com/group/trennmuster-opensource) and I've gained some experience with use cases from the user feedback I got for Hyphenator.js.
So maybe my thoughts will help?

I've read the discussion following http://lists.w3.org/Archives/Public/www-style/2011Mar/0352.html and of course the current WD (http://www.w3.org/TR/2011/WD-css3-text-20110412/).

1) Are there any other important documents?

2) Checking and fixing hyphenation:

	Summary:
	- keep ‘hyphens: all’
	- define a mandatory way to load hyphenation patterns (see also 3)
	
A misplaced hyphen is a spelling error and thus not acceptable on a website. Automatic hyphenation can never be perfect because some words are ambiguous. Therefore a web developer has to be able to check and fix hyphenation.

Hyphenation can be checked by displaying the text with all hyphenation points visible. So we actually need ‘hyphens: all’.

Fixing hyphenation is rather complex. Three levels can be distinguished:
L1 (word-level): fix hyphenation by manually inserting conditional hyphenation characters, which take precedence over automatic hyphenation.
L2 (site-level): fix hyphenation by defining a list of exceptions.
L3 (language-level): fix hyphenation by updating the hyphenation resource.

At the moment there is no defined way to do L2. Christoph Päper proposed in http://lists.w3.org/Archives/Public/www-style/2011Mar/0721.html adding an ‘exceptions’ descriptor to the ‘@hyphenate-resource’ rule. I think that is the most obvious way.

In order to fix hyphenation on a language level (L3), web developers should be able to provide their own ‘[@]hyphenate-resource’. I'm convinced that support for this rule must be mandatory for UAs (more on this later).
Further, the format of the resource has to be defined. AFAIK all implementations use Liang's TeX hyphenation algorithm, so the structure defined in hyphen-2.7.1 (part of hunspell) looks quite good. It's concise and easy to change. (More on this later.)

Since changing or recomputing hyphenation patterns requires a deep understanding of the underlying algorithm, the ‘exceptions’ descriptor is a handy way to fix hyphenation without touching the patterns.
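
To make the site-level fix (L2) concrete, here is a rough sketch of what such an exception list might look like. The value syntax, the URL and the chosen words are assumptions for illustration only; they are not quoted from the WD or from the cited proposal. The two German compounds are ambiguous, so which hyphenation is right depends on the meaning used on the site:

```css
/* Hypothetical syntax: an 'exceptions' descriptor on the
   @hyphenate-resource rule. The hyphens inside each word mark
   the permitted break points. */
@hyphenate-resource {
	src: url(http://example.com/hyphenation/hyph_de_DE.dic); /* placeholder URL */
	/* "Wachstube" read as Wach-stube (guardroom), not Wachs-tube (tube of wax);
	   "Staubecken" read as Stau-becken (reservoir), not Staub-ecken (dusty corners) */
	exceptions: "Wach-stu-be, Stau-be-cken";
}
```

An exception list like this would override the automatic result only for the listed words; everything else would still be hyphenated by the patterns.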


3) hyphenate-resource vs @hyphenate-resource

	Summary:
	- @hyphenation-pattern instead of @hyphenate-resource
	- method analogous to @font-face

I'll take a step back and see how this works for fonts:

@font-face {
  font-family: Gentium;
  src: url(http://example.com/fonts/Gentium.ttf);
}

p { font-family: Gentium, serif; }

The @rule defines a kind of alias for a font resource on the server. As soon as the UA encounters the @rule it can load the resource.
Later on the alias is used to refer to that resource.

I think an analogous method should be used for hyphenation:

@hyphenation-pattern { /*instead of *-resource */
	hyphenate-resource: <hyphenate-resource>;
	src: url(<url>);
	[exceptions: <comma separated string of hyphenated words>;]
	[lang: <BCP47>;]
}

p {
	hyphens: auto;
	hyphenate-resource: <hyphenate-resource>, local;
}

Setting the descriptor 'lang' in the @rule is a shorthand for

:lang(en) {
	hyphenate-resource: <hyphenate-resource>;
}
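
Filled in with concrete values, the proposal might read as follows. This is only a sketch of the syntax suggested in this message: the alias ‘german-patterns’, the URL, the exception word and the selector are made up for illustration, and no UA implements any of it.

```css
@hyphenation-pattern {
	/* alias for the resource, comparable to font-family in @font-face */
	hyphenate-resource: german-patterns;
	src: url(http://example.com/hyphenation/hyph_de_DE.dic); /* placeholder URL */
	exceptions: "Druck-er-zeug-nis";
	lang: de; /* shorthand for the :lang(de) rule below */
}

/* What the 'lang' descriptor above would expand to */
:lang(de) {
	hyphenate-resource: german-patterns;
}

/* The alias can also be used with any other selector. 'local' is read
   here as a fallback to the UA's own patterns (an assumption). */
blockquote.poem {
	hyphens: auto;
	hyphenate-resource: german-patterns, local;
}
```

With the ‘lang’ descriptor set, an author who tags content with lang="de" would not need to write the :lang(de) rule at all.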

Advantages:
+ method known from @font-face
+ Minimal data storage (1 resource per pointer/language, 1 pointer per element)
+ language tagging not required but encouraged
+ Can select patterns by language or by hyphenate-resource
+ Can use any selector
+ Backward compatible with hyphenate-resource: <url>
+ @hyphenation-pattern typically occurs early in the CSS file (preloading)

Disadvantages:
- possible conflicts when language and hyphenate-resource contradict each other
(- complicated)

What do you think?

Sincerely,
Mathias

Received on Monday, 16 May 2011 08:44:11 UTC