RE: Feedback on hyphenation properties

>From Håkon Wium Lie:

>The only such format I know is the format used by TeX and OpenOffice:

   >http://en.wikipedia.org/wiki/Hyphenation_algorithm

FYI: Liang's algorithm is also available in a JavaScript implementation,
hyphenator.js, which has matured into a very viable tool while we await
native H&J (hyphenation and justification) support in browsers.

http://code.google.com/p/hyphenator/
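
For anyone who has not looked at how the pattern matching works, here is
a rough sketch in Python. It is not hyphenator.js's code or API: the
function names are made up, and PATTERNS is a tiny hand-picked set chosen
to cover one word, where a real pattern file holds thousands of entries.

```python
# Illustrative sketch of Liang-style pattern hyphenation.
# PATTERNS is a tiny hand-picked set, not a real pattern file.
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]


def parse_pattern(pattern):
    """Split 'hy3ph' into letters 'hyph' and gap weights [0, 0, 3, 0, 0]."""
    letters = "".join(ch for ch in pattern if not ch.isdigit())
    weights = [0] * (len(letters) + 1)
    gap = 0  # index of the gap in front of the next letter
    for ch in pattern:
        if ch.isdigit():
            weights[gap] = int(ch)
        else:
            gap += 1
    return letters, weights


def hyphenation_points(word, patterns):
    """Return indices k where a break is allowed between word[k-1] and word[k]."""
    padded = "." + word.lower() + "."  # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)   # scores[i] is the gap in front of padded[i]
    for pattern in patterns:
        letters, weights = parse_pattern(pattern)
        start = padded.find(letters)
        while start != -1:
            for offset, weight in enumerate(weights):
                position = start + offset
                scores[position] = max(scores[position], weight)
            start = padded.find(letters, start + 1)
    # Odd scores allow a break, even scores forbid one.
    # The gap in front of word[k] is scores[k + 1] because of the leading dot.
    return [k for k in range(1, len(word)) if scores[k + 1] % 2 == 1]


def show_breaks(word, points, hyphen="-"):
    """Insert the hyphen string at every break point, for display."""
    pieces = []
    previous = 0
    for k in points:
        pieces.append(word[previous:k])
        previous = k
    pieces.append(word[previous:])
    return hyphen.join(pieces)


points = hyphenation_points("hyphenation", PATTERNS)
print(show_breaks("hyphenation", points))
```

With these patterns the sketch gives hy-phen-ation. The highest number
at each gap wins, so a longer, more specific pattern can overrule a
shorter one.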

   >> Finally "hyphenate-character" is odd in that the value takes a string,
   >> not just a single character.

      >I don't believe we have a character type. For the languages I know,
      >it makes sense to only use one character. But I'm sure someone will
      >find something creative to do with strings.

+1 to strings.
It's an invitation to innovate at, seemingly, no extra cost.

Regards,

Rich

-----Original Message-----
From: www-style-request@w3.org [mailto:www-style-request@w3.org] On Behalf
Of Håkon Wium Lie
Sent: Thursday, August 05, 2010 6:02 AM
To: Simon Fraser
Cc: www-style@w3.org list
Subject: Re: Feedback on hyphenation properties

Also sprach Simon Fraser:

 > We are not keen on "hyphens" as a property name. This doesn't match
 > other CSS property names which are mostly descriptive. We suggest
 > "hyphenation" or "hyphenate" instead. Most word processing and
 > desktop publishing programs usually refer to this behavior as
 > "hyphenation".

The CSS WG had a long discussion on this in 2007:

  http://lists.w3.org/Archives/Member/w3c-css-wg/2007JanMar/0514.html

As you can see, the property used to be called 'hyphenate' but was
changed to make it different from XSL. I think the new 'hyphens' works
well -- it's shorter and easier to type.

 > One thing to bear in mind is that if we want a shorthand property
 > in future, we may wish to reserve "hyphenation" or "hyphenate" for the
 > shorthand, and use "hyphenation-mode"/"hyphenate-mode" for the longhand.
 > 
 > Another consideration is whether hyphenation should be controlled by
 > a new value for the "word-break" property.
 > 
 > The property names "hyphenate-before" and "hyphenate-after" don't convey
 > their purpose very well. The naive reader may assume that they are used
 > to specify characters before/after which splitting is allowed.
 > They are really "keep at least N characters before/after the
 > hyphen", which suggest they should have "min" in their names.
 > Unfortunately no succinct alternatives spring to mind.

I agree that it's hard to understand what these properties mean unless
you know what knobs are normally offered for hyphenation. XSL calls them:

  'hyphenation-push-character-count'
  'hyphenation-remain-character-count'

Few people will ever use these knobs, though, and the current names
fit nicely into the 'hyphenate' family.

 > Do we really need both "hyphenate-before" and "hyphenate-after"
 > properties, or would a single "hyphenation-min-fragment-length"
 > property suffice?

I think we do. The same argument was put forward for 'widows' and
'orphans' (another set of properties that are hard to get for the
naive reader :-), but having both is important in some cases. For
example, I use them extensively here:

  http://people.opera.com/howcome/2010/ibsen/digte/print.css
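
The difference between the two knobs can be sketched in Python, assuming
the semantics described in the quoted text (keep at least N characters
before and after the hyphen). The function name, the parameter names and
the candidate break points are invented for illustration.

```python
def allowed_breaks(word, candidates, min_before, min_after):
    """Keep only break points that leave enough characters on each side.

    candidates: indices k meaning "break between word[k-1] and word[k]".
    min_before: stand-in for 'hyphenate-before'.
    min_after:  stand-in for 'hyphenate-after'.
    """
    return [
        k for k in candidates
        if k >= min_before and len(word) - k >= min_after
    ]


word = "hyphenation"  # assumed candidate breaks: hy-phen-ation
candidates = [2, 6]

print(allowed_breaks(word, candidates, 2, 2))  # both breaks survive
print(allowed_breaks(word, candidates, 3, 2))  # "hy-" is too short
print(allowed_breaks(word, candidates, 2, 6))  # "-ation" is too short
```

The last two calls constrain different sides of the hyphen, which a
single minimum fragment length could not express.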

 > "hyphenate-lines" also doesn't convey its purpose very well. It's about
 > the maximum number of consecutive hyphenated lines. It's also odd to
 > have a "no-limit" value, rather than choosing a property name which
 > makes sense with a value of "none".
 > 
 > Finally "hyphenate-character" is odd in that the value takes a string,
 > not just a single character.

I don't believe we have a character type. For the languages I know, it
makes sense to only use one character. But I'm sure someone will find
something creative to do with strings.

Is your request based on constraints in the underlying text
composition engine?

 > Hyphenation resources
 > ---------------------
 > 
 > We think the "hyphenate-resource" property is problematic for two
 > reasons.
 > 
 > First, the dictionary format is unspecified and there is no "type"
 > parameter for the resource, so there's no information the UA can use
 > to determine the format. This is especially problematic if the UA
 > relies on some underlying infrastructure for word breaking, and needs
 > to pass the resource down to this infrastructure.
 > 
 > Secondly, simply supplying a list of resources to be checked in order
 > is problematic, because it may result in inappropriate hyphenation.
 > If no hyphenation opportunities are found for a given word in a given
 > language by consulting the first resource, then the algorithm suggests
 > checking the second resource, which may return a hyphenation opportunity.
 > However, it may do so for the wrong language.

The language issue can be avoided by setting different resources for
different languages, no?

  :lang(fr) { hyphenate-resource: url(foo), url(bar) }
  :lang(en) { hyphenate-resource: url(foobar), url(barfoo) }

The reason for having a comma-separated list is to allow different
hyphenation resource formats to be supplied.
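
A sketch of that reading in Python, assuming the UA walks the list and
uses the first resource whose format it can load. How a UA recognizes a
format is not specified, so the can_load callback, the file names and
the extension check below are all hypothetical.

```python
def pick_resource(resources, can_load):
    """Return the first resource the UA can load, or None if there is none."""
    for url in resources:
        if can_load(url):
            return url
    return None


# Per-language lists, mirroring the :lang() rules (file names are made up).
RESOURCES = {
    "fr": ["fr-hyph.xyz", "fr-hyph.tex"],
    "en": ["en-hyph.tex", "en-hyph.xyz"],
}


def understands_tex(url):
    """Toy UA that only understands TeX-style pattern files."""
    return url.endswith(".tex")


print(pick_resource(RESOURCES["fr"], understands_tex))
print(pick_resource(RESOURCES["en"], understands_tex))
```

Because the list is selected per language before any word is looked up,
a French word is never checked against an English resource.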

The only such format I know is the format used by TeX and OpenOffice:

   http://en.wikipedia.org/wiki/Hyphenation_algorithm

This is also what Prince supports. You can see a demo here:

  http://www.princexml.com/howcome/2006/p6/p6demo2.pdf

(Note that the property names were changed in Prince after the WG's
decision in 2007, as mentioned above.)

 > Finally we think that doing language-sensitive hyphenation is hard
 > because most web content does not have the appropriate "lang" attributes.
 > We'd like to suggest a property that permits language-sensitive
 > hyphenation, namely "hyphenation-locale" (or "hyphenate-locale"), that
 > an author can use to inform the UA about what locale should be used
 > for hyphenation:
 > 
 > hyphenation-locale: auto | string
 > where the string is a locale identifier.
 > 
 > If not auto, the value would override the language derived from any
 > present "lang" attributes.

This would remove an incentive to start using the 'lang' attribute. If
we want to encode such information in CSS (I'm not sure we do) it may
be better to offer a property that can also be used outside of
hyphenation, no? E.g.:

  body { locale: 'en' }

-h&kon
              Håkon Wium Lie                          CTO °þe®ª
howcome@opera.com                  http://people.opera.com/howcome

Received on Thursday, 5 August 2010 14:37:14 UTC