Re: Feedback on hyphenation properties

Håkon Wium Lie wrote:

> > Finally we think that doing language-sensitive hyphenation is hard
> > because most web content does not have the appropriate "lang"
> > attributes. We'd like to suggest a property that permits
> > language-sensitive hyphenation, namely "hyphenation-locale" (or
> > "hyphenate-locale"), that an author can use to inform the UA about
> > what locale should be used for hyphenation:
> > 
> > hyphenation-locale: auto | string
> > where the string is a locale identifier.
> > 
> > If not auto, the value would override the language derived from any
> > present "lang" attributes.
> 
> This would remove an incentive to start using the 'lang' attribute. If
> we want to encode such information in CSS (I'm not sure we do) it may
> be better to offer a property that can also be used outside of
> hyphenation, no? E.g.:
> 
>   body { locale: 'en' }

I don't think language/locale makes sense as a presentational property
unless it's for a very specific reason.  For example, the
'font-language-override' property in CSS3 Fonts is intended as an
override to access language-specific handling in a given font; in
general, that handling should be inferred from the value of the 'lang'
attribute.

Simon Fraser wrote:

> One reason we see a need for this is that we have to hyphenate content
> that lacks "lang" attributes, but for which there is out-of-band data
> about the language (e.g. EPUB). Another reason is that we may be able
> to deduce something about the language from analysis of the content,
> and thus need to propagate the results of that analysis to the
> hyphenation system somehow.
> 
> > it may
> > be better to offer a property that can also be used outside of
> > hyphenation, no? E.g.:
> >
> >  body { locale: 'en' }
> 
> This is a reasonable suggestion. Knowledge about the language is also
> used for collation (e.g. for "find" algorithms), and for font
> substitution, so it seems reasonable to have a property independent of
> hyphenation for those things as well.

Rather than adding a new property, it seems reasonable to define
hyphenation handling in such a way that user agents use the 'lang'
attribute when it exists and infer the language when it doesn't.  I
don't quite understand your EPUB example; xml:lang is supported there,
no?  If the language is given in some form of associated metadata, then
I still don't see a problem with user agents picking that up in cases
where the 'lang' attribute is not defined.
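
As a rough sketch of the fallback order I have in mind (this is not
taken from any spec or implementation; the Element class, the function
names, and the 'infer' callback are all hypothetical, and the order of
the last two steps is only an assumption):

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Element:
    """Hypothetical, minimal stand-in for a document element."""
    lang: Optional[str] = None            # 'lang' / xml:lang value, if any
    parent: Optional["Element"] = None


def declared_language(element: Optional[Element]) -> Optional[str]:
    """Return the nearest non-empty 'lang' on the element or an ancestor."""
    while element is not None:
        if element.lang:
            return element.lang
        element = element.parent
    return None


def hyphenation_language(
    element: Element,
    metadata_lang: Optional[str] = None,
    infer: Optional[Callable[[str], Optional[str]]] = None,
    text: str = "",
) -> Optional[str]:
    """Pick the language to hyphenate with, without any new CSS property.

    Assumed order: declared 'lang', then out-of-band metadata (e.g. from
    an EPUB package), then content analysis, else None (don't hyphenate).
    """
    declared = declared_language(element)
    if declared:
        return declared
    if metadata_lang:
        return metadata_lang
    if infer is not None:
        return infer(text)
    return None
```

With this ordering an author-supplied 'lang' always wins, so out-of-band
data and content analysis only fill in when nothing is declared.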

How exactly to deal with multilingual text (e.g. Latin words used within
runs of Japanese text) is also an important case to consider.

Cheers,

John Daggett

Received on Friday, 6 August 2010 02:27:05 UTC