RE: [css3-text] script-specific functionality from Koji Ishii on 2011-04-17 (www-style@w3.org from April 2011)

From: Koji Ishii <kojiishi@gluesoft.co.jp>
Date: Sun, 17 Apr 2011 12:30:10 -0400
To: Håkon Wium Lie <howcome@opera.com>
CC: "www-style@w3.org" <www-style@w3.org>
Message-ID: <A592E245B36A8949BDB0A302B375FB4E0AC2874AF7@MAILR001.mail.lan>
Thank you again, Håkon, for the long and thoughtful feedback.

>  > > Therefore, I suggest removing 'text-trim' from this
>  > > specification and consider addressing the functionality
>  > > nearby to the 'font-kerning' property.
>  >
>  > Precisely speaking, the 'text-trim' property is not "kerning".
>  > Kerning information is created by font designer and stored in font
>  > files, so it makes sense to have it in font specification. This
>  > feature is not stored in font file, but is algorithmic spacing
>  > control between two character classes, which may add or remove
>  > inter-character spaces, so I think here right after
>  > 'letter-spacing' is the right place.
> 
> Kerning information is often stored in the font, but the term
> "kerning" is more general. For example, some software provides
> auto-kerning based on the shapes of the glyphs. This seems similar to
> the algorithmic spacing control that you refer to.

Please allow me to repeat that this is not a kerning feature. I agree that the way the spec uses the term "kerning" here is confusing. You're not the first person to say this[1], and I think Eric Muller's response[2] describes the feature better. It was my mistake to use the wrong term for it.

I have merged 'text-trim' and 'text-autospace' into a single feature, "Character Class Spacing"[3], and changed the property name to 'text-spacing'. The feature controls spacing between two character classes, which is closer to how InDesign provides it. I hope this makes the property more generic and addresses your concern.


>  > > The 'line-break' property lists three values without
>  > > really defining them. Some rules for Japanese and
>  > > Chinese are suggested, but the spec doesn't say how
>  > > to interpret these values in for other languages other
>  > > than leaving it to the UA. The specification must be
>  > > more precise if we want interoperable results.
>  >
>  > The list includes East Asian code points only because the feature
>  > is needed only for them.
> 
> I'd like to avoid introducing features that are tied to specific
> scripts. It seems that the feature in question is close enough to
> kerning that we can make it more generally useful.

This feature is for all block scripts[4], i.e., scripts that don't use spaces to delimit words. For these scripts, UAs can break lines at any grapheme cluster boundary, with some exceptions; for example, lines should not break before commas, periods, or closing parentheses. This property controls the level of those exceptions in an abstracted way, so it is specific to a script group but not to a single script. Clustered scripts (Southeast Asian scripts such as Thai) may join this, but the spec for them is still under discussion in Unicode.

Having such exceptions is common and very important. How many levels authors may want to control is less well established, though. iBooks, for instance, can display Japanese text today, but it has no line break exceptions built in. Because of that, most readers consider that it doesn't support Japanese. Line break exception rules are that important a feature.
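
To make the idea of line break exceptions concrete, here is a minimal Python sketch. It is not the spec's algorithm. It assumes every code point is one grapheme cluster of equal width; the function name `break_lines` and the `NO_LINE_START` set are hypothetical, and the real character lists depend on the 'line-break' level.

```python
# Hypothetical set of characters that must not start a line.
# The lists in the spec differ per 'line-break' level.
NO_LINE_START = set("、。，．）」』")


def break_lines(text, width, no_start=NO_LINE_START):
    """Greedily fill lines of `width` characters for a block script.

    A break is allowed between any two characters, except that a line
    must not start with a character in `no_start`; in that case the
    break is moved earlier.
    """
    lines = []
    start = 0
    while len(text) - start > width:
        end = start + width
        # Move the break back while the next line would start with a
        # forbidden character (but always keep at least one character).
        while end > start + 1 and text[end] in no_start:
            end -= 1
        lines.append(text[start:end])
        start = end
    lines.append(text[start:])
    return lines


if __name__ == "__main__":
    sample = "あいうえ、おかき。"
    print(break_lines(sample, 4))                  # with exceptions
    print(break_lines(sample, 4, no_start=set()))  # without exceptions
```

Passing an empty `no_start` set mimics a UA with no exceptions built in: the comma and the period end up at the start of a line, which is what readers of Japanese find unacceptable.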

> And if it's not in
> common use today, some designer may find creative use for it tomorrow.
> Also, UAs will encounter combinations that have not been in common use
> in the past, and the spec should define what the behavior is.

Hmm...I'm not sure I understand this part; sorry for my English skills. Can you explain a little more? If you are worried about designers using the feature for different purposes, IE has supported this property[5] since IE5, and AH supports it as well. Does that fact resolve your concerns?

>  > > This line needs some explanation:
>  > >
>  > >  ‘auto’ is equivalent to the value of the ‘text-align’ property
>  > >  except when ‘text-align’ is set to ‘justify’, in which case it is
>  > >  ‘justify’ when ‘text-justify’ is ‘distribute’ and ‘start’ otherwise.
>  > >
>  > > What's the use case and is it worth introducing the interdependency?
>  >
>  > It's a nature of 'justify' where alignment for the last line and
>  > for the rest are different. If you think about regular justified
>  > paragraphs, all lines except the last one should be justified. So,
>  > when 'text-align' is 'justify', 'text-align-last' must be 'start'.
> 
> Would it make more sense to introduce a :last-line pseudo-element,
> similar to first-line? Then we can keep properties simpler, it seems.

That's an interesting idea that I had never thought of. This property is implemented by IE, Prince, and AH (although values have been added since the 2003 CR), so I thought we shouldn't change the property name. Does using a pseudo-element help implementations?
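
For reference, the interdependency in the quoted 'auto' rule can be written out as a small Python sketch. The function name `resolve_text_align_last` is hypothetical; the logic only restates the quoted sentence and does not cover anything else the spec may say.

```python
def resolve_text_align_last(text_align_last, text_align, text_justify):
    """Resolve 'text-align-last' according to the quoted 'auto' rule."""
    if text_align_last != "auto":
        # An explicit value is used as-is.
        return text_align_last
    if text_align != "justify":
        # 'auto' is equivalent to the value of 'text-align' ...
        return text_align
    # ... except when 'text-align' is 'justify'.
    return "justify" if text_justify == "distribute" else "start"


if __name__ == "__main__":
    print(resolve_text_align_last("auto", "justify", "auto"))
    print(resolve_text_align_last("auto", "justify", "distribute"))
    print(resolve_text_align_last("auto", "center", "auto"))
```

So a regular justified paragraph gets a start-aligned last line, and only 'distribute' justifies the last line as well.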


>  > > I'm not convinced it makes sense to set these types of
>  > > values in a style sheet. For example, what does it mean
>  > > to say:
>  > >
>  > >  <p style="text-justify: inter-cluster">候选</p>
> 
>  > Your example doesn't make sense since you're using the value for
>  > clustered scripts such as Thai against Chinese, which is not a
>  > clustered script.
> 
> Indeed, it doesn't make sense. But UAs will be facing code like this
> and they must deal with it somehow.

OK, I think I understand your question. The spec has clear guidelines: the two characters you showed belong to a "block" script, and the Prioritization of Expansion Points table in the spec says the "inter-cluster" value gives block scripts 2nd priority. I hope this addresses your concern.
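
One possible reading of that table, as a hedged Python sketch: the priorities below are only the ones mentioned in this message for "inter-cluster" (other rows of the table are omitted), the gap labels and the name `expansion_points` are hypothetical, and picking only the best-priority gaps is an assumption, since the exact justification algorithm is UA-dependent.

```python
# Priorities for 'inter-cluster' as described in this message only.
# Lower number means higher priority. Labels are hypothetical.
INTER_CLUSTER_PRIORITY = {
    "word-separator": 1,
    "clustered": 1,  # grapheme cluster boundary in a clustered script
    "block": 2,      # boundary between block-script characters
}


def expansion_points(gaps, priority=INTER_CLUSTER_PRIORITY):
    """Return indices of the gaps that share the best priority on a line.

    `gaps` lists the kind of each gap between adjacent characters.
    Gaps whose kind is not in `priority` are never expanded here.
    """
    ranked = [(priority[kind], i) for i, kind in enumerate(gaps) if kind in priority]
    if not ranked:
        return []
    best = min(p for p, _ in ranked)
    return [i for p, i in ranked if p == best]


if __name__ == "__main__":
    # The two-character example has a single gap, between block characters.
    print(expansion_points(["block"]))
    # A line that mixes gap kinds prefers the priority 1 gaps.
    print(expansion_points(["block", "word-separator", "clustered", "block"]))
```

Under this reading, the single gap in the two-character example is still used for justification, because no priority 1 point exists on that line.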


>  > > It seems more natural to describe "justification opportunities"
>  > > between various types of characters.
>  >
>  > Hmm...I was thinking this feature defines "justification
>  > opportunities" between various types of characters. What did I
>  > miss?
> 
> You're right that the value indicates justification opportunities, but
> it closes more opportunities than it opens. For example, the definition
> of 'inter-cluster':
> 
>   Justification primarily changes spacing at word separators and at
>   grapheme cluster boundaries in clustered scripts. This value is
>   typically used for Southeast Asian scripts such as Thai.
> 
> So, when this value is encountered, most of the justification (80%?
> 90%?) happens "at word separators and at grapheme cluster boundaries
> in clustered scripts". It seems that it discourages justification by
> way of microtypography -- is this intentional?

No, the spec does not discourage UAs from implementing any method of microtypography to do smarter justification. The spec supports 'letter-spacing' and 'word-spacing', so UAs can support at least two methods of microtypography, and authors can control their min/max. Other than that, the spec says "The exact justification algorithm is UA-dependent," so there's no standard way for authors to control it, but UAs are welcome to implement smarter microtypography.


> Also, if some (say) Japanese text is interleaved in (say) Thai, should
> UAs not use what they know about justification in Japanese? If yes, I
> don't understand why not.

This is one of the most difficult questions to answer; to be honest, I don't know. I have been trying, without much luck, to reach someone who authors mixed Japanese and Thai documents.

But we have one example: when Japanese is interleaved in English, English authors sometimes do not want it justified using the Japanese method. Japanese justification expands spacing at grapheme cluster boundaries, but to English readers that may make each character look like a word. That's why we have "inter-word" and "inter-ideograph". This example indicates that Japanese within an English context may need to be justified differently from Japanese on its own.

I'll keep trying to reach someone who knows the mixed Japanese and Thai case, but I can't promise I can figure this out.


> I also find the description ambiguous. Does it mean to say this?:
> 
>   Justification primarily changes spacing at word separators in all
>   scripts and at grapheme cluster boundaries in clustered scripts.

This means "both word separators and grapheme cluster boundaries are given priority 1 as justification opportunities, as in the table below." Does saying "spacing *both* at word separators and at grapheme cluster boundaries" help to resolve the ambiguity? Again, I'm terribly sorry that my English skills prevent me from understanding how this is ambiguous and how to fix it. If you understand the above, can you help me fix it?


>  > > It seems that the purpose of 'text-autospace' is to magically
>  > > add space around (say) English text inside Chinese?
>  > > Without there being space characters or markup in the text?
>  > > I suggest we rather encourage the use of markup as this also
>  > > allows the specification of the language. E.g.:
>  > >
>  > >    span:lang(en) { padding: 0 0.5em }
>  > >
>  > >    候选 <span lang="en">foo</span> 候选
>  >
>  > Close, but no. It's not about English and Chinese, it's about Latin
>  > characters within East Asian languages.
> 
> Sure. (The "say" indicated that English and Chinese were only examples.)
> 
>  > It's not necessarily English, just proper nouns or room numbers. I
>  > don't expect East Asian authors to mark every Latin characters as
>  > lang="en".
> 
> They don't have to mark the language (which would not make much sense
> for numbers). But, if they expect special formatting to apply, is it
> too much to ask that (e.g.) a <span> element is added?
> 
> The 'text-autospace' property is like a hard-coded content selector:
> 
>   http://www.w3.org/TR/2001/CR-css3-selectors-20011113/#content-selectors
> 
> If we are to have these, I'd rather have a generic mechanism.

I agree that we should make this more generic.

As said above, fantasai and I merged 'text-autospace' with 'text-trim' and renamed the result 'text-spacing'[3]. Does this look like a generic mechanism to you? Although the proposed solutions were different, Adobe also said this should be a more generic mechanism. 'text-spacing' is our attempt to make it more generic, based on how InDesign designs the feature and on feedback from Steve Zilles at Adobe.
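
As a rough illustration of spacing control between two character classes, the Python sketch below finds the boundaries where a Latin character meets an ideograph without any space character or markup in between. The classification in `char_class` is a deliberately crude assumption (ASCII letters and digits versus kana and basic CJK ideographs), and both function names are hypothetical; the character classes in the spec are defined differently.

```python
def char_class(ch):
    """Crude, assumed classification; not the spec's character classes."""
    if ch.isascii() and ch.isalnum():
        return "latin"
    if "\u3040" <= ch <= "\u30ff" or "\u4e00" <= ch <= "\u9fff":
        return "ideograph"
    return "other"


def class_boundaries(text):
    """Return each index i where text[i-1] and text[i] pair latin with ideograph.

    These are the positions where a UA could add extra spacing.
    """
    pairs = {("latin", "ideograph"), ("ideograph", "latin")}
    return [
        i
        for i in range(1, len(text))
        if (char_class(text[i - 1]), char_class(text[i])) in pairs
    ]


if __name__ == "__main__":
    print(class_boundaries("候选foo候选"))    # boundaries found without markup
    print(class_boundaries("候选 foo 候选"))  # explicit spaces: nothing to add
```

The point of the sketch is that the boundaries are found from the characters themselves, so authors do not need to wrap every Latin run in a <span>.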
 

>  > >  - does the 'force-end' value really force stops/commas to hang? So,
>  > >   unless the comma appears at the content edge, it is moved there?
>  >
>  > Yes to the first sentence. I'm not sure if I understand 2nd
>  > sentence correctly; if stops/comma appears at the line end, it
>  > hangs, even when the line can be formatted/justified without doing
>  > so. This is one of the style variations used in East Asia. I
>  > haven't seen this style used in Latin script.
> 
> Could you show a picture of this?

I created a picture of the same text filled into the same box at:
http://dev.w3.org/csswg/css3-text/hanging-punctuation-end.png

The left box is 'allow-end': the 2nd line hangs because the punctuation otherwise does not fit, but the 1st line does not hang because it fits.
The right box is 'force-end': the punctuation on the 1st line hangs, and the line is then justified (expanded).
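
The difference between the two values can also be sketched in Python. This is only an illustration under simplifying assumptions: every character is one unit wide, `box_width` is measured in characters, and the name `hangs_at_end` and the `hangable` set are hypothetical.

```python
def hangs_at_end(line, box_width, mode, hangable="、。，．"):
    """Decide whether the last character of `line` hangs outside the box.

    'allow-end' hangs the punctuation only when it would not fit.
    'force-end' always hangs it; the line is then justified to fill the box.
    """
    if not line or line[-1] not in hangable:
        return False
    if mode == "force-end":
        return True
    if mode == "allow-end":
        return len(line) > box_width
    return False


if __name__ == "__main__":
    for mode in ("allow-end", "force-end"):
        # 1st case: the punctuation fits. 2nd case: it does not.
        print(mode, hangs_at_end("あいう。", 4, mode), hangs_at_end("あいうえ。", 4, mode))
```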

I'll add this picture to the spec soon if it explains the behavior well enough.

[1] http://lists.w3.org/Archives/Public/www-style/2010Sep/0814.html
[2] http://lists.w3.org/Archives/Public/www-style/2010Sep/0830.html
[3] http://dev.w3.org/csswg/css3-text/#text-spacing-prop
[4] http://dev.w3.org/csswg/css3-text/#script-groups
[5] http://msdn.microsoft.com/en-us/library/ms530782(v=VS.85).aspx


Regards,
Koji
Received on Sunday, 17 April 2011 16:32:47 UTC