Re: Last call for css3-text from Markus Gylling on 2013-11-07 (public-digipub-ig@w3.org from November 2013)

From: Markus Gylling <markus.gylling@gmail.com>
Date: Thu, 7 Nov 2013 22:53:03 +0100
To: Bert Bos <bert@w3.org>
Cc: public-digipub-ig@w3.org
Message-Id: <1EAE3685-E427-4F48-990C-74AC509FEAB4@gmail.com>
Dear Bert, 
please find below comments on css3-text LC from the Digital Publishing IG. We will of course be available to answer any follow-up questions that the CSS WG might have. 

Best regards, /markus

--- begin ---
1. It would be great to keep the ‘hanging-punctuation’ property, though I understand it is awaiting implementations. What is the timeline here? That is, when would an implementation need to appear in order to preserve this property?

This is certainly important to us. Antenna House has implemented this, and it's on the roadmap for Prince. 

2. In section 1.3, after the example:
"Within this specification, the ambiguous term character is used as a friendlier synonym for grapheme cluster. See Characters and Properties for how to determine the Unicode properties of a character."
"A letter for the purpose of this specification is a character belonging to one of the Letter or Number general categories in Unicode. [UAX44]"
If I replace 'character' in the second paragraph with 'grapheme cluster', I am not sure I get a reasonable answer. For instance, is U+0067 + U+0308 a letter? I don't think U+0308 is; does that disqualify the whole cluster? Or is this a different use of the term 'character'? Does Unicode define such clusters as belonging to all the categories that their constituent code points belong to?
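To make the question concrete, here is a small Python sketch that classifies the cluster U+0067 U+0308 under three possible readings: any code point, every code point, or only the base code point. The function names are purely illustrative. The draft does not say which reading it intends, and that is exactly the problem.

```python
import unicodedata

# General categories that the draft's definition counts as "letter":
# Letter (L*) and Number (N*).
LETTER_OR_NUMBER = ("L", "N")

def categories(cluster: str) -> list[str]:
    """Return the Unicode general category of each code point in a cluster."""
    return [unicodedata.category(cp) for cp in cluster]

def is_letter_any(cluster: str) -> bool:
    """Reading A: the cluster is a letter if ANY code point is L* or N*."""
    return any(cat[0] in LETTER_OR_NUMBER for cat in categories(cluster))

def is_letter_all(cluster: str) -> bool:
    """Reading B: the cluster is a letter only if EVERY code point is L* or N*."""
    return all(cat[0] in LETTER_OR_NUMBER for cat in categories(cluster))

def is_letter_base(cluster: str) -> bool:
    """Reading C: classify the cluster by its first (base) code point only."""
    return categories(cluster)[0][0] in LETTER_OR_NUMBER

g_diaeresis = "\u0067\u0308"  # 'g' + COMBINING DIAERESIS
print(categories(g_diaeresis))      # ['Ll', 'Mn']
print(is_letter_any(g_diaeresis))   # True
print(is_letter_all(g_diaeresis))   # False
print(is_letter_base(g_diaeresis))  # True
```

Readings A and C call the cluster a letter; reading B does not, because U+0308 is a nonspacing mark (Mn).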

3. As far as I can see, the only place the spec mentions that text-transform should affect line breaking is in an informative example (#2). Should this be stated in a normative section? Some line breaking changes are obvious (for instance, changing the width of the glyphs will alter line breaking), but others are more obscure (for instance, transformation to full-width).

4. From 5.1, last bullet point:
"For line breaking in/around ruby, the base text is considered part of the same inline formatting context as its surrouding content, but the ruby text is not: i.e. line breaking opportunities between the ruby element and its surrounding content are determined as if the ruby base were inline and the ruby text were not there." [Also, note the typo: surrouding]
The first part of this sounds like breaks are allowed within a single run of base text (difficult, I assume), but the second part sounds like breaks are only allowed at the boundaries of the ruby element. It seems that, in practice, a break is allowed anywhere within a ruby element where a break would otherwise be allowed, provided that location is also a base text boundary.
For example, consider this snippet:
<p>だ<ruby>大分<rt>だいぶ</rt>日数<rt>ひかず</rt></ruby>が</p>
From "the base text is considered part of the same inline formatting context as its surrouding content, but the ruby text is not", I might imagine breaks as though the text were written
だ[1]大[2]分[3]日[4]数[5]が
But this part, "i.e. line breaking opportunities between the ruby element and its surrounding content", seems to imply that the rule only covers line breaks at the boundaries of the ruby element itself, in which case I would get:
だ[1]大分日数[5]が
However, I would expect the correct breaking would be neither of those, but rather:
だ[1]大分[3]日数[5]が
I am not certain how I can interpret the spec to generate those line breaks.
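To make the three readings concrete, here is a rough Python sketch that computes break offsets into the base text だ大分日数が, using the same numbering as the bracketed positions above. It assumes, as a simplification, that a break is allowed at every candidate position (as between most Japanese characters) and ignores line-breaking prohibitions. The segment model and the function names are hypothetical.

```python
from typing import List, Tuple, Union

# Hypothetical model: a plain character is a str; a ruby element is a tuple of
# its base strings. Ruby text is left out because, per the draft, it is not
# part of the inline formatting context for line breaking.
Segment = Union[str, Tuple[str, ...]]

def _offsets(lengths: List[int]) -> List[int]:
    """Cumulative offsets between consecutive pieces (none at either end)."""
    out, pos = [], 0
    for n in lengths[:-1]:
        pos += n
        out.append(pos)
    return out

def every_character(segments: List[Segment]) -> List[int]:
    """Reading 1: ruby bases behave like ordinary inline text."""
    text = "".join(s if isinstance(s, str) else "".join(s) for s in segments)
    return _offsets([1] * len(text))

def ruby_element_edges(segments: List[Segment]) -> List[int]:
    """Reading 2: breaks only around the ruby element as a whole."""
    return _offsets([len(s) if isinstance(s, str) else len("".join(s))
                     for s in segments])

def base_boundaries(segments: List[Segment]) -> List[int]:
    """Expected behaviour: breaks also allowed between adjacent ruby bases."""
    pieces: List[int] = []
    for s in segments:
        pieces.extend([len(s)] if isinstance(s, str) else [len(b) for b in s])
    return _offsets(pieces)

# だ<ruby>大分<rt>だいぶ</rt>日数<rt>ひかず</rt></ruby>が
para: List[Segment] = ["だ", ("大分", "日数"), "が"]
print(every_character(para))     # [1, 2, 3, 4, 5]
print(ruby_element_edges(para))  # [1, 5]
print(base_boundaries(para))     # [1, 3, 5]
```

Only the third function yields the breaks I would expect, and I cannot find wording in the draft that selects it.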

5. In "5.2. Breaking Rules for Punctuation", in this sentence and the one below it that is similar:
"If the content language is Chinese or Japanese, then additionally allow (but otherwise forbid) for ‘normal’ and ‘loose’:"
It's not clear to me what the 'otherwise' applies to. Is it just 'normal' and 'loose', so that the break is forbidden for 'strict' when the language is Chinese or Japanese? Or does it apply to the language as well, so that the break is forbidden for 'strict' in Chinese and Japanese, and for every value in all other languages? If the latter, then the implication is that in, e.g., English, breaks before U+2010 are forbidden. However, the later clarifying note seems to indicate that non-CJK text is only affected when the language is Chinese or Japanese.

6. In "6.1. Hyphenation Control", the sentence: "The UA is therefore only required to automatically hyphenate text for which [...]"
Is it the case that a UA is ever *required* to automatically hyphenate? Perhaps this should be weakened to "Therefore, if no language is specified or no hyphenation resource is available to the UA for a specified language, the UA may choose to treat 'auto' as 'manual'."

Section 6.1 also states, "Conditional hyphenation characters inside a word, if present, take priority over automatic resources when determining hyphenation opportunities within the word." Is this a strong-enough statement? We've seen many cases where a word will hyphenate one character away from a soft hyphen. 

6.1 In example 8, there is an extra nun in نوشتنن, at the end. I think it should be نوشتن.

7. Not really wrong, but the title of 6.2 lists the property names in the opposite order to the definition just below it: ‘word-wrap’/‘overflow-wrap’ vs. ‘overflow-wrap’/‘word-wrap’. Just a little weird.

8. "6.2. Overflow Wrapping", so sayeth Yoda:
"[...] and grapheme clusters must together stay as one unit." Maybe "stay together" instead?

9. In "7.1. Text Alignment", "text-align: start end" sounds a lot like "text-align-last: *", giving special treatment to the first line instead of the last line, with less control. Perhaps there should be a separate property for controlling the first line alignment, just like there is for controlling the last line. Then text-align could become a shorthand. For example:

text-align: center == text-align-first: center, text-align-middle: center, text-align-last: auto
text-align: center right == text-align-first: center, text-align-middle: center, text-align-last: right
text-align: left center right == text-align-first: left, text-align-middle: center, text-align-last: right

This makes the proposed 'text-align: start end' become 'text-align: start end end' instead.
Of course, the down side is this would require two new properties ("text-align-first", "text-align-middle"). Not sure if this is worth considering at this point, but it seems odd to handle this in different ways for different special lines. Perhaps drop 'start end' for now and reconsider for level 2?
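The proposed expansion is mechanical enough to sketch. The Python function below maps one, two or three 'text-align' values onto the hypothetical longhands 'text-align-first', 'text-align-middle' and 'text-align-last' exactly as in the three lines above; none of these longhands exist in the draft, and the function name is illustrative only.

```python
def expand_text_align(value: str) -> dict:
    """Expand the proposed 'text-align' shorthand into hypothetical longhands."""
    parts = value.split()
    if len(parts) == 1:
        # One value: first and middle lines share it; last line stays 'auto'.
        first = middle = parts[0]
        last = "auto"
    elif len(parts) == 2:
        # Two values: first and middle lines share the first value.
        first = middle = parts[0]
        last = parts[1]
    elif len(parts) == 3:
        first, middle, last = parts
    else:
        raise ValueError("expected 1 to 3 values")
    return {
        "text-align-first": first,
        "text-align-middle": middle,
        "text-align-last": last,
    }

print(expand_text_align("start end"))
# {'text-align-first': 'start', 'text-align-middle': 'start', 'text-align-last': 'end'}
print(expand_text_align("start end end"))
# {'text-align-first': 'start', 'text-align-middle': 'end', 'text-align-last': 'end'}
```

Under this scheme the two-value form follows the pattern of 'text-align-last', which is why the draft's 'start end' (first line start, the rest end) would need to be written as 'start end end'.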

Sometimes we need to force a line break inside a paragraph for various reasons [novelists-sometimes-string-together-dozens-of-words-with-hyphens-leaving-no-natural-break-points]. Having text-align-last control the alignment of the line before such a break is almost never what we want. In the most common case, we want the last line left-aligned and all other lines justified, as in most books published in the last five hundred years. Separating text-align-middle from text-align-last would be very helpful.

10. What impact do zero-width letters and zero-width word-separators have on the inter-word and distribute text-justify values?

11. I take exception to example 10 in 7.3.5. Both the greedy algorithm and the Knuth/Plass algorithm are O(n). What performance metrics are you using to determine the relative speed of these algorithms? Additionally, Knuth/Plass is easily adapted to other languages, so it applies equally to example 11. Perhaps "harder to implement" instead?
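For reference, a minimal first-fit (greedy) line breaker shows why the greedy side of that comparison is linear: each word is examined once and either placed on the current line or starts a new one. This Python sketch assumes monospaced widths measured in characters and a single space between words; it is only an illustration and does not attempt Knuth/Plass.

```python
def greedy_break(words: list[str], width: int) -> list[list[str]]:
    """First-fit line breaking: one pass over the words, so O(n)."""
    lines: list[list[str]] = []
    current: list[str] = []
    used = 0  # characters used on the current line, including spaces
    for w in words:
        need = len(w) if not current else used + 1 + len(w)
        if current and need > width:
            # Word does not fit: close the current line and start a new one.
            lines.append(current)
            current, used = [w], len(w)
        else:
            current.append(w)
            used = need
    if current:
        lines.append(current)
    return lines

print(greedy_break("aaa bb cc dddd".split(), 6))
# [['aaa', 'bb'], ['cc'], ['dddd']]
```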

12. "8.1. Word Spacing": Can this property be used to make words overlap? That is, are values less than -100% allowed? 'letter-spacing' says there may be UA limitations for such things.

13. letter-spacing says it doesn't apply at the start/end of a line. Should there be similar text in word-spacing?

14. At the end of word-spacing (just after example 13), the text "Word-separator characters include [...]": is this considered an exhaustive list? If so, this should be made clear; otherwise, some guidelines for deciding what else might be a word-separator would be useful.

15. In "8.2. Tracking", just after example 14: "[...] to the innermost element element that contains the two characters [...]"
Just one element?

16. And just after example 15: "Letter-spacing ignores zero-width characters (such as those from the Unicode Cf category)." Does this mean characters that are defined to be zero-width, or characters whose width might be zero? For instance, given:

span.zero { display: inline-block; width: 0; }
p {letter-spacing: 1em;}

<p>a<span class="zero">b</span>c</p>

Would this be viewed as "a bc" (1em after 'a', zero-width 'b', 1em after end of 'b', 'c') or as 'a' with 'b' and 'c' on top of each other 1em later?
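The two readings can be sketched as pen positions. In the Python below, letter-spacing is added after each item unless that item is skipped. The advances (0.5em for 'a' and 'c', 0 for the width:0 inline-block) are hypothetical, as is the rule of applying spacing after every item; the sketch ignores the spec's start/end-of-line exception.

```python
from typing import List

def pen_positions(advances: List[float], spacing: float,
                  skip_zero_width: bool) -> List[float]:
    """Start position of each item when `spacing` is added after it.

    skip_zero_width=False: only characters *defined* as zero-width would be
    ignored, so the zero-width box 'b' still receives letter-spacing.
    skip_zero_width=True: anything whose width happens to be zero is ignored.
    """
    x = 0.0
    starts: List[float] = []
    for adv in advances:
        starts.append(x)
        x += adv
        if not (skip_zero_width and adv == 0):
            x += spacing
    return starts

abc = [0.5, 0.0, 0.5]  # hypothetical advances in em for 'a', 'b', 'c'
print(pen_positions(abc, 1.0, skip_zero_width=False))  # [0.0, 1.5, 2.5]
print(pen_positions(abc, 1.0, skip_zero_width=True))   # [0.0, 1.5, 1.5]
```

The first result is the "a bc" rendering; in the second, 'b' and 'c' start at the same position and overlap.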

We are disappointed that maximum and minimum values for word-spacing and letter-spacing were removed in this draft. Better control over justification is a key requirement for us.

17. In "9.1. First Line Indentation", it is not clear to me what 'each-line' is doing. Does it simply make lines after forced line breaks indented, when they otherwise wouldn't be? If so, perhaps it should say "In addition to the first line of a block container, each line after a forced line break is also affected. Lines after a soft wrap break are still not affected." Or maybe there is something else going on that I just don't understand.

I found this section a bit confusing. Perhaps examples of "hanging" and "each-line" would be helpful. 

18. In "9.2. Hanging Punctuation", the 'Animatable:' table entry has a spurious gt ('>').

19. Appendix A, steps 5.iv and 5.v - how do you do letter and word spacing without knowing the font in use? For instance, a percent value for letter spacing depends on the advance measure of the character, which will depend on the current font.

20. Appendix B:
"[...]  is to help UA developers to implement default stylesheet [...]" - 'a default stylesheet'? Or maybe 'the default stylesheet'? Or even 'default stylesheets'?

--- end ---

On 11 Oct 2013, at 22:22, Bert Bos <bert@w3.org> wrote:

> The CSS WG issued a last call for css3-text:
> 
>    CSS Text Module Level 3
>    http://www.w3.org/TR/css-text-3/
> 
> Abstract:
> 
>    This CSS3 module defines properties for text manipulation and
>    specifies their processing model. It covers line breaking,
>    justification and alignment, white space handling, and text
>    transformation.
> 
> We'd like to especially ask the I18N WG and DPub IG for a review.
> 
> Everybody else is invited to send comments, too, of course.
> 
> We've unilaterally set the deadline for comments to November 7. Please, let us know if you need more time.
> 
> 
> 
> For the CSS WG,
> 
> Bert
> -- 
>  Bert Bos                                ( W 3 C ) http://www.w3.org/
>  http://www.w3.org/people/bos                               W3C/ERCIM
>  bert@w3.org                             2004 Rt des Lucioles / BP 93
>  +33 (0)4 92 38 76 92            06902 Sophia Antipolis Cedex, France
>
Received on Thursday, 7 November 2013 21:53:35 UTC