- From: Etan Wexler <ewexler@stickdog.com>
- Date: 27 Nov 2002 23:00 +0000
- To: www-style@w3.org
Following are comments on the Working Draft, "CSS3 module: text", <http://www.w3.org/TR/2002/WD-css3-text-20021024>.

2. Introduction

"In both CSS1 and CSS2, text formatting has been limited to simple effects like for example: text decoration, text alignment and character spacing."
Change "character" to "grapheme cluster".

"- wide-cell glyph (e.g. Han) which is the n-th character in the text run"
Change "character" to "glyph".

"- narrow-cell glyph (e.g. Roman) which is the n-th glyph in the text run"
Change "Roman" to "Latin".

"Many typographical properties in East Asian typography depends on the fact that a character is typically rendered as either a wide or narrow character."
Change the second occurrence of "character" to "glyph".

"Spacing between these characters in the diagrams is usually symbolic"
Change "characters" to "glyphs".

[Section 3.3 was accidentally skipped.]

3.5. Script character classification: the 'text-script' property

Why has the name 'text-script' been chosen when XSL uses 'script'?

"For example, line breaking or text justification behaviors depend on the 'dominant' script of the textual content of an element."
Why do quote marks delimit "dominant"?

"Use the first character descendant, after any reordering due to character direction and bi-directionality, which has an unambiguous script identifier to determine the dominant script of the element's content."
Reordering in the bidirectional algorithm affects the glyphs but does not alter the character sequence. The phrase "after any reordering due to character direction and bi-directionality" thus changes nothing and should be eliminated.

"In the absence of any textual components with a clear script identifier (or no textual content at all), the computed value is 'Latin'."
The value 'Latin' is not given in ISO 15924 and is thus not valid CSS. Use the value 'Latn', with no "i".

"<script> A script definition in conformance with [ISO15924]."
The value is to be a script identifier (or "specifier", in XSL language), not a script definition. A script definition consists of prose and intangible history and usage. A script identifier is a machine-readable and relatively short string.

I prefer Unicode Technical Report #24, "Script Names" (<http://www.unicode.org/unicode/reports/tr24/>), over ISO 15924. The Unicode script names are in actual English. I understand that for compatibility with XSL, ISO 15924 must be the reference. Due to the change in property name, however, the desire to retain compatibility is in doubt.

What is the lexical form of <script>? I assume that it is an identifier, but somebody might assume that it is a string. Explicitness is needed for interoperability.

4.1. Text alignment: the 'text-align' property

"<string>"
"If set on other elements, it will be treated as 'start'."
I suggest the revision, "If set on other elements, the computed value is 'start'."

4.2. Justification: the 'text-justify' property

"It affects the text layout only if 'text-align' is set to 'justify'. That way, UA's that do not support this property will still render the text as fully justified"
If the 'text-justify' values were allowed for 'text-align' as meaning "justify in this manner", an extra declaration could be used for fallback. This would mean writing the following, for example.

    text-align: justify;
    text-align: newspaper;

That would be instead of the following.

    text-align: justify;
    text-justify: newspaper;

This presents a trivial difference to the author (the reduction in length slightly favoring my proposal). Where the difference really matters is in implementations, which would not have to carry an extra property on each element.

"Scripts using space between word without connector (Latin-based, Hebrew, etc...) and symbol characters."
What scripts are "Latin-based"? "Greek-based" would include Latin, Greek, Cyrillic, and Coptic.
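To make the 'text-align' fallback suggested above for section 4.2 concrete, here is a minimal sketch of how the duplicate declaration would resolve. It assumes the usual CSS forward-compatible parsing behavior, in which a user agent ignores a declaration whose value it does not recognize. The function name, the data structures, and the two user-agent value sets are hypothetical, and accepting 'newspaper' on 'text-align' is my proposal, not something in the Working Draft.

```python
# Hypothetical model of forward-compatible parsing: declarations with
# unrecognized values are dropped, and the last surviving declaration
# for a property wins.

def computed_declarations(declarations, supported_values):
    """Return {property: value}, ignoring unsupported declarations.

    declarations: list of (property, value) pairs in source order.
    supported_values: dict mapping a property name to the set of values
    that the (hypothetical) user agent understands.
    """
    result = {}
    for prop, value in declarations:
        if value in supported_values.get(prop, set()):
            result[prop] = value  # a later valid declaration overrides
    return result


rule = [("text-align", "justify"), ("text-align", "newspaper")]

# A user agent that knows only the CSS2 alignment values.
old_ua = {"text-align": {"left", "right", "center", "justify"}}
# A user agent that also accepts justification styles on 'text-align'
# (the proposal made in this message, assumed here for illustration).
new_ua = {"text-align": old_ua["text-align"] | {"newspaper", "distribute"}}

print(computed_declarations(rule, old_ua))  # {'text-align': 'justify'}
print(computed_declarations(rule, new_ua))  # {'text-align': 'newspaper'}
```

The older user agent falls back to plain justification, while the newer one picks up the more specific style, without either needing a second property.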
"However, if the kashida-space property has a non zero value it is recommended to use kashida elongation for Arabic text." The property is called 'text-kashida-space', although 'kashida-space' seems preferable to me. "The concept of a word is script dependent, the exact algorithm is determined by the user agent." Change to "The script guides what constitutes a word, although the user agent determines the exact algorithm." "At minimum, justification is expected to occur at each white space boundary." Does this intend to include "zero width space" and the explicit-width spaces? "The diagram below illustrates this mode, by showing how the characters are laid out in the last two lines of an element" Change "characters" to "glyphs". "The threshold value may be related to the column width (in number of characters)." Change to "The threshold value may be related to the ratio of column width to font size." "Mixed character layout in the last two lines of a newspaper justified element" Change "character" to "glyph". "In CSS3 a value of 'letter-spacing: 0' no longer inhibits spacing-out of words for justification." Why is this? A person setting 'letter-spacing' to '0' has clearly chosen something besides 'auto'. "most script groups (except Hindi)" Hindi is not a script group or script. Was the intent to except baseline-connected Indic scripts? "Mixed character layout in the last two lines of a distribute justified element" Change "character" to "glyph". "inter-cluster Plays the same role as inter-ideograph but for South Eastern Asian scripts. That is letter spacing only occurs for clusters belonging to those scripts. A cluster is defined as a group of characters formatted as a single unit." Change to the following and append a reference to Unicode Technical Report #29, "Text Boundaries" (<http://www.unicode.org/unicode/reports/tr29/>). "inter-cluster This is the Southeast Asian counterpart to 'inter-ideograph'. 
That is, letter spacing only occurs between script-defined grapheme clusters."

"Plays the same role as inter-ideograph but for Arabic through the Kashida effect. That is, no letter spacing occurs for other scripts."
Change to "This is the Arabic counterpart to 'inter-ideograph'. Letter spacing may be increased between Arabic letters, the extra space being filled by kashida. No letter spacing occurs for other scripts."

[table]
Change "Latin" to a more inclusive term or at least make a note similar to the one for "Devanagari".

4.3. Last line alignment: the 'text-align-last' property'

"However, if the 'text-align' property is set to the value 'justify', the last line will be aligned to the start of the inline progression."
The 'auto' value should allow the user agent to justify the last line if it passes a threshold determined by the user agent.

4.4. Minimum and maximum font size: the 'min-font-size' and 'max-font-size' property

"Value: <font-size> | auto"
"Computed value: <font-size>"
The <font-size> value must be absolute. What would "min-font-size: smaller" mean?

"'auto' means that the user agent determine the minimum readable font-size for the media."
Capitalize "auto".

"For example, a value is 9px is recommended for Latin scripts."
Change to "For example, a value of '9px' is recommended for the Latin script."

"'auto' means that there is no limit."
Capitalize "auto".

4.5. Additional compression: The 'text-justify-trim' property

"the blank space within the character area itself may be reduced without affecting the appearance of the glyph"
Change to "the blank space within the glyphs themselves may be reduced without affecting the appearance of the filled parts of glyphs".

"Character layout with punctuation and Kana compression"
Change "Character" to "Glyph".

4.6. Kashida effect: the 'text-kashida-space' property'

This property really wants to be called 'kashida-space'.
"Kashida is a typographic effect used in Arabic writing systems that allows character elongation at some carefully chosen points in Arabic." Change to "Kashida is a typographic effect used in Arabic writing systems that allows glyph elongation at some carefully chosen points." "This property can be used with any justification style where kashida expansion is used (currently text-justify: auto, kashida, distribute and newspaper)." Change to "This property has a visible effect with any justification style where kashida expansion is allowed (currently 'text-justify' of 'auto', 'kashida', 'distribute' and 'newspaper')." 5. Indentation: the 'text-indent' property We still lack a graceful way to achieve hanging indents. "User agents should render this indentation as blank space." This will be misinterpreted. Change to "User agents should render this indentation without any of the element's normally positioned content." 6.1. Types of line breaking "Finally, the Unicode character: U+200B ZERO WIDTH SPACE can be inserted in such scripts to specify an explicit line breaking opportunity." Change to "To specify an explicit line breaking opportunity, the character U+200B ZERO WIDTH SPACE can be inserted in documents of Thai and similar scripts." 'A number of levels of line-breaking "strictness" can be used in Japanese typography.' Why is "strictness" in quote marks? "In addition, hyphenation is controlled by 'word-break-inside'." How does this relate to the XSL hyphenation model? "All these properties are also available through the 'word-break' short hand property." Change to "The 'word-break' shorthand property sets 'word-break-CJK' and 'word-break-inside'." Move into a separate paragraph. 6.2. Line breaking: the 'line-break' property "it is recommended that breaks between small katakana and hiragana characters be allowed" Change "katakana and hiragana" to "kana". "In Japanese, a set of line breaking restrictions is referred to as "Kinsoku". 
JIS X-4051 [JIS-X-4051] is a popular source of reference for this behavior using the strict set of rules. This architecture involves character classification into line breaking behavior classes. Those classes are then analyzed in a two dimensional behavior table where each row-column position represents a pair action to be taken at the occurrence of these classes. For example, given a closing character class and an opening character class, the intersection in that table of these two classes (the first character belonging to the opening class and the second belonging to the closing class) will indicate no line breaking opportunity. The rules described by JIS X-4051 have been superseded by the Unicode Technical Report #14 mentioned earlier."
The majority of this paragraph appears superfluous. Change to the following and add a reference link for Unicode Technical Report #14.
"In Japanese, a set of line breaking restrictions is referred to as "Kinsoku". JIS X-4051 [JIS-X-4051] is a popular source of reference for this behavior using the strict set of rules. The rules described by JIS X-4051 have been superseded by the Unicode Technical Report #14."

6.3. Word breaking: the 'word-break-CJK', 'word-break-inside' properties and the shorthand 'word-break' property

"Keeps non-CJK scripts together (according to their own rules), while Hangul and CJK (including the Korean Hanja characters) break everywhere or according to the rules of the 'line-break' mode."
Add "ideographs" after "Hangul and CJK". What determines whether 'line-break' is obeyed?

"Same as 'normal' for CJK and Hangul"
Add "ideographs" after "CJK".

"CJK and Hangul are kept together. This option should only be used in the context of CJK used in small clusters like in the Korean writing system."
Add "ideographs" after both occurrences of "CJK".

"All word-break related properties are first reset to their initial values (all 'normal')."
Change "All word-break related properties" to "The properties 'word-break-CJK' and 'word-break-inside'". Link the property names to the respective definitions. 7. Text Wrapping, White-space Control and Text Overflow The focus on line feed (U+000A) as the only line break character is specific to XML, to the detriment of CSS. Choosing the word "linefeed" for property names is one thing; a slight misnomer can be accomodated. Limiting implementation behavior to dealing with line feed only is another thing, and a bad one at that. 7.1. Text wrapping: the 'wrap-option' property "The best line-breaking opportunity is determined in priority by the existence of preserved line-feed characters (U+000A), or by the line-breaking algorithm controlled by the 'line-break' and word-break' properties." Change "'line-break' and word-break'" to "'line-break', 'word-break-CJK' and 'word-break-inside'". "independently of 'line-break' and word-break' properties." Change "'line-break' and word-break'" to "'line-break', 'word-break-CJK' and 'word-break-inside'". 7.2. White-space control: the 'linefeed-treatment', 'white-space-treatment', 'all-space-treatment' properties and the 'white-space' shorthand property "The white-space set is determined by the XML [XML1.0] specification" This binding to XML 1.0 works to the detriment of CSS, which will have a hard time accomodating non-XML languages, or even later revisions of XML. "Line feed characters are rendered as one of the following characters: a space character, a zero width space character (U+200B), or no character (i.e. not rendered)." Change to "Line feed characters are either rendered as a space character (U+0020), rendered as a zero width space character (U+200B), or not rendered." "The choice of the resulting character is conditioned by the script property of the characters preceding and following the line feed character." Add a reference to Unicode Technical Report #24, "Script Names", after "property". 
"A sequence of white space characters without any line feed characters is rendered as a single space character." Change to "A sequence of white-space characters without any line feed characters is rendered as a single space character (U+0020)." "A sequence of white space characters with one or more line feed character is rendered similarly to a single line feed character." Change to "A sequence of white-space characters with one or more line feed characters is rendered as a single line feed character." "In determining how to convert a LINE FEED character a user agent should consider the following cases, whereby the script of characters on either side of the LINE FEED determines the choice of the replacement." Change to "In determining how to convert a line feed character, a user agent should consider the following cases, whereby the scripts of characters preceding and following the line feed determine the choice of the replacement." "If the characters preceding and following the LINE FEED character belong to a script in which the SPACE character is used as a word separator, the LINE FEED character should be converted into a SPACE character." Change to "If the characters preceding and following the line feed character belong to a script in which the space character (U+0020) is used as a word separator, the line feed character should be converted into a space character." "If none of the conditions in (1) through (3) are true, the LINE FEED character should be converted into a SPACE character." Change to "If none of the conditions in (1) through (3) are true, the line feed character should be converted into a space character (U+0020)." "When white-space characters are collapsed for rendering purpose, the style applied to the collapsed set is the one that would be applied to first white-space character of the set." 
Change to "When white-space characters are collapsed for rendering purpose, the style applied to the replacement character is the style that would be applied to first white-space character of the original sequence." "Linefeed characters are transformed for rendering purpose into one of the following characters: a space character, a zero width space character (U+200B), or no character (i.e. not rendered)." Change to "The user agent either transforms each line feed character to a space character (U+0020), transforms each line feed character to a zero width space character (U+200B), or removes the line feed characters." "The choice of the resulting character is conditioned by the script property of the characters preceding and following the line feed character in the same line flow elements part of the same block element." Add a reference to the previously defined algorithm and clean up the end of the sentence, which makes no sense. "Linefeed characters are ignored. i.e. they are transformed for rendering purpose into no character." Change to "Line feed characters are ignored. They are removed and are not rendered." "White-space characters, when rendered as an advance width, use the width of the space character (U+0020)." Add "the glyph normally used for" before "the space". "White space characters, except for linefeeds, are ignored. i.e. they are transformed for rendering purpose into no character." Change to "White-space characters, except for line feed characters, are ignored. They are removed and are not rendered." "All white space characters are rendered as intended (advance width). The treatment of linefeeds is not determined by this property." Change to "White-space characters other than line feed are rendered as they are (with advance width)." "All white-space characters are rendered as intended." Change to "All white-space characters are rendered as they are." 
"The tab character (U+0009) is rendered as the smallest non-zero number of spaces necessary to line characters up along tab stops that are every 8 characters." Change to "The tab character (U+0009) is rendered as the smallest non-zero number of spaces necessary to reach or exceed the next tab stop. Tab stops occur in the inline progression direction every at points corresponding to multiples of eight times the width of the glyph normally used for space (U+0020)." "The definition of the property values are established by referring to the individual white-space properties set as follows" Change to "The definitions of the property values are established by the following table, which shows the settings of the constituent properties". 7.3. Text overflow: the 'text-overflow-mode', 'text-overflow-ellipsis' properties and the shorthand 'text-overflow' property "Text overflow deals with the situation where some textual content is clipped when it overflows the element's box in its text advance direction as determined by the writing-mode property value." Change "text advance direction" to "inline progression direction". "This situation may only occur when the 'overflow' property has the values: hidden, scroll and auto (in the latter case only when the UA behavior results in content scrolling)." Change to "This situation occurs only when the 'overflow' property has the value 'hidden', 'scroll' or 'auto' (in the latter case only when the user agent introduces a scrolling mechanism). "The hint is typically an ellipsis character "...", although the actual character representation may vary. An image may also be substituted. " Change to "The hint is typically a horizontal ellipsis character (U+2026), although the hint may be some other string or even an image. "If both hints should appear, only the 'after' hint is rendered." Change "should appear" to "are enabled." 
"The text-overflow is divided in properties: 'text-overflow-mode' that controls the presentation of hint characters, 'text-overflow-ellipsis' that controls the values of the hint characters presented at the box boundaries and a shorthand property: 'text-overflow'." Change to "Control over text overflow is divided among properties: 'text-overflow-mode' controls the presence and position of the hint, while 'text-overflow-ellipsis' controls what constitutes the hint. The shorthand property 'text-overflow' sets the other text overflow properties." "Name: text-overflow-mode" "Applies to: all block-level elements" What happens with inline-block elements? "an ellipsis string is inserted at each box boundaries where a text overflow occurs. The values of these ellipsis strings is determined by the 'text-overflow-ellipsis' property." Change to "A visual hint is inserted at each box boundary where text overflow occurs. The 'text-overflow-ellipsis' property determines the content of the hint." "The insertions take place at the boundary of the last full glyph representation of a line of text." Please clarify. "similar to 'ellipsis', but the insertions take place at the boundary of the last full glyph representation of a word within the line of text." Change to "A visual hint is inserted at each box boundary where text overflow occurs. The 'text-overflow-ellipsis' property determines the content of the hint. The insertions take place after the last word that entirely fits on the line." "The hint characters only replace textual information. If the clipping occurs on a replaced element, standard clipping occurs." Change to "The overlfow hints are active only for textual content. That is, the user agent must not render an overflow hint when only replaced content overflows." "will result on no ellipsis shown for its content (because it has a specified width and furthermore the text wrapping occurs in the 'hidden' overflow area of its parent element)." 
Change to "will result in the absence of a hint overflow (because the element has a specified width)." "In other words, the text-overflow-mode only affects the textual content of a block element which participate in its own inline flow." Please clarify. What is a block element which participates in its own inline flow? "Name: text-overflow-ellipsis Value: [<ellipsis-end> | <uri> [, <ellipsis-after> | <uri>]?]" Why is the comma needed? Change the production to "<ellipsis>{1,2}". Define <ellipsis> as [ <string> | <uri> ]. Change the following prose as appropriate. "Applies to: all block-level elements" What happens to inline-block elements? 8.1. Letter spacing: the 'letter-spacing' property "This property specifies spacing behavior between text characters." Change "text characters" to "grapheme clusters" and add a reference to Unicode Technical Report #29, "Text Boundaries". "However, this value allows the user agent to alter the space between characters in order to justify text." Change "characters" to "grapheme clusters". "This value indicates inter-character space in addition to the default space between characters." Change to "This value indicates spacing added between grapheme clusters in addition to the default spacing between grapheme clusters." "The value is added to the advance width of each spacing character (as opposed to combining character) or group of characters that are clustered in single grapheme unit (like in Thai, Khmer, etc.), including the last character of the element. Characters which are joined together by effect of applying a cursive font to them, or by standard typography rules (Arabic script, Northern Indian scripts like Devanagari) have the valued added to the normal advance width of each spacing characters. Combining characters (not spacing) do not get any letter-spacing effect, only the combination of the base character and its combining characters does." 
Eliminate all of this, as it is implied by the suggested prior use of the term "grapheme cluster".

"For justification purposes, user agents should minimize effect on letter-spacing as much as possible (priority to word-spacing expansion/compression as opposed to character-spacing expansion/compression)."
Change to "For justification purposes, user agents should minimize alteration of spacing within words. The priority should be to alter spacing between words."

"The justification algorithm may further modify the inter-character spacing, but only in text where there is no other opportunities to distribute the extra spacing (such as single word on a line, ideographic text)."
Change to "The justification algorithm may further modify the spacing between grapheme clusters, but only in text (such as single word on a line or ideographic text) where there is no other opportunity to distribute the extra spacing."

"Because of the visual disruptive effect of modifying letter-spacing on writing systems which use joined characters, like for example Arabic, the usage of this property is discouraged in those cases."
Change to "Because of the visually disruptive effect of modifying this spacing in writing systems, such as Arabic, which use joined glyphs, the usage of this property is discouraged in those cases."

"There are cases like Japanese or Chinese writing systems where justification will change all letter-spacing effects as there is no other opportunity in the line to expand or compress the character content in order to fit the line span."
Change to "There are cases, like in Japanese and Chinese writing systems, where justification will change all spacing between grapheme clusters, as there is no other opportunity in the line to expand or compress the textual content in order to fit the line."

"Character spacing algorithms are user agent-dependent.
For example, the spacing will not occur necessarily between all characters, but instead between each glyph that constitutes either a letter or a cluster unit."
Change to "The user agent determines the exact algorithm for spacing between grapheme clusters."

"Furthermore this property should not be used for scripts and/or fonts that link characters together (cursive fonts for Roman scripts, all Arabic cases, Indic scripts with headline like Devanagari, etc...). Character spacing may also be influenced by justification (see the 'text-align' property)."
Change to "Furthermore, this property should not be set to a <length> for scripts and/or fonts that ligate glyphs with connecting strokes; such scripts and fonts include cursive Latin fonts, Arabic, and Devanagari. Spacing between grapheme clusters may also be influenced by justification (see the 'text-align' property)."

"In this example, the space between characters in blockquote elements is increased by '0.1em'."
Change "characters" to "grapheme clusters".

"In the following example, the user agent is requested not to alter inter-character space"
Change "inter-character space" to "spacing within words".

"When the resultant space between two characters is not the same as the default space, user agents should not use ligatures."
Change to "When the resultant spacing is not the default, user agents should not use ligatures."

8.2. Word spacing: the 'word-spacing' property

"If there are no characters, the user agent doesn't have to create an additional character advance width."
"There is no inter-word space. All white-space characters are treated like zero-length characters."
Change to "Word-separating white-space characters are rendered with a width of zero."
Change to "If there are no word-separating characters, the user agent doesn't have to create an additional advance width between words."

"Determining word boundary is typically done by detecting white space characters.
There are however many scripts and writing systems that do not separate their words by any character (like Japanese, Chinese, Thai, etc...), detecting word boundaries in these cases require dictionary based algorithms that may not be supported by all user agents."
Change to "Determining word boundaries is typically done by detecting white-space characters. There are, however, many scripts and writing systems that do not separate their words by any character; such scripts and writing systems include Japanese, Chinese, and Thai. Detecting word boundaries in these systems requires dictionary-based algorithms that user agents may choose not to support."

[Sections after section 8.2 could not be reviewed before the deadline.]
Received on Wednesday, 27 November 2002 20:00:41 UTC