Comments on Indic Layout Requirements

I’ve looked over the 2014-11-27 version of
http://www.w3.org/International/docs/indic-layout/
and have some comments:

– The document should clarify which writing systems it intends to cover. Three languages are mentioned in section 1.1 as being written primarily or alternatively in Perso-Arabic script, but otherwise there’s no information regarding writing in this script and issues associated with it, such as bidirectional layout or justification using kashidas. Should this be covered here, or is the hope that it will eventually be covered in a requirements document for Arabic and other bidirectional scripts?

– It might be desirable to include a few other Brahmi-derived writing systems that are used in countries near India, such as Sinhalese, Nepalese, and Tibetan, since chances of them getting their own requirements document are pretty slim. South-east Asian scripts may eventually get covered by their own requirements document.

– Most of the content of the document seems focused on Hindi written in Devanagari. This seems to imply that all other Indian writing systems work pretty much the same as this one. I’m not an expert, but I’ve heard claims that there are significant differences between Indian writing systems, especially between southern ones such as Tamil and Malayalam and the northern ones. If that’s the case, such differences should be documented.

– The document is a lot shorter than the Japanese layout requirements. Readers will generally assume that, for all aspects of text layout that are not covered in the document, generic/English rules apply: text emphasis by using bold/italic/underline, justification by extended spaces, headings in larger type and/or bold, quotations in “” (or « », or other quotation marks), page numbers in Western digits, etc. Is that all fine?

1.2.1 “Unicode uses a 16 bit encoding that provides code point for more than 1 million characters”: Well, UTF-16 exists, but a 16-bit encoding has not been at the core of Unicode for a long time. At the core are code points between U+0000 and U+10FFFF.
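To illustrate the distinction, here is a minimal Python sketch (my own, not taken from the document). The helper name utf16_code_units and the two sample characters are just choices for illustration. It shows that the code space ends at U+10FFFF and that UTF-16 needs two 16-bit code units for anything beyond U+FFFF:

```python
import sys

# The Unicode code space runs from U+0000 to U+10FFFF
# (Python 3.3+ exposes the upper bound as sys.maxunicode).
assert sys.maxunicode == 0x10FFFF


def utf16_code_units(ch: str) -> int:
    """Return how many 16-bit code units UTF-16 needs for one code point."""
    # "utf-16-be" is used so that no byte order mark is added to the output.
    return len(ch.encode("utf-16-be")) // 2


devanagari_ka = "\u0915"      # U+0915, in the Basic Multilingual Plane
brahmi_sample = "\U00011013"  # U+11013, a supplementary-plane code point (Brahmi block)

print(utf16_code_units(devanagari_ka))  # 1
print(utf16_code_units(brahmi_sample))  # 2 (a surrogate pair)
```

So a single 16-bit unit is enough for Devanagari, but it is not what gives Unicode room for more than a million characters.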

1.2.1 “UNICODE”: Unicode is not an acronym and is therefore not normally written in all-caps.

1.2.1: CLDR is managed by the Unicode Consortium, but is not part of the Unicode Standard.

2.2 “consonant as per Unicode's definition”: Where does the Unicode Standard define this set? Do you mean the Consonant category defined in the IndicSyllabicCategory.txt file? If so, how do the other consonant categories fit in, e.g., Consonant_Dead? Some Indic scripts other than Devanagari have characters in those categories.

3. “All the context of word boundaries disscused above should based on tailored Grapheme Cluster Boundaries”: For languages that use spaces to separate words, word breaks are commonly defined by spaces and punctuation (http://www.unicode.org/reports/tr29/#Word_Boundaries). Is that not appropriate for Indic languages, such that tailored grapheme clusters are needed to define words?

3. “tailored Grapheme Cluster Boundaries”: Are extended (non-tailored) grapheme clusters (http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries) not adequate for Indic writing systems? If they are not, it would help to have definitions of the required tailored grapheme clusters in CLDR.

3. “Possible Extension for handling some cases Mouse Selection: At Indic syllable and code point level”: This deserves more detail – should selections be extended by syllable or by code point? Should text be deleted by syllable or by code point or by another unit? When truncating text, should text be truncated by syllables or by code points or by other units?
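To make the difference between these units concrete, here is a rough Python sketch using विज्ञापन, one of the words the document cites in 4.1. The function rough_clusters is hypothetical and deliberately naive: it attaches combining marks to the preceding character and joins a consonant to a unit that ends in a virama. It is neither the document's Indic syllable definition nor the UAX 29 algorithm, only an approximation to show why the choice of unit matters:

```python
import unicodedata

VIRAMA = "\u094d"  # DEVANAGARI SIGN VIRAMA

# The word cited in section 4.1, written as escapes:
# VA, VOWEL SIGN I, JA, VIRAMA, NYA, VOWEL SIGN AA, PA, NA
WORD = "\u0935\u093f\u091c\u094d\u091e\u093e\u092a\u0928"


def rough_clusters(text: str) -> list:
    """Naive grouping for illustration only (an assumption, not a standard)."""
    clusters = []
    for ch in text:
        joins_previous = bool(clusters) and (
            unicodedata.category(ch).startswith("M")  # combining mark, e.g. a vowel sign
            or clusters[-1].endswith(VIRAMA)          # previous unit ends in a virama
        )
        if joins_previous:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


clusters = rough_clusters(WORD)
print(len(WORD))      # 8 code points
print(len(clusters))  # 4 units under this naive grouping

# Truncating after 4 code points stops right after the virama ...
by_code_point = WORD[:4]
# ... while truncating after 2 units keeps the conjunct and its vowel sign.
by_cluster = "".join(clusters[:2])
print(by_code_point.endswith(VIRAMA))  # True
print(len(by_cluster))                 # 6 code points
```

Truncating or deleting by code point can leave a dangling virama or split a conjunct, which is why the document should say which unit each operation is expected to use.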

3. (twice), 4.1, 5.1, 5.2 “See section 5”: It seems that’s now section 2.

4.1 “words आकर्षण and विज्ञापन not follow Indic syllable definition”: Not sure what’s meant by this: They don’t follow the definition because there’s a bug, or they don’t follow the definition because they are exceptions? What is the correct behavior?

5.2: Is letter spacing commonly used with Indian languages? Is it used more for emphasis or for justification?

5.3: Figure 6 is missing.

5.4: Collation doesn’t need to be discussed in a document on layout requirements.

A. The normative references to UAX 14 and UAX 29 refer to specific and rather old versions of these annexes. Is that intentional?

Best regards,
Norbert

Received on Tuesday, 2 December 2014 02:11:07 UTC