Re: Comments in Indic Layout Requirements from Norbert Lindenberg on 2014-12-11 (public-i18n-indic@w3.org from October to December 2014)

From: Norbert Lindenberg <w3@lindenbergsoftware.com>
Date: Wed, 10 Dec 2014 20:48:22 -0800
To: Somnath Chandra <schandra@deity.gov.in>
Cc: Norbert Lindenberg <w3@lindenbergsoftware.com>, public-i18n-indic <public-i18n-indic@w3.org>, slata <slata@mit.gov.in>, prashant verma <vermaprashant1@gmail.com>
Message-Id: <AD035329-B277-4142-9956-10599ADFF5C8@lindenbergsoftware.com>

Dear Somnath,

Thank you very much for your reply! A few comments on your reply and today’s updated version below. I think the first one should be addressed for the FPWD to set expectations correctly for other reviewers; other items can be addressed over time.

Best regards,
Norbert


> On Dec 8, 2014, at 2:23 , Somnath Chandra <schandra@deity.gov.in> wrote:
> 
>> The document should clarify which writing systems it intends to cover. Three languages are mentioned in section 1.1 as being written primarily or alternatively in Perso-Arabic script, but otherwise there’s no information regarding writing in this script and issues associated with it, such as bidirectional layout or justification using kashidas. Should this be covered here, or is the hope that it will eventually be covered in a requirements document for Arabic and other bidirectional scripts?
> 
> The other Indian languages, including those written in Perso-Arabic script, will be covered in subsequent versions of the document.

That’s good to know; stating the intended scope in the document itself (e.g., in the Status section) would help make that clear to other reviewers.

>> 2.2 “consonant as per Unicode's definition”: Where does the Unicode standard define this set? Do you mean the Consonant category defined in the IndicSyllabicCategory.txt file? If so, how do the other consonant categories fit in, in which some of the Indic scripts other than Devanagari have characters, e.g., Consonant_Dead?
> 
> C is a consonant, which may or may not include a nukta.

So a “consonant” may actually consist of multiple code points. The document should clarify which code point sequences are allowed.
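
As an illustration of the multiple-code-point case, here is a minimal Python sketch using Devanagari QA, which exists both as a single code point and as KA followed by nukta. The helper name code_points is hypothetical and only serves the example; the document would still need to say which such sequences it allows for each script.

```python
import unicodedata

NUKTA = "\u093C"  # DEVANAGARI SIGN NUKTA

def code_points(text):
    """List the code points of a string in U+XXXX notation (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]

qa_single = "\u0958"            # DEVANAGARI LETTER QA as one code point
qa_sequence = "\u0915" + NUKTA  # DEVANAGARI LETTER KA followed by nukta

# Canonical decomposition turns the single code point into the two-code-point sequence.
decomposed = unicodedata.normalize("NFD", qa_single)

print(code_points(qa_single))
print(code_points(decomposed))
```

Both forms are canonically equivalent, so a definition of "consonant" that names only single code points would not cover the sequence form.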

>> 3. “All the context of word boundaries discussed above should based on tailored Grapheme Cluster Boundaries”: For languages that use spaces to separate words, word breaks are commonly defined by spaces and punctuation (http://www.unicode.org/reports/tr29/#Word_Boundaries). Is that not appropriate for Indic languages, so that tailored grapheme clusters should be used to define words?
>> 
>> 3. “tailored Grapheme Cluster Boundaries”: Are extended (non-tailored) grapheme clusters (http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries) not adequate for Indic writing systems? If so, it would help to have definitions of the required tailored grapheme clusters in CLDR.
> 
> Extended grapheme clusters only combine characters with dependent vowel signs, whereas a tailored grapheme cluster also includes one or more prefixed consonants, typically with a virama (halant) character between each pair of consonants in the sequence. Tailored grapheme clusters are therefore more suitable for Indic scripts.

I can see that tailored grapheme clusters may be appropriate for user-perceived character boundaries. It’s still not clear why they’re necessary to define word boundaries – are there cases where words are not separated by spaces or punctuation?
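
To make the distinction in the quoted reply concrete, here is a rough Python sketch for Devanagari only. It is a simplification under stated assumptions, not the TR29 algorithm: the function name clusters and its join_after_virama flag are hypothetical, combining marks are detected by general category alone, and ZWJ/ZWNJ are not handled.

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (halant)

def clusters(text, join_after_virama):
    """Split text into clusters (simplified sketch, Devanagari only).

    Combining marks (general categories Mn and Mc) always attach to the
    preceding cluster. If join_after_virama is True, a letter that directly
    follows a virama is also attached, which approximates the "tailored"
    behaviour described in the quoted reply.
    """
    result = []
    for ch in text:
        category = unicodedata.category(ch)
        is_mark = category in ("Mn", "Mc")
        joins_conjunct = (
            join_after_virama
            and bool(result)
            and result[-1].endswith(VIRAMA)
            and category.startswith("L")
        )
        if result and (is_mark or joins_conjunct):
            result[-1] += ch
        else:
            result.append(ch)
    return result

word = "\u0915\u094D\u0937\u093F"  # KA, VIRAMA, SSA, VOWEL SIGN I

untailored = clusters(word, join_after_virama=False)  # two clusters
tailored = clusters(word, join_after_virama=True)     # one cluster

print(len(untailored), len(tailored))
```

Neither variant says anything about where words begin or end; in this sketch, spaces and punctuation would still be what separates words.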

>> 5.2: Is letter spacing commonly used with Indian languages? Is it used more for emphasis or for justification?
> 
> It is sometimes used in banners and newspapers.

I think that would be useful to state in the document.

Received on Thursday, 11 December 2014 04:49:19 UTC