Re: Comments in Indic Layout Requirements from Somnath Chandra on 2014-12-11 (public-i18n-indic@w3.org from October to December 2014)

From: Somnath Chandra <schandra@deity.gov.in>
Date: Thu, 11 Dec 2014 17:02:19 +0530
To: Norbert Lindenberg <w3@lindenbergsoftware.com>
Cc: public-i18n-indic <public-i18n-indic@w3.org>, slata <slata@mit.gov.in>, prashant verma <vermaprashant1@gmail.com>
Message-id: <fb129b291078a.5489ce1b@nic.in>

Dear Norbert,

Thank you very much for your valuable contribution to the Indic layout document. We agree that the three languages mentioned in section 1.1 are written in the Perso-Arabic script, and we shall definitely address the layout issues pertaining to these languages. We are already collaborating with the national standards bodies for these languages to address their specific issues.

However, let us first complete the layout requirements for the set of languages [Hindi, Marathi, Punjabi, Bengali, Telugu] mentioned in the charter [http://www.w3.org/2012/07/indic-tf-charter/charter.html].

The publication of the FPWD for Hindi is an important step towards a detailed review, which will help us develop the requirements for other Devanagari-based languages.

We have already initiated work on other scripts such as Bengali, Telugu, and Punjabi.

As suggested, we shall take up the other items as we receive further feedback.

With best regards,
Somnath


On 12/11/14 10:18 AM, Norbert Lindenberg <w3@lindenbergsoftware.com> wrote:
> 
> Dear Somnath,
> 
> Thank you very much for your reply! A few comments on your reply and today’s updated version below. I think the first one should be addressed for the FPWD to set expectations correctly for other reviewers; other items can be addressed over time.
> 
> Best regards,
> Norbert
> 
> 
> > On Dec 8, 2014, at 2:23 , Somnath Chandra <schandra@deity.gov.in> wrote:
> > 
> >> The document should clarify which writing systems it intends to cover. Three languages are mentioned in section 1.1 as being written primarily or alternatively in Perso-Arabic script, but otherwise there’s no information regarding writing in this script and issues associated with it, such as bidirectional layout or justification using kashidas. Should this be covered here, or is the hope that it will eventually be covered in a requirements document for Arabic and other bidirectional scripts?
> > 
> > The other Indian languages, including those written in the Perso-Arabic script, will be covered in subsequent versions of the document.
> 
> That’s good to know; stating the intended scope in the document itself (e.g., in the Status section) would help make that clear to other reviewers.
> 
> >> 2.2 “consonant as per Unicode's definition”: Where does the Unicode standard define this set? Do you mean the Consonant category defined in the IndicSyllabicCategory.txt file? If so, how do the other consonant categories fit in, in which some of the Indic scripts other than Devanagari have characters, e.g., Consonant_Dead?
> > 
> > C is a consonant which may or may not include nukta.
> 
> So a “consonant” may actually consist of multiple code points. The document should clarify which code point sequences are allowed.
> 
> >> 3. “All the context of word boundaries discussed above should based on tailored Grapheme Cluster Boundaries”: For languages that use spaces to separate words, word breaks are commonly defined by spaces and punctuation (http://www.unicode.org/reports/tr29/#Word_Boundaries). Is that not appropriate for Indic languages, so that tailored grapheme clusters should be used to define words?
> >> 
> >> 3. “tailored Grapheme Cluster Boundaries”: Are extended (non-tailored) grapheme clusters (http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries) not adequate for Indic writing systems? If so, it would help to have definitions of the required tailored grapheme clusters in CLDR.
> > 
> > Extended grapheme clusters only combine characters with dependent vowel signs, whereas a tailored grapheme cluster also includes one or more prefixed consonants, typically with a virama (halant) character between each pair of consonants in the sequence. Tailored grapheme clusters are therefore more suitable for Indic scripts.
> 
> I can see that tailored grapheme clusters may be appropriate for user-perceived character boundaries. It’s still not clear why they’re necessary to define word boundaries – are there cases where words are not separated by spaces or punctuation?
> 
> >> 5.2: Is letter spacing commonly used with Indian languages? Is it used more for emphasis or for justification?
> > 
> > It is sometimes used in banners and newspapers.
> 
> I think that would be useful to state in the document.
> 
> 
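
To make the tailored grapheme cluster discussed in the thread above more concrete, here is a minimal Python sketch. It assumes a simplified set of Devanagari code point ranges, and the names `tailored_clusters` and `words` are hypothetical; it is neither the definition used in the draft nor a CLDR tailoring. A "consonant" is taken to be a consonant letter optionally followed by a nukta, and a cluster is a run of such consonants joined by virama, optionally ending in a virama or a dependent vowel sign, plus any trailing modifiers.

```python
import re

# Simplified Devanagari classes, assumed for illustration only.
CONSONANT = "[\u0915-\u0939\u0958-\u095F]"  # KA..HA plus precomposed nukta forms
NUKTA = "\u093C"
VIRAMA = "\u094D"                            # halant
MATRA = "[\u093E-\u094C]"                    # dependent vowel signs AA..AU
MODIFIER = "[\u0901-\u0903]"                 # candrabindu, anusvara, visarga
VOWEL = "[\u0904-\u0914]"                    # independent vowels

# "C" in the thread's sense: a consonant that may or may not carry a nukta.
C = f"{CONSONANT}{NUKTA}?"

CLUSTER = re.compile(
    f"(?:{C}{VIRAMA})*{C}(?:{VIRAMA}|{MATRA})?{MODIFIER}*"  # conjunct + optional sign
    f"|{VOWEL}{MODIFIER}*"                                   # independent vowel
    "|.",                                                    # anything else: one code point
    re.DOTALL,
)


def tailored_clusters(text):
    """Split text into (simplified) tailored grapheme clusters."""
    return CLUSTER.findall(text)


def words(text):
    """Split text into words on whitespace only."""
    return text.split()


# Two space-separated Devanagari words, written with escapes to avoid
# any ambiguity about the code points involved.
sample = "\u0939\u093F\u0928\u094D\u0926\u0940 \u092D\u093E\u0937\u093E"
for word in words(sample):
    print(word, tailored_clusters(word))
```

In this sketch the word split does not depend on the cluster pattern at all: whitespace alone separates the two words, and the clusters only describe units inside each word, such as user-perceived character boundaries.
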
-- 

Dr. Somnath Chandra
Scientist-E
Dept. of Electronics & Information Technology
Ministry of Communications & Information Technology
Govt. of India
Tel: +91-11-24364744, 24301856
Fax: +91-11-24363099
e-mail: schandra@mit.gov.in
Received on Thursday, 11 December 2014 11:34:09 UTC