Re: Comments in Indic Layout Requirements from Somnath Chandra on 2014-12-08 (public-i18n-indic@w3.org from October to December 2014)

From: Somnath Chandra <schandra@deity.gov.in>
Date: Mon, 08 Dec 2014 15:53:54 +0530
To: Norbert Lindenberg <w3@lindenbergsoftware.com>, public-i18n-indic <public-i18n-indic@w3.org>
Cc: slata <slata@mit.gov.in>, prashant verma <vermaprashant1@gmail.com>
Message-id: <fc34dad863db.5485c992@nic.in>
Hello Norbert,


Thanks for your valuable feedback. We are in the process of updating the document, and it is being sent to Richard for uploading. 


Our feedback on your specific observations is as follows:

-----------------------------------------------------------------------------------------------------------------------------------------------------------------

– The document should clarify which writing systems it intends to cover. Three languages are mentioned in section 1.1 as being written primarily or alternatively in Perso-Arabic script, but otherwise there’s no information regarding writing in this script and issues associated with it, such as bidirectional layout or justification using kashidas. Should this be covered here, or is the hope that it will eventually be covered in a requirements document for Arabic and other bidirectional scripts?

 

The other Indian languages, including those written in Perso-Arabic script, will be covered in subsequent versions of the document.


– It might be desirable to include a few other Brahmi-derived writing systems that are used in countries near India, such as Sinhalese, Nepalese, and Tibetan, since chances of them getting their own requirements document are pretty slim. South-east Asian scripts may eventually get covered by their own requirements document.




At present, the Indic layout document will cover only the 22 Scheduled Indian Languages. Other Brahmi-derived languages might be covered by other task forces; for example, Tibetan will be covered by the Chinese layout task force, as discussed in the Internationalization teleconference.


– Most of the content of the document seems focused on Hindi written in Devanagari. This seems to imply that all other Indian writing systems work pretty much the same as this one. I’m not an expert, but I’ve heard claims that there are significant differences between Indian writing systems, especially between southern ones such as Tamil and Malayalam and the northern ones. If that’s the case, such differences should be documented.




The Indic syllable definition covers most of the Indian languages, including the South Indian languages. The orthographic variations between languages are few; only Malayalam has special characters such as the chillaksharam. All remaining Indian languages will be covered in subsequent versions.


– The document is a lot shorter than the Japanese layout requirements. Readers will generally assume that for all aspects of text layout that are not covered in the document generic/English rules apply: Text emphasis by using bold/italic/underline, justification by extended spaces, headings in larger type and/or bold, quotations in “” (or « », or other quotation marks), page numbers in Western digits, etc. Is that all fine?




This is the first stage. The document will become more comprehensive as the other languages are added to the draft. Publication as a First Public Working Draft (FPWD) will increase its reach, and the draft will be modified based on the feedback.



1.2.1 “Unicode uses a 16 bit encoding that provides code point for more than 1 million characters”: Well, UTF-16 exists, but a 16-bit encoding has not been at the core of Unicode for a long time. At the core are code points between U+0000 and U+10FFFF.



1.2.1 “UNICODE”: Unicode is not an acronym and is therefore not normally written in all-caps.




We will take care of these.
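The code-space point in the 1.2.1 comment can be illustrated with a short Python sketch (assuming Python 3.3 or later, where `sys.maxunicode` is always 0x10FFFF; the helper names are hypothetical). Characters in the Basic Multilingual Plane, which includes Devanagari, take one UTF-16 code unit, while characters above U+FFFF take two (a surrogate pair), so UTF-16 is one encoding form of the code space rather than its size:

```python
import sys

# On Python 3.3+, str covers the full Unicode code space, U+0000..U+10FFFF.
CODE_SPACE_SIZE = sys.maxunicode + 1  # 0x110000 = 1,114,112 code points


def utf16_units(text: str) -> int:
    """Count UTF-16 code units (hypothetical helper for illustration)."""
    return len(text.encode("utf-16-le")) // 2  # "-le" avoids a BOM


def utf8_bytes(text: str) -> int:
    """Count UTF-8 bytes, for comparison."""
    return len(text.encode("utf-8"))


if __name__ == "__main__":
    ka = "\u0915"          # DEVANAGARI LETTER KA, inside the BMP
    brahmi = "\U00011000"  # first code point of the Brahmi block, outside the BMP
    print(hex(sys.maxunicode), CODE_SPACE_SIZE)
    print(utf16_units(ka), utf8_bytes(ka))          # 1 code unit, 3 bytes
    print(utf16_units(brahmi), utf8_bytes(brahmi))  # 2 code units (surrogate pair), 4 bytes
```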


1.2.1: CLDR is managed by the Unicode Consortium, but is not part of the Unicode Standard.

Change done.


2.2 “consonant as per Unicode's definition”: Where does the Unicode standard define this set? Do you mean the Consonant category defined in the IndicSyllabicCategory.txt file? If so, how do the other consonant categories fit in, in which some of the Indic scripts other than Devanagari have characters, e.g., Consonant_Dead?




C is a consonant, which may or may not include a nukta.
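As an illustration of the reply above, "a consonant with or without a nukta" can be matched with a regular expression. This is a simplified sketch that assumes Devanagari only and hard-codes ranges (U+0915..U+0939 for the basic consonants, U+0958..U+095F for the precomposed nukta consonants); it does not use the IndicSyllabicCategory.txt data, and the pattern and function names are hypothetical:

```python
import re
import unicodedata

# Hypothetical, Devanagari-only pattern: a basic consonant U+0915..U+0939
# optionally followed by DEVANAGARI SIGN NUKTA (U+093C), or one of the
# precomposed nukta consonants U+0958..U+095F.
CONSONANT_WITH_OPTIONAL_NUKTA = re.compile(
    "(?:[\u0915-\u0939]\u093C?|[\u0958-\u095F])"
)


def is_consonant_c(text: str) -> bool:
    """Return True if text is exactly one consonant C, with or without nukta."""
    return CONSONANT_WITH_OPTIONAL_NUKTA.fullmatch(text) is not None


if __name__ == "__main__":
    print(is_consonant_c("\u0915"))        # KA
    print(is_consonant_c("\u0915\u093C"))  # KA + NUKTA
    print(is_consonant_c("\u0958"))        # precomposed QA
    print(is_consonant_c("\u093F"))        # vowel sign I, not a consonant
    # U+0958 is a composition exclusion, so NFC leaves it as KA + NUKTA.
    print(unicodedata.normalize("NFC", "\u0958") == "\u0915\u093C")
```

Because the precomposed forms normalize to consonant + nukta, an implementation that normalizes text first only needs the first alternative of the pattern.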

3. “All the context of word boundaries discussed above should based on tailored Grapheme Cluster Boundaries”: For languages that use spaces to separate words, word breaks are commonly defined by spaces and punctuation (http://www.unicode.org/reports/tr29/#Word_Boundaries). Is that not appropriate for Indic languages, so that tailored grapheme clusters should be used to define words?
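For comparison with the tailored-cluster approach quoted above, here is a rough sketch of word splitting by whitespace and punctuation, including the Devanagari danda (U+0964) and double danda (U+0965). It is a simplification, not the full UAX #29 word-boundary algorithm, and the separator set is an assumption:

```python
import re

# Hypothetical separator set: whitespace, DEVANAGARI DANDA (U+0964),
# DEVANAGARI DOUBLE DANDA (U+0965) and some common ASCII punctuation.
SEPARATORS = re.compile(r"[\s\u0964\u0965,.;:!?()]+")


def split_words(text: str) -> list[str]:
    """Split text into words at runs of whitespace and punctuation."""
    # Combining marks (vowel signs, virama) are not separators, so they
    # stay inside the word they belong to.
    return [w for w in SEPARATORS.split(text) if w]


if __name__ == "__main__":
    print(split_words("भारत एक देश है।"))
```

Since Indic languages separate words with spaces, a rule of this kind already keeps syllables and conjuncts intact inside each word; grapheme clusters matter more for cursor movement, selection, and line breaking within words.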




3. “tailored Grapheme Cluster Boundaries”: Are extended (non-tailored) grapheme clusters (http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries) not adequate for Indic writing systems? If so, it would help to have definitions of the required tailored grapheme clusters in CLDR.




Extended grapheme clusters only combine a character with following dependent vowel signs and other combining marks, whereas a tailored grapheme cluster also includes one or more additional prefixed consonants, typically with a virama (halant) between each pair of consonants in the sequence. Tailored grapheme clusters are therefore more suitable for Indic scripts.
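The difference described in the reply can be sketched as follows. This is a simplified Python illustration, not the UAX #29 algorithm or a CLDR tailoring: the untailored mode only attaches combining marks (general categories Mn, Mc, Me) to the preceding character, roughly as extended grapheme clusters behaved at the time of this thread, and the tailored mode additionally attaches a consonant that follows a virama. The Devanagari-only consonant range and the function names are assumptions. Later versions of UAX #29 (from Unicode 15.1) added a rule that keeps such conjuncts together for several Indic scripts.

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (halant)


def is_mark(ch: str) -> bool:
    """Combining marks: nonspacing (Mn), spacing (Mc) and enclosing (Me)."""
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")


def is_consonant(ch: str) -> bool:
    """Simplified Devanagari consonant test (hypothetical range U+0915..U+0939)."""
    return "\u0915" <= ch <= "\u0939"


def segment(text: str, tailored: bool) -> list[str]:
    """Split text into clusters.

    Untailored: a base character plus any following combining marks.
    Tailored: additionally keep a consonant attached after a virama,
    so that a conjunct stays in one cluster.
    """
    clusters: list[str] = []
    for ch in text:
        attach = is_mark(ch)
        if tailored and clusters and clusters[-1].endswith(VIRAMA) and is_consonant(ch):
            attach = True
        if attach and clusters:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


if __name__ == "__main__":
    word = "\u0915\u094D\u0937\u0924\u094D\u0930\u093F\u092F"  # क्षत्रिय
    print(segment(word, tailored=False))
    print(segment(word, tailored=True))
```

For क्षत्रिय the untailored mode yields five clusters (क्, ष, त्, रि, य), while the tailored mode yields the three syllables क्ष, त्रि, य.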


3. “Possible Extension for handling some cases Mouse Selection: At Indic syllable and code point level”: This deserves more detail – should selections be extended by syllable or by code point? Should text be deleted by syllable or by code point or by another unit? When truncating text, should text be truncated by syllables or by code points or by other units?
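The units asked about here can be contrasted with a small sketch. It assumes a simplified Devanagari rule (combining marks stay with the preceding character, and a consonant stays with a preceding virama) and uses hypothetical function names; it only shows the practical difference between the options and does not settle which unit the document should require:

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (halant)


def _continues_cluster(prev: str, ch: str) -> bool:
    """Simplified, hypothetical rule: does ch belong to the same cluster as prev?"""
    if unicodedata.category(ch) in ("Mn", "Mc", "Me"):
        return True  # combining mark, e.g. vowel sign, virama, nukta
    return prev == VIRAMA and "\u0915" <= ch <= "\u0939"  # consonant after virama


def backspace_code_point(text: str) -> str:
    """Delete only the last code point."""
    return text[:-1]


def backspace_cluster(text: str) -> str:
    """Delete the whole last cluster (syllable-like unit)."""
    if not text:
        return text
    start = len(text) - 1
    while start > 0 and _continues_cluster(text[start - 1], text[start]):
        start -= 1
    return text[:start]


def truncate_code_points(text: str, limit: int) -> str:
    """Keep at most `limit` code points, even if that splits a cluster."""
    return text[:limit]


def truncate_clusters_safe(text: str, limit: int) -> str:
    """Keep at most `limit` code points without splitting a cluster."""
    end = min(limit, len(text))
    while 0 < end < len(text) and _continues_cluster(text[end - 1], text[end]):
        end -= 1
    return text[:end]


if __name__ == "__main__":
    word = "\u0915\u094D\u0937\u0924\u094D\u0930\u093F\u092F"  # क्षत्रिय
    print(backspace_code_point("\u0915\u093F"), backspace_cluster("\u0915\u093F"))
    print(truncate_code_points(word, 5), truncate_clusters_safe(word, 5))
```

Truncating क्षत्रिय to five code points leaves a dangling virama (क्षत्), while the cluster-safe version backs off to क्ष; likewise, deleting by code point turns कि into क, whereas deleting by cluster removes the whole syllable.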

3. (twice), 4.1, 5.1, 5.2 “See section 5”: It seems that’s now section 2.


Changed.


4.1 “words आकर्षण and विज्ञापन not follow Indic syllable definition”: Not sure what’s meant by this: They don’t follow the definition because there’s a bug, or they don’t follow the definition because they are exceptions? What is the correct behaviour?




They don't follow the definition in some browsers; this is only an example observed when testing in a browser. In general, the Indic syllable definition holds in most cases.
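Both example words contain a consonant conjunct formed with a virama (र + ् + ष in आकर्षण, ज + ् + ञ in विज्ञापन), which is a likely place for segmentation that ignores the virama to split a syllable. A short diagnostic sketch (helper names hypothetical) that lists the code points and locates such conjuncts may help when reporting browser test results:

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA


def virama_conjuncts(word: str) -> list[tuple[str, str]]:
    """Return (consonant, consonant) pairs joined by a virama inside word."""
    return [
        (word[i - 1], word[i + 1])
        for i, ch in enumerate(word)
        if ch == VIRAMA and 0 < i < len(word) - 1
    ]


def describe(word: str) -> list[str]:
    """List each code point with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in word]


if __name__ == "__main__":
    words = (
        "\u0906\u0915\u0930\u094D\u0937\u0923",              # आकर्षण
        "\u0935\u093F\u091C\u094D\u091E\u093E\u092A\u0928",  # विज्ञापन
    )
    for word in words:
        print(word, virama_conjuncts(word))
        for line in describe(word):
            print("  ", line)
```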

5.2: Is letter spacing commonly used with Indian languages? Is it used more for emphasis or for justification?


It is sometimes used in banners and newspapers.

 
5.3: Figure 6 is missing.

Incorporated.


5.4: Collation doesn’t need to be discussed in a document on layout requirements.

A. The normative references to UAX 14 and UAX 29 refer to specific and rather old versions of these annexes. Is that intentional?


OK, we will make the changes.




We request your kind observations so that the document may be published as an FPWD. 




With best regards,




Somnath







--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


On 12/02/14 07:42 AM, Norbert Lindenberg  <w3@lindenbergsoftware.com> wrote:
> 
> I’ve looked over the 2014-11-27 version of
> http://www.w3.org/International/docs/indic-layout/
> and have some comments:
> 
> – The document should clarify which writing systems it intends to cover. Three languages are mentioned in section 1.1 as being written primarily or alternatively in Perso-Arabic script, but otherwise there’s no information regarding writing in this script and issues associated with it, such as bidirectional layout or justification using kashidas. Should this be covered here, or is the hope that it will eventually be covered in a requirements document for Arabic and other bidirectional scripts?
> 
> – It might be desirable to include a few other Brahmi-derived writing systems that are used in countries near India, such as Sinhalese, Nepalese, and Tibetan, since chances of them getting their own requirements document are pretty slim. South-east Asian scripts may eventually get covered by their own requirements document.
> 
> – Most of the content of the document seems focused on Hindi written in Devanagari. This seems to imply that all other Indian writing systems work pretty much the same as this one. I’m not an expert, but I’ve heard claims that there are significant differences between Indian writing systems, especially between southern ones such as Tamil and Malayalam and the northern ones. If that’s the case, such differences should be documented.
> 
> – The document is a lot shorter than the Japanese layout requirements. Readers will generally assume that for all aspects of text layout that are not covered in the document generic/English rules apply: Text emphasis by using bold/italic/underline, justification by extended spaces, headings in larger type and/or bold, quotations in “” (or « », or other quotation marks), page numbers in Western digits, etc. Is that all fine?
> 
> 1.2.1 “Unicode uses a 16 bit encoding that provides code point for more than 1 million characters”: Well, UTF-16 exists, but a 16-bit encoding has not been at the core of Unicode for a long time. At the core are code points between U+0000 and U+10FFFF.
> 
> 1.2.1 “UNICODE”: Unicode is not an acronym and is therefore not normally written in all-caps.
> 
> 1.2.1: CLDR is managed by the Unicode Consortium, but is not part of the Unicode Standard.
> 
> 2.2 “consonant as per Unicode's definition”: Where does the Unicode standard define this set? Do you mean the Consonant category defined in the IndicSyllabicCategory.txt file? If so, how do the other consonant categories fit in, in which some of the Indic scripts other than Devanagari have characters, e.g., Consonant_Dead?
> 
> 3. “All the context of word boundaries discussed above should based on tailored Grapheme Cluster Boundaries”: For languages that use spaces to separate words, word breaks are commonly defined by spaces and punctuation (http://www.unicode.org/reports/tr29/#Word_Boundaries). Is that not appropriate for Indic languages, so that tailored grapheme clusters should be used to define words?
> 
> 3. “tailored Grapheme Cluster Boundaries”: Are extended (non-tailored) grapheme clusters (http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries) not adequate for Indic writing systems? If so, it would help to have definitions of the required tailored grapheme clusters in CLDR.
> 
> 3. “ Possible Extension for handling some cases Mouse Selection: At Indic syllable and code point level”: This deserves more detail – should selections be extended by syllable or by code point? Should text be deleted by syllable or by code point or by another unit? When truncating text, should text be truncated by syllables or by code points or by other units?
> 
> 3. (twice), 4.1, 5.1, 5.2 “See section 5”: It seems that’s now section 2.
> 
> 4.1 “words आकर्षण and विज्ञापन not follow Indic syllable definition”: Not sure what’s meant by this: They don’t follow the definition because there’s a bug, or they don’t follow the definition because they are exceptions? What is the correct behavior?
> 
> 5.2: Is letter spacing commonly used with Indian languages? Is it used more for emphasis or for justification?
> 
> 5.3: Figure 6 is missing.
> 
> 5.4: Collation doesn’t need to be discussed in a document on layout requirements.
> 
> A. The normative references to UAX 14 and UAX 29 refer to specific and rather old versions of these annexes. Is that intentional?
> 
> Best regards,
> Norbert
> 
> 
> 
> 
-- 

Dr. Somnath Chandra
Scientist-E
Dept. of Electronics & Information Technology
Ministry of Communications & Information Technology
Govt. of India
Tel:+91-11-24364744,24301856
Fax: +91-11-24363099
e-mail :schandra@mit.gov.in
Received on Monday, 8 December 2014 10:24:34 UTC