- From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
- Date: Tue, 24 Jun 2014 17:23:57 +0900
- To: Somnath Chandra <schandra@deity.gov.in>, indic <public-i18n-indic@w3.org>
- CC: slata <slata@mit.gov.in>, Manoj Jain <mjain@deity.gov.in>, prashant verma <vermaprashant1@gmail.com>
Hello Somnath,

On 2014/06/24 13:47, Somnath Chandra wrote:
> Dear All,
>
> Pl find the revised definition of Indic Syllable as per the appended mail, which has been circulated on June 17, 2014. The definition is generic in nature to suit most of Indian Languages [11 languages tested]. Pl send your feedback towards finalization.
>
> With regards,
> Somnath
>
> -------- Original Message --------
> From: Swaran Lata <slata@deity.gov.in>
> Date: Jun 17, 2014 5:20:20 PM
> Subject: ABNF defintion of Indic syllable
> To: public-i18n-indic@w3.org
> Cc: Somnath Chandra <schandra@mit.gov.in>, Manoj Jain <mjain@mit.gov.in>
>
> > Dear All,
> >
> > The definition of Indic syllable has been revised as under :
> >
> > V[m] |{CH}C[v][m]|CH
> >
> > The Linguistic definition of Indic syllable has been mapped to ABNF(Augmented Backus–Naur Form) for the purpose of text segmentation, Line breaking, Drop letter, letter spacing in horizontal text and vertical text representation. The definition has been elaborated taking Hindi as an example.
> >
> > The definition is combination of 3 rules :
> >
> > Rule 1 : V[m]
> > Rule 2 : {CH}C[v][m]
> > Rule 3 : CH (This rule is applicable only at the end of the word)

In European languages, as far as I know, a final consonant would be considered part of the preceding syllable, not a syllable on its own. As an example, "cat" would be considered as one syllable, not two syllables ("ca" and "t"). Would one really put a word-final consonant on a new line in Indic languages? Just wondering.

Regards, Martin.

> > V(Upper case) is complete vowel
> > m is modifier(Anusvara/Visarga/Chandrabindu)
> > C is Consonant as per Unicode definition which may or may not include nukta
> > v (lower case) is any dependent vowel or vowel sign (mātrā)
> > H is halant / virama
> > | is a rule separator
> > [ ] - The enclosed item is optional
> > {} - The enclosed item/items occurs once or repeated multiple times
> >
> > Examples:
> >
> > Rule 1 : V[m]
> >
> > Sl. No. | Examples | Definition
> > 1. | अ, ई, उ | V (Vowel) is a syllable
> > 2. | अं, उँ, आः | V+ Modifier is a syllable
> >
> > Rule 2 : {CH}C[v][m]
> >
> > Sl. No. | Examples | Definition
> > 1. | र, क, ज, ल, म | Consonant is a syllable
> > 2. | प्प,क्ख,च्त, ज्ज्व, त्क्ल,त्स्न | Zero or more Consonant + Virama sequences followed by consonant is a syllable
> > 3. | र्त, र्त्स, र्त्स्न, र्त्स्न्य, फ़्क़ | Zero or more Consonant (Nukta) +Virama followed by consonant is a syllable
> > 4. | र्ता, र्त्स्न्या, फ़्जी, क्या | Zero or more consonant+ (Nukta)+ virāma sequences followed by a consonant (+Nukta) followed by a vowel sign is a syllable
> > 5. | तः,स्तं, स्त्रँ, स्तः, फ़्ज़ँ | zero or more consonant+ (Nukta)+ virāma sequences followed by a consonant (+Nukta) followed by modifier is a syllable
> > 6. | र्त्स्न्या: त्स्न्युं, त्स्न्युँ, फ़्ज़ें,हिं | zero or more consonant+ (Nukta)+ virāma sequences followed by a consonant (+Nukta) followed by a vowel sign and modifier is a syllable
> > 7. | स्थि,ज्जि,ख्वा | Zero or more Consonant +halant sequences followed by a consonant followed by vowel sign is a syllable
> >
> > Rule 3 : CH
> >
> > त् , व् , म् , भ् etc are syllables in Hindi only at the end of the word
> >
> > Examples of combination of the rules :
> >
> > 1. स्वागतम् - CHCv + C + C + CH has following syllables :
> >
> > स्वा | CHCv
> > ग | C
> > त | C
> > म् | CH
> >
> > 2. भरतनाट्यम - C + C + C + Cv + CHC + C
> >
> > भ | C
> > र | C
> > त | C
> > ना | Cv
> > ट्य | CHC
> > म | C
> >
> > 3. सद्बुद्धि - C + CHCv + CHCv
> >
> > स | C
> > द्बु | CHCv
> > द्धि | CHCv
> >
> > The proposed definition is generic in nature and has already been tested for 11 Indian languages i.e Hindi, Marathi, Bengali, Nepali, Tamil, Telugu, Kannada, Gujarati, Punjabi, Oriya & Malayalam. The new rule for CH(Consonant+ Halant) occurrence at the end of the word has been introduced. The link of the test suite is available at http://w3cindia.in/syllable-generator.aspx. The testing of the remaining languages is underway.
> >
> > I request you to kindly give your valuable feedback.
> >
> > regards,
> >
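
For readers who want to try the three quoted rules on the quoted examples, here is a minimal sketch in Python. It is not the test suite mentioned in the quoted message, only an illustration under stated assumptions: the character ranges cover Devanagari only, `{CH}` is read as "zero or more" (as the quoted examples do), the word-final condition of Rule 3 is approximated by "no consonant or independent vowel follows", and the names `SYLLABLE` and `segment` are hypothetical.

```python
import re

# Assumed Devanagari character classes; other scripts would need their own ranges.
C = "[\u0915-\u0939\u0958-\u095F]\u093C?"  # consonant, optionally followed by nukta
V = "[\u0904-\u0914]"                      # independent ("complete") vowel
M = "[\u0901-\u0903]"                      # chandrabindu, anusvara, visarga
DV = "[\u093E-\u094C]"                     # dependent vowel sign (matra)
H = "\u094D"                               # halant / virama

SYLLABLE = re.compile(
    # Rule 1: V[m]
    f"{V}{M}?"
    # Rule 2: {CH}C[v][m]; the final C must not be followed by nukta or halant
    f"|(?:{C}{H})*{C}(?![\u093C\u094D]){DV}?{M}?"
    # Rule 3: CH, accepted only when no consonant or independent vowel follows
    f"|{C}{H}(?![\u0904-\u0939\u0958-\u095F])"
)


def segment(word):
    """Return the syllables of `word` found by the three rules.

    Characters that fit none of the rules (spaces, stray signs) are skipped.
    """
    return [match.group(0) for match in SYLLABLE.finditer(word)]


print(segment("स्वागतम्"))    # ['स्वा', 'ग', 'त', 'म्']
print(segment("भरतनाट्यम"))  # ['भ', 'र', 'त', 'ना', 'ट्य', 'म']
print(segment("सद्बुद्धि"))    # ['स', 'द्बु', 'द्धि']
```

With these assumptions the sketch reproduces the three worked examples from the quoted message, including the word-final म् of स्वागतम् as a separate unit, which is the case the question above is about.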
Received on Tuesday, 24 June 2014 08:24:48 UTC