
[ilreq] i18n-ISSUE-407: Clarification of initial letter example

From: r12a via GitHub <sysbot+gh@w3.org>
Date: Wed, 16 Sep 2015 07:09:03 +0000
To: public-i18n-indic@w3.org
Message-ID: <issues.opened-106713351-1442387343-sysbot+gh@w3.org>
r12a has just created a new issue for https://github.com/w3c/ilreq:

== i18n-ISSUE-407: Clarification of initial letter example ==
[moved here from tracker]

5.1 First Letter
http://www.w3.org/TR/2014/WD-ilreq-20141216/#first-letter

"Note how the vowel sign appears to the left of the first character, 
not the third. There are three grapheme clusters here. The first 
includes the SA+VIRAMA,THA+I and T+II. We see that the styling is done
 on the basis of the syllable, not the first character. A syllable 
includes a base consonant and any combination of the following 
characters in the text stream:"

This text is misleading when read alongside figure 4, since it talks
about 3 grapheme clusters and the figure has 3 red circles. The figure
also doesn't show first-letter styling, although the text says it does,
which is confusing. There is also an error in the romanization.

How about the following wording? It is based on the example at
https://www.flickr.com/photos/ishida/16084553630/
I also suggest renaming the section to Initial Letter Styling, to
match the CSS Inline spec.


> Indic script behavior in initial letter styling is based on
> syllables, rather than individual letter forms.
> 
> Figure 4 shows an example of a drop initial in Hindi. In the first
> word of the paragraph, स्कूल ('skūl'), the sequence of characters
> stored in memory is as follows:
> 
> स ‎U+0938 DEVANAGARI LETTER SA
> ् ‎U+094D DEVANAGARI SIGN VIRAMA
> क ‎U+0915 DEVANAGARI LETTER KA
> ू ‎U+0942 DEVANAGARI VOWEL SIGN UU
> ल ‎U+0932 DEVANAGARI LETTER LA
> 
> There are two syllables in this word: SA+VIRAMA+KA+UU and LA. Note,
> however, that there are three Unicode grapheme clusters here:
> SA+VIRAMA, KA+UU and LA.
> 
> Styling is done on the basis of the whole orthographic syllable, not
> the first character, nor even the first grapheme cluster.
> 
> A syllable includes a base consonant and any combination of the
> following characters in the text stream:
> - sequences of consonants preceded by virama (i.e. conjuncts).
> - vowel signs
> - visarga, anusvara or candrabindu.
> 
> 
> NOTE: The detailed definition of Indic syllables is given in section 2.
> 
> Here are some further examples of initial letter styling based on
> the Indic syllable definition.
> 
> ...


An alternative would be to take the above text and put it at the
bottom of section 3, Text Segmentation, as an illustration of the point
made in its last paragraph ("text segmentation should be done as
Indic syllable"). This is useful because it clearly distinguishes
between grapheme clusters and syllabic units, and it could be referred
to from other sections too, such as the section on vertical text.
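To make the grapheme cluster/syllable distinction concrete, here is a minimal Python sketch (an illustration only, not proposed spec text). approx_grapheme_clusters and simple_syllables are hypothetical helpers. The first approximates grapheme clusters as a base character plus any following combining marks (Unicode General_Category Mn or Mc); this is a simplification of UAX #29, not an implementation of it, and the UAX #29 rules for Indic conjuncts have changed in later Unicode versions. The second follows the syllable definition quoted above, assuming that a consonant continues the current syllable when the preceding character is a virama; it only knows about Devanagari consonants.

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA


def is_devanagari_consonant(ch: str) -> bool:
    # KA..HA plus the precomposed nukta letters QA..YYA (simplified range check)
    return "\u0915" <= ch <= "\u0939" or "\u0958" <= ch <= "\u095F"


def is_combining_mark(ch: str) -> bool:
    # Devanagari vowel signs, virama, nukta, candrabindu, anusvara and
    # visarga all have General_Category Mn or Mc
    return unicodedata.category(ch) in ("Mn", "Mc")


def approx_grapheme_clusters(text: str) -> list[str]:
    """Hypothetical helper: a base character plus any following combining
    marks. A simplification, not a full UAX #29 implementation."""
    clusters: list[str] = []
    for ch in text:
        if clusters and is_combining_mark(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def simple_syllables(text: str) -> list[str]:
    """Hypothetical helper following the quoted syllable definition: a
    consonant after a virama extends the current syllable (a conjunct),
    and combining marks attach to the current syllable."""
    syllables: list[str] = []
    for ch in text:
        continues_conjunct = (
            bool(syllables)
            and syllables[-1].endswith(VIRAMA)
            and is_devanagari_consonant(ch)
        )
        if syllables and (continues_conjunct or is_combining_mark(ch)):
            syllables[-1] += ch
        else:
            syllables.append(ch)
    return syllables


word = "\u0938\u094D\u0915\u0942\u0932"  # स्कूल ('skūl')
print(len(approx_grapheme_clusters(word)))  # 3: SA+VIRAMA, KA+UU, LA
print(len(simple_syllables(word)))          # 2: SA+VIRAMA+KA+UU, LA
```

The only difference between the two functions is the virama rule: without it, KA starts a new unit; with it, KA joins SA+VIRAMA to form the conjunct syllable.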

Section 5.1 could then simply state, at the start, that initial
letters are selected using the orthographic syllable as the unit, as
illustrated in section 2, and give some examples. The rest of section
5.1 could focus on more specific requirements, such as which styles of
highlighting are common, what the alignment points are, and so on.
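As a sketch of what "the orthographic syllable as the unit" could mean when selecting an initial letter, the Python snippet below matches one Devanagari syllable at the start of a string with a regular expression. The code point ranges, the hypothetical split_initial name, and the decision to let an independent vowel act as a syllable base are assumptions made for illustration. The pattern ignores ZWJ/ZWNJ, trailing viramas and other scripts, and falls back to the first character when nothing matches.

```python
import re

# Hypothetical, simplified pattern for one Devanagari orthographic syllable,
# loosely following the quoted definition. Code point ranges are assumptions
# for illustration; ZWJ/ZWNJ and trailing viramas are not handled.
CONSONANT = "[\u0915-\u0939\u0958-\u095F]\u093C?"  # consonant + optional nukta
INDEPENDENT_VOWEL = "[\u0905-\u0914]"               # assumption: vowel as a base
VOWEL_SIGNS = "[\u093E-\u094C\u0962\u0963]*"         # dependent vowel signs
MODIFIERS = "[\u0901-\u0903]*"                      # candrabindu, anusvara, visarga

SYLLABLE = re.compile(
    f"(?:{CONSONANT}(?:\u094D{CONSONANT})*|{INDEPENDENT_VOWEL})"
    f"{VOWEL_SIGNS}{MODIFIERS}"
)


def split_initial(text: str) -> tuple[str, str]:
    """Hypothetical helper: split text into (initial letter, remainder),
    using the first orthographic syllable as the initial-letter unit."""
    match = SYLLABLE.match(text)
    if match is None:
        # Not a recognised Devanagari syllable start: use the first character.
        return text[:1], text[1:]
    return match.group(0), text[match.end():]


initial, rest = split_initial("\u0938\u094D\u0915\u0942\u0932")  # स्कूल
print(initial == "\u0938\u094D\u0915\u0942", rest == "\u0932")  # True True
```

For स्कूल, the initial letter selected this way is स्कू (SA+VIRAMA+KA+UU) and the remainder is ल, matching the two syllables described in the proposed text.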

See https://github.com/w3c/ilreq/issues/2
Further comments on this issue will NOT be notified to this list. If 
you'd like to follow the discussion, please do so by subscribing to 
the issue via the above link. Do not reply to this email.
Received on Wednesday, 16 September 2015 07:09:06 UTC
