I18N-ISSUE-407: Clarification of initial letter example [ilreq]

From: Internationalization Working Group Issue Tracker <sysbot+tracker@w3.org>
Date: Fri, 16 Jan 2015 18:17:14 +0000
Message-Id: <E1YCBSM-00014s-Bk@shauna.w3.org>
To: public-i18n-core@w3.org

http://www.w3.org/International/track/issues/407

Raised by: Richard Ishida
On product: ilreq

5.1 First Letter
http://www.w3.org/TR/2014/WD-ilreq-20141216/#first-letter

"Note how the vowel sign appears to the left of the first character, not the third. There are three grapheme clusters here. The first includes the SA+VIRAMA,THA+I and T+II. We see that the styling is done on the basis of the syllable, not the first character. A syllable includes a base consonant and any combination of the following characters in the text stream:"

This text is misleading when paired with figure 4, because it talks about 3 graphemes and the figure has 3 red circles. The figure also doesn't show first letter styling, as the text says it does, which is confusing. There is also an error in the romanization.

How about the following wording, based around the example at https://www.flickr.com/photos/ishida/16084553630/ ?
I also suggest renaming the section to Initial Letter Styling, to match the CSS Inline spec.
---------
Indic script behavior in initial letter styling is based on syllables, rather than individual letter forms. 

Figure 4 shows an example of a drop initial in Hindi. In the first word of the paragraph, स्कूल ('skūl'), the sequence of characters stored in memory is as follows:

स  ‎U+0938  DEVANAGARI LETTER SA
 ्  ‎U+094D  DEVANAGARI SIGN VIRAMA
क  ‎U+0915  DEVANAGARI LETTER KA
ू  ‎U+0942  DEVANAGARI VOWEL SIGN UU
ल  ‎U+0932  DEVANAGARI LETTER LA

There are two syllables in this word: SA+VIRAMA+KA+UU and LA.  Note, however, that there are three Unicode grapheme clusters here: SA+VIRAMA, KA+UU and LA. 
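
The stored sequence and the three-cluster count can be checked with a short Python sketch. This is only an illustration, not proposed spec text. The names `describe` and `simple_clusters` are made up for this example, and the cluster rule used here (a character plus any following combining marks of general category Mn, Mc or Me) is a simplifying assumption standing in for the full Unicode grapheme cluster rules.

```python
import unicodedata

# The word from Figure 4, written as escapes so the stored order is explicit.
word = "\u0938\u094D\u0915\u0942\u0932"  # स्कूल

def describe(text):
    """Return (code point, character name) pairs in stored order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

def simple_clusters(text):
    """Approximate grapheme clusters (assumption: base + following marks)."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch) in ("Mn", "Mc", "Me"):
            # Combining marks (virama, vowel signs) attach to the previous unit.
            clusters[-1] += ch
        else:
            # Anything else starts a new unit.
            clusters.append(ch)
    return clusters

for code_point, name in describe(word):
    print(code_point, name)

print(len(simple_clusters(word)))  # 3: SA+VIRAMA, KA+UU, LA
```

Under this approximation the virama stays with SA and the vowel sign stays with KA, so the conjunct is split across two units, which is exactly the mismatch with the syllable described above.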

Styling is done on the basis of the whole orthographic syllable, not the first character, nor even the first grapheme. 

A syllable includes a base consonant and any combination of the following characters in the text stream:
- sequences of consonants preceded by virama (i.e. conjuncts)
- vowel signs
- visarga, anusvara or candrabindu
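
The three rules above can be sketched as a small Devanagari-only segmenter in Python. This is a rough illustration under stated assumptions, not the definition from the document: the function names are hypothetical, the character ranges are a deliberately narrow subset (consonants U+0915 to U+0939, vowel signs U+093E to U+094C, and the three signs U+0901 to U+0903), and nukta, independent vowels and a word-final virama are not handled.

```python
VIRAMA = "\u094D"

def is_consonant(ch):
    # Assumption: only the basic consonant range KA..HA.
    return "\u0915" <= ch <= "\u0939"

def is_vowel_sign(ch):
    # Assumption: dependent vowel signs AA..AU only.
    return "\u093E" <= ch <= "\u094C"

def is_modifier(ch):
    # Candrabindu, anusvara, visarga.
    return ch in "\u0901\u0902\u0903"

def syllables(text):
    """Split text into orthographic-syllable-like units (simplified)."""
    units = []
    i, n = 0, len(text)
    while i < n:
        start = i
        i += 1
        if is_consonant(text[start]):
            # Rule 1: further consonants, each preceded by virama (conjuncts).
            while i + 1 < n and text[i] == VIRAMA and is_consonant(text[i + 1]):
                i += 2
            # Rule 2: vowel signs.
            while i < n and is_vowel_sign(text[i]):
                i += 1
            # Rule 3: visarga, anusvara or candrabindu.
            while i < n and is_modifier(text[i]):
                i += 1
        # Any character not covered above forms a unit on its own.
        units.append(text[start:i])
    return units

def initial_unit(text):
    """Return the unit an initial-letter style would apply to."""
    units = syllables(text)
    return units[0] if units else ""

word = "\u0938\u094D\u0915\u0942\u0932"  # स्कूल
print(syllables(word))     # two units: SA+VIRAMA+KA+UU and LA
print(initial_unit(word))  # the whole first syllable, not just SA
```

For the word in Figure 4 the first unit is SA+VIRAMA+KA+UU, so a drop initial built on it takes the whole conjunct and its vowel sign together.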


NOTE: The detailed definition of Indic syllables is given in section 2.

Here are some further examples of initial letter styling based on the Indic syllable definition.

...
---------

An alternative would be to take the above text and put it at the bottom of section 3 Text Segmentation, as an illustration of the point made in the last paragraph ("text segmentation should be done as Indic syllable").  This is useful because it clearly distinguishes between grapheme cluster and syllabic units, and could be referred to from other sections, too, such as the section on vertical text.

Then simply say, at the start of section 5.1, that selection of initial letters uses the orthographic syllable as the unit, as illustrated in section 2, and give some examples.  The majority of section 5.1 could then focus on more specific requirements, such as what styles of highlighting are common and what the alignment points are.
Received on Friday, 16 January 2015 18:17:15 UTC
