Re: css3-text- Indic Inputs

From: 신정식 <jshin1987@gmail.com>
Date: Sun, 10 Oct 2010 22:22:23 -0700
Message-ID: <AANLkTi=F7cMSEhZUya=ckvT2bJvRgny7d6pfTNeAPB=F@mail.gmail.com>
To: Ed <ed.trager@gmail.com>
Cc: fantasai <fantasai.lists@inkedblade.net>, Cibu Johny <cibu@google.com>, Somnath Chandra <schandra@mit.gov.in>, style <www-style@w3.org>, wwwintl <www-international@w3.org>, intlcore <public-i18n-core@w3.org>, indic <public-i18n-indic@w3.org>, Richard Ishida <ishida@w3.org>, Andrew Cunningham <lang.support@gmail.com>
On Sat, Oct 9, 2010 at 3:55 AM, Ed <ed.trager@gmail.com> wrote:

> I agree with Andrew: there needs to be wording making it clear that
> the first-letter pseudo-element applies to the first grapheme cluster.
>

I fully agree.


>  This will be true not only for the Indic scripts, but also for
> Indic-derived scripts of Southeast Asia like Thai, Lao, Myanmar,
> Khmer, and Tai Tham, inter alia.
>

Well, it's not limited to South and SE Asian scripts. Latin, Cyrillic, Greek
and Korean need the same treatment in two cases: when decomposed forms are
used (W3C CHARMOD assumes NFC, but nothing prevents web authors from using
decomposed forms), and when the letter in question can only be represented
with multiple Unicode characters (usually base + diacritics, but not always,
as with archaic Korean). It also has to apply to Hebrew, Arabic, Syriac and
Thaana.
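
To make the decomposed-form case concrete, here is a minimal Python sketch.
The helper first_cluster is a hypothetical name, and it only approximates
grapheme clusters by attaching combining marks (general categories Mn, Mc,
Me) to the preceding base character. It is not a full UAX #29
implementation; conjoining Hangul jamo, for instance, are not handled.

```python
import unicodedata


def first_cluster(text: str) -> str:
    """Return the first character plus any combining marks that follow it.

    Simplified assumption: only characters whose general category starts
    with "M" (Mn, Mc, Me) extend the cluster. Full UAX #29 rules cover more.
    """
    if not text:
        return ""
    end = 1
    while end < len(text) and unicodedata.category(text[end]).startswith("M"):
        end += 1
    return text[:end]


composed = "\u00c9cole"  # "École" with precomposed É (NFC)
decomposed = unicodedata.normalize("NFD", composed)  # E + U+0301 + "cole"

naive_first = decomposed[0]                # first code point only: bare "E"
cluster_first = first_cluster(decomposed)  # "E" together with U+0301
```

Taking only the first code point of the decomposed string would style a
bare "E" and leave the acute accent behind; grouping by cluster keeps the
letter intact in both normalization forms.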

As for Indic scripts, we need to agree on what makes up a grapheme cluster
(when implementing 'first-letter'). Below is what UAX #29 has to say about
that:

Grapheme clusters can be tailored to meet further requirements. Such
tailoring is permitted, but the possible rules are outside of the scope of
this document. One example of such a tailoring would be for the *aksaras*,
or *orthographic syllables*, used in many Indic scripts. Aksaras usually
consist of a consonant, sometimes with an inherent vowel and sometimes
followed by an explicit, dependent vowel whose rendering may end up on any
side of the consonant letter base. Extended grapheme clusters include such
simple combinations.

However, aksaras may also include one or more additional prefixed
consonants, typically with a *virama* (halant) character between each
consonant in the sequence. Such consonant cluster aksaras are not
incorporated into the default rules for extended grapheme clusters, in part
because not all such sequences are considered to be single "characters" by
users. Indic scripts vary considerably in how they handle the rendering of
such aksaras—in some cases stacking them up into combined forms known as
consonant conjuncts, and in other cases stringing them out horizontally,
with visible renditions of the halant on each consonant in the sequence.
There is even greater variability in how the typical liquid consonants (or
"medials"), *ya, ra, la,* and *wa*, are handled for display in combinations
in aksaras. So tailorings for aksaras may need to be script-, language-,
font-, or context-specific to be useful.

*Note: Font-based information may be required to determine the appropriate
unit to use for UI purposes, such as identification of boundaries for
first-letter paragraph styling. For example, such a unit could be a ligature
formed of two grapheme clusters, such as لا (Arabic *

The Unicode definitions of grapheme clusters are defaults: not meant to
exclude the use of more sophisticated definitions of tailored grapheme
clusters where appropriate. Such definitions may more precisely match the
user expectations within individual languages for given processes. For
example, “ch” may be considered a grapheme cluster in Slovak, for processes
such as collation. The default definitions are, however, designed to provide
a much more accurate match to overall user expectations for what the user
perceives of as *characters* than is provided by individual Unicode code
points.
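
The gap between the default clusters and an aksara tailoring can be
sketched for Devanagari as follows. This is only an illustration under
stated assumptions: the function names are hypothetical, the "default"
behaviour is approximated as base + combining marks, and the tailoring
simply continues across a virama when a letter (category Lo) follows. A
real tailoring would need the script-, language- and font-specific
information the quoted text mentions.

```python
import unicodedata

VIRAMA = "\u094d"  # DEVANAGARI SIGN VIRAMA (halant)


def default_like_cluster(text: str) -> str:
    """Approximate a default cluster: one base plus following marks."""
    if not text:
        return ""
    end = 1
    while end < len(text) and unicodedata.category(text[end]).startswith("M"):
        end += 1
    return text[:end]


def aksara_cluster(text: str) -> str:
    """Hypothetical tailoring: extend across virama + following letter.

    Assumption: any category-Lo character after a virama is treated as a
    consonant joining the aksara, which is cruder than a real tailoring.
    """
    end = len(default_like_cluster(text))
    while (
        end < len(text)
        and text[end - 1] == VIRAMA
        and unicodedata.category(text[end]) == "Lo"
    ):
        end += len(default_like_cluster(text[end:]))
    return text[:end]


# KA + VIRAMA + SSA + VOWEL SIGN I: a single orthographic syllable
word = "\u0915\u094d\u0937\u093f"

default_first = default_like_cluster(word)  # stops after KA + VIRAMA
tailored_first = aksara_cluster(word)       # covers the whole conjunct
```

With the default-like rule, 'first-letter' would pick up only KA + VIRAMA
and split the conjunct; the tailored rule selects the whole syllable.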

Jungshik




> It will still be a very long time before browsers actually provide
> adequate support: but in any case it will be very nice if the specs
> have adequate wording so implementors will have a better clue about
> what might actually be required to support complex scripts.
>
> On Sat, Oct 9, 2010 at 3:47 AM, Andrew Cunningham <andrewc@vicnet.net.au> wrote:
> >
> > On Sat, October 9, 2010 13:11, fantasai wrote:
> > > On 10/08/2010 10:26 AM, Cibu Johny (സിബു) wrote:
> >
> > >
> > > No part of the document gives me enough information to implement
> > > any of
> > >    - first-letter
> >
> > this may be a browser bug issue as well.
> >
> > CSS3 Selectors module has a note that ::first-letter pseudo-element
> > should at least apply to the default grapheme cluster.
> >
> > Maybe rather than as a note, this might be included using stronger
> > wording?
> >
> >
> >
> > --
> > Andrew Cunningham
> > Research and Development Coordinator
> > Vicnet
> > State Library of Victoria
> > Australia
> >
> > andrewc@vicnet.net.au
> >
> >
>
>
Received on Monday, 11 October 2010 06:48:09 GMT
