- From: fantasai <fantasai.lists@inkedblade.net>
- Date: Wed, 25 Jun 2014 08:11:03 -0700
- To: Richard Ishida <ishida@w3.org>, www-international@w3.org
- CC: "CSS WWW Style (www-style@w3.org)" <www-style@w3.org>
On 05/22/2014 11:09 AM, Richard Ishida wrote:
>
> One is that, as I mentioned already, it is not correct to say
> 'the "user-perceived character", also known as the grapheme
> cluster.' The equivalent term for a user-perceived character
> is 'grapheme'. The 'grapheme cluster' is a unit derived from
> rules in Unicode to yield an *approximation* to a user-defined
> character. Not all user-perceived characters are grapheme
> clusters.

I'm fine to remove that phrase if it's problematic. Is it problematic
in UAX29 also? (Does it need a bug filed there?)

> Another is a worry whether we can really effectively split
> the world into semantically-perceived and visually-perceived
> characters - especially given the 'etc' that appears in the
> definition where we list appropriate operations for each.
> For example, are we sure that first-letter operations require
> semantically- rather than visually-perceived characters in all
> cases? Where does cursor movement fit here? etc.

I think I have to conclude that no, we can't.

> What about Arabic justification which may involve increasing
> word-internal 'gaps' that occur due to one glyph not joining
> with the following glyph. These are relevant units for
> justification of Arabic text, but they aren't user-perceived
> characters.

Is that really a relevant concept? Increasing word-internal 'gaps'
is a horrible way to justify Arabic text, look:
  http://dev.w3.org/csswg/css-text/arabic-stretch-unjoined
It results in uneven typographic color and obscures word boundaries.
It might exist, but I've never seen it...

> And what about the case where Indic script text units vary
> according to the font in use. As I understand it, a text
> unit for wrapping or stretching in Devanagari can encompass
> a CvCVD (consonant, virama, consonant, vowel sign, diacritic)
> only if the font has glyphs to show this is a single visual
> unit (eg. ligatures, half-forms, special glyphs) and hides
> the virama. If the font is changed, such that the virama
> becomes visible, we are now dealing with two text units.
> This font-specific behaviour for the same sequence of code
> points is a contextual difference that, I think, cuts across
> both the semantic- and visual- categories currently defined.

Okay.

> I think that actually all we may be trying to say is that
> the atomic unit of text for a particular operation may not
> be the same as for another, but that we start from a base
> of grapheme clusters and require the application to take
> into account variances and extensions of that as needed.
> What if we simply talk in terms of vague 'typographic units',
> or 'text units', or some such, but describe up front how
> these can be different sequences of code points depending
> on the operation to be performed (ie. not try to define
> just two specific scenarios)?

Overall, I agree with the concept, but I want to make sure that
the spec is somehow understandable to people who are not either
  a) members of the i18nWG or a similar community
  b) text layout implementation experts
(If Lea Verou cannot make sense of the CSS Text spec well enough
to use it as a reference for the properties it defines, then I
consider the spec to be a failure.)

I've reworked the Terminology section following your suggestions:
will work on the rest of the spec tomorrow and hopefully have it
all make sense soon. ^_^

~fantasai
Received on Wednesday, 25 June 2014 15:11:37 UTC