Re: [css3-text] script categories, 'bicameral', 'discrete', Unicode links and more from John Hudson on 2011-04-15 (www-style@w3.org from April 2011)

From: John Hudson <tiro@tiro.com>
Date: Fri, 15 Apr 2011 11:29:03 -0700
CC: fantasai <fantasai.lists@inkedblade.net>, www-style@w3.org
Message-ID: <4DA88E6F.9080101@tiro.com>

What is the functional purpose of the script classification system? If 
it is to be used primarily in line-breaking and justification behaviour, 
perhaps it would make more sense to explicitly refer to this section as 
something like 'Line-break Categorisation' and to label the categories 
with regard to line-break and justification models, rather than by 
trying to label scripts by type and then presume a line-breaking model 
for each type.

As it stands, the proposed classification criteria seem confused and to 
be based on an idiosyncratic analysis that ends up forcing closely 
related writing systems into different categories; there may be good 
reason for these divisions based on line-breaking needs, but for anyone 
familiar with more typical script analysis the use of familiar terms in 
strange ways is confusing, as are the implied groupings. For instance, 
under the categorisation criteria, Devanagari and Bengali would be 
considered 'connected scripts', while Gujarati and Oriya would be 
'discrete scripts', despite the fact that all four scripts are closely 
related, have historically been analysed as local variants of the same 
writing system, and share important features that are ignored by the 
proposed classification criteria.

The term cursive is problematic because virtually any writing system can 
be, and has been, written in a cursive form, even nominal 'block scripts'. 
There are plentiful examples of cursive Latin script, and in many 
instances these are analysable as being at the same time cursive and 
discrete, since the letters within words retain their discrete isolated 
shapes and are linked by joining strokes that are not part of the 
letters. This is in contrast to Arabic, in which the joining strokes are 
part of the letters, replacing other strokes that occur in the isolated 
forms. So the distinction between Latin and Arabic is that the latter is 
morphographical, while both may be written in cursive styles. [This also 
raises the issue of the degree to which nominal script-level decisions 
about line-breaking and justification can be safely applied to 
particular styles and particular fonts. If a justification model permits 
inter-character spacing adjustment of 'discrete' scripts, what is the 
effect on cursive font styles?]

Wouldn't it be better to define a set of line-breaking and justification 
model categories, and then populate these with the scripts to which each 
should be applied?
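
As a rough sketch of the shape I have in mind (in Python, purely for
illustration): the models are defined first, in terms of behaviour, and
scripts are then assigned to them. Everything named here is hypothetical:
the model names, the adjustment kinds, and the script assignments are
placeholders, not a proposal for the actual categories. The cursive-style
override is only one possible answer to the font-style question raised
above.

```python
from dataclasses import dataclass
from enum import Enum


class Adjust(Enum):
    """Hypothetical ways a justifier may absorb slack on a line."""
    WORD_SPACE = "word-space"
    LETTER_SPACE = "letter-space"
    ELONGATION = "elongation"


@dataclass(frozen=True)
class LineModel:
    """A line-breaking/justification model, named by behaviour, not script type."""
    name: str
    adjustments: frozenset


# Placeholder models; names and permitted adjustments are assumptions.
MODEL_A = LineModel("model-a", frozenset({Adjust.WORD_SPACE, Adjust.LETTER_SPACE}))
MODEL_B = LineModel("model-b", frozenset({Adjust.WORD_SPACE}))
MODEL_C = LineModel("model-c", frozenset({Adjust.WORD_SPACE, Adjust.ELONGATION}))

# Models come first and are then populated with scripts.
# These assignments are placeholders only.
SCRIPTS_BY_MODEL = {
    MODEL_A: ("Latin",),
    MODEL_B: ("Devanagari", "Bengali", "Gujarati", "Oriya"),
    MODEL_C: ("Arabic",),
}

# Reverse index so a script can be looked up directly.
MODEL_BY_SCRIPT = {
    script: model
    for model, scripts in SCRIPTS_BY_MODEL.items()
    for script in scripts
}


def model_for(script: str) -> LineModel:
    """Return the model a script has been assigned to."""
    return MODEL_BY_SCRIPT[script]


def permitted_adjustments(script: str, cursive_style: bool = False) -> frozenset:
    """Adjustments allowed for a script, optionally narrowed for a cursive font.

    Dropping letter-spacing for cursive styles is an assumption made here
    for illustration, not a settled rule.
    """
    allowed = model_for(script).adjustments
    if cursive_style:
        allowed = allowed - {Adjust.LETTER_SPACE}
    return allowed
```

With this arrangement, closely related scripts can share a model without
anyone having to label them by script type, and a style-level override
sits on top of the script-level default rather than replacing it.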

JH

Received on Friday, 15 April 2011 18:29:40 UTC