
RE: [css3-text] script categories, 'bicameral', 'discrete', Unicode links and more

From: CE Whitehead <cewcathar@hotmail.com>
Date: Sat, 16 Apr 2011 14:44:37 -0400
Message-ID: <SNT142-w208027E48785A22CCD9E5CB3AF0@phx.gbl>
To: <tiro@tiro.com>, <xn--mlform-iua@xn--mlform-iua.no>
CC: <asmusf@ix.netcom.com>, <fantasai.lists@inkedblade.net>, <cowan@mercury.ccil.org>, <www-international@w3.org>, <public-i18n-core@w3.org>, <public-i18n-indic@w3.org>, <public-i18n-cjk@w3.org>, <www-style@w3.org>

Hi, I tend to think some font behavior may be more language-specific than script-specific (however, I have seen spacing used for emphasis in English, at least in the past, so it may not be just German).
 
Here are some resources.  I hope that at least one is useful. 
 
The following Unicode conference report discusses the kashida (stretching) in Arabic:
http://www.scribd.com/doc/238961/Authentic-Arabic-Typography-Technical-and-Aesthetic-Challenges
"Text block justification in Arabic involves more than in Latin script. The latter has two distinct mechanisms for justification: aglobal one, which indiscriminately inserts micro-spaces and a specific one to hyphenate words according to elaborate rules that vary from language to language. Islamic calligraphy has a device called thekeshideh, a Persian and Ottoman Turkish term meaning
'stretching'.Keshideh is typeface-dependent, as the hyphen is language-dependent. That is, to get
aesthetically acceptable results, akeshideh is placed according to a complex set of rules giving
priority to certain letter combinations over others. These rules vary between calligraphic styles. The
result is characteristically different for each kind of Arabic script. In other words, thekeshideh is the
equivalent of hyphenation and not of micro-justification."
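
To make the idea in the quote concrete, here is a rough Python sketch of justification by stretching rather than by adding space. It is only an illustration under loud assumptions: the kashida is represented by the character U+0640 ARABIC TATWEEL, line width is counted in characters, the list of letters that do not connect forward is simplified, and the "priority rule" (prefer the last joinable pair in a word, at most one stretch per word) is invented for the example. Real rules depend on the calligraphic style and typeface, as the quote says.

```python
TATWEEL = "\u0640"  # ARABIC TATWEEL, the character form of the kashida

# Simplified assumption: letters that do not connect to a following letter
# (hamza, the alef forms, waw forms, teh marbuta, dal, thal, reh, zain).
NO_FORWARD_JOIN = set(
    "\u0621\u0622\u0623\u0624\u0625\u0627\u0629\u062F\u0630\u0631\u0632\u0648"
)


def is_arabic_letter(ch):
    """Rough test for a basic Arabic letter (the tatweel itself is excluded)."""
    return "\u0621" <= ch <= "\u064A" and ch != TATWEEL


def stretch_points(word):
    """Return the indices in `word` where a tatweel could be inserted."""
    points = []
    for i in range(len(word) - 1):
        left, right = word[i], word[i + 1]
        if (is_arabic_letter(left) and left not in NO_FORWARD_JOIN
                and is_arabic_letter(right) and right != "\u0621"):
            points.append(i + 1)
    return points


def justify_line(words, width):
    """Pad a line to `width` characters by stretching, not by adding spaces.

    Hypothetical priority rule: one stretch point per word, the last
    joinable pair. Style-specific rules would replace the max() below.
    """
    missing = width - (sum(len(w) for w in words) + len(words) - 1)
    chosen = []
    for idx, word in enumerate(words):
        points = stretch_points(word)
        if points:
            chosen.append((idx, max(points)))
    if missing <= 0 or not chosen:
        return " ".join(words)
    out = list(words)
    for k, (idx, pos) in enumerate(chosen):
        # Spread the missing width as evenly as possible over the words.
        n = missing // len(chosen) + (1 if k < missing % len(chosen) else 0)
        out[idx] = words[idx][:pos] + TATWEEL * n + words[idx][pos:]
    return " ".join(out)


line = justify_line(["\u0643\u062A\u0627\u0628", "\u062C\u0645\u064A\u0644"], 12)
```

The word spaces stay untouched and only the words themselves grow, which is the contrast with micro-justification that the quote draws.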
(I have seen the kashida used to distinguish repeated characters at the end of a word, such as in words ending with the suffix "iyyin" -- for example, "misrayyiin" 'Egyptian', from "misra" 'Egypt' + suffix. I have not paid much attention to formatting myself; however, I have at least noticed word heaping on my own. I would not notice that stretching was being used for text justification, so it is probably more common than word heaping.)
http://www.idnforums.com/forums/1031-watch-out-there-is-arabic-hyphen.html

This Egyptian speaker says that there is an Arabic hyphen, although I gather it must be rare.  It is slightly curved, but I cannot find an example to look at.

 
http://www.tug.org/TUGboat/tb27-2/tb87benatia.pdf

Arabic text justification
Mohamed Jamal Eddine Benatia,
Mohamed Elyaakoubi and Azzeddine Lazrek
Department of Computer Science,
Faculty of Science, University Cadi Ayyad
P.O. Box 2390, Marrakesh, Morocco
lazrek (at) ucam dot ac dot ma

Here it is again:
http://www.ucam.ac.ma/fssm/rydarab
Arabic Text Justification
M. J. E. Benatia, M. Elyaakoubi and A. Lazrek
(Department of Computer Science 
Cadi Ayyad University, Marrakesh)

"Calligraphers also build on other practices for justification, such as:
word heaping: putting certain words above others .
moving the broken fragment above the hyphenated word .
word hyphenation .
word hyphenation in margin .
decreasing of some words at the end of a line . .
curving of the baseline ."

(The authors give details about the use of the kashida.)
The authors mention moving the hyphenated fragment to the margin in the Holy Qur'an and show an example; this was discussed previously, I think, on some list. I am still unable to see the hyphens.
(I'm attaching one image of word heaping and hyphenation.)

 

http://books.google.com/books?id=mONRrqREIAAC&pg=PR8&lpg=PR8&dq=Arabic+hyphen&source=bl&ots=bfQg-0ldLB&sig=lMqm1bAN22TKFjUBrTF4lt38Xnw&hl=fr&ei=Bf-oTaHUIMfniALL28TvDA&sa=X&oi=book_result&ct=result&resnum=10&ved=0CFcQ6AEwCTgU#v=onepage&q=Arabic%20hyphen&f=false
A Dictionary of Post-classical Yemeni Arabic Vol 1
Explains some hyphenation formats peculiar to Hebrew script.  (An apostrophe-like form is used.)

 
A less useful resource, but it backs up what you all say -- that hyphenation is possible in Arabic script, though not really for the Arabic language:

http://omega.enstb.org/yannis/pdf/marrakech.pdf
"• if they are given by ligature nodes, once again
we have two possibilities: when the ligature is
not broken then we have a single static glyph.
When the ligature is broken we return to “characters”
(or at least to something which is a bit
closer to the concept of character, even though
it is not exactly a character) and apply the main
loop again to the two parts (before and after
the break), which sometimes results in new ligatures.
But once again each node list obtained
that way is unique."
"Dynamic typesetting is a method of typesetting
where glyphs can change during the process of line
breaking, for reasons which may depend on macrotypographic
properties such as justification of the line
or of the entire paragraph, or more global phenomena
like glyphs on subsequent lines touching each
other or to avoid rivers, etc
The keshideh is a curved pen stroke
of definite length that slightly
stretches a letter-compound. The
illustration shows 3 measures of
keshideh commonly used in Naskh).
Within a word, or rather letter-
compound, usually no more than
one such keshideh occurs. Some
letters produce their own
prolonged forms, in which case . . . ruled out "

"Step 1: Hyphenation It has been said over and
over again that Arabic is not hyphenated. This is
true when we refer to Arabic language, but false
when we refer to Arabic script. Indeed, there is one
language written in Arabic script, namely Uighur,
which uses hyphenation just like any European language.
Uighur may use the Arabic script but is
not a Semitic language and hence does not use implicit
short vowels: all vowels are explicitly written
and one can easily identify syllables and hyphenate
words between them.
Uighur is indeed hyphenated but if we add soft
hyphen characters we risk obstruction of contextual
analysis, which is the next step. It is easier to add
potential breakpoints as texteme properties, as is
done for other languages."
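
The last point in that quote can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the `Texteme` class is a hypothetical stand-in for the paper's term, the "contextual analysis" is a naive one that only looks at adjacent characters, and the word is a Latin placeholder rather than real Uighur. It contrasts inserting U+00AD SOFT HYPHEN into the text with recording the break opportunity as a property beside the text.

```python
from dataclasses import dataclass

SOFT_HYPHEN = "\u00AD"  # SOFT HYPHEN


@dataclass
class Texteme:
    """Hypothetical stand-in: a character plus a break-opportunity property."""
    char: str
    break_after: bool = False


def with_soft_hyphens(word, breaks):
    """Mark break opportunities by inserting soft hyphens into the text."""
    out = []
    for i, ch in enumerate(word, start=1):
        out.append(ch)
        if i in breaks:
            out.append(SOFT_HYPHEN)
    return "".join(out)


def with_properties(word, breaks):
    """Mark break opportunities as properties; the characters are unchanged."""
    return [Texteme(ch, i in breaks) for i, ch in enumerate(word, start=1)]


def adjacent_pairs(chars):
    """Naive contextual analysis: the pairs of neighbouring characters."""
    chars = list(chars)
    return list(zip(chars, chars[1:]))


word = "abcd"   # placeholder word, not real Uighur
breaks = {2}    # assumed break opportunity after the second character

marked = with_soft_hyphens(word, breaks)
textemes = with_properties(word, breaks)

pairs_marked = adjacent_pairs(marked)
pairs_textemes = adjacent_pairs(t.char for t in textemes)
```

In the soft-hyphen version the pair ("b", "c") no longer appears next to each other, so an analysis that relies on neighbours has to be taught to skip the extra character; in the property version the character sequence is exactly the original one.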

 
(I am still reading the report and should have some comments on that soon.)
 
Best, 
 
C. E. Whitehead
cewcathar@hotmail.com 
Received on Saturday, 16 April 2011 18:51:53 GMT
