- From: Andrew Cunningham <acunningham@slv.vic.gov.au>
- Date: Thu, 11 Jul 2013 08:08:22 +1000
- To: John Hudson <tiro@tiro.com>
- Cc: public-i18n-indic@w3.org
- Message-ID: <CAOUP6K=HUF5DQ2k5YN6qg5J8urSPOZPOHft2d_-_by073kzSFA@mail.gmail.com>
Sounds very promising. I was playing around in JavaScript looking at ways of doing it, but had to look at it language by language in Myanmar script, since code points that are part of a syllable in Burmese would be a separate syllable in their own right in S'gaw Karen.

On 11/07/2013 6:53 AM, "John Hudson" <tiro@tiro.com> wrote:
> On 10/07/13 1:26 PM, Andrew Cunningham wrote:
>
>> If and when we get to SE Asian languages, Burmese will throw a spanner
>> in the works.
>
> In what sense?
>
>> A few of us have been discussing first-letter, and current consensus is
>> it should match first orthographic syllable which can be more than one
>> grapheme cluster.
>
> Right, but the orthographic syllable is also the core unit of script
> processing for Indic and Southeast Asian script layout. Hence, the 'Basic
> shaping forms' layout features in each of Microsoft's font specifications
> for these scripts are applied at the orthographic syllable level, which
> means the layout engine is first applying algorithms to delimit the
> orthographic syllables in a word, based on patterns of characters. We do
> similar analysis when doing conjunct frequency analysis on text corpora. It
> seems to me that the same approach could be taken to select orthographic
> syllables as units in CSS.
>
> JH
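A minimal Python sketch of the pattern-based delimiting described in the thread, under simplified assumptions that are not taken from any font specification: combining marks (general category Mn/Mc) attach to the preceding character; a consonant after U+1039 MYANMAR SIGN VIRAMA (the invisible stacker) is subjoined to the previous unit; and a consonant followed by U+103A MYANMAR SIGN ASAT closes the preceding syllable. That last rule sits behind a hypothetical flag, `final_asat_joins_previous`. The flag only illustrates the point that the rules can differ by language; the message does not say what the actual Burmese and S'gaw Karen rules are. The function name `myanmar_syllables` is likewise made up for illustration, and kinzi, great sa, digits, and the rest of the real shaping grammar are not modelled.

```python
import unicodedata

VIRAMA = "\u1039"  # MYANMAR SIGN VIRAMA: invisible stacker for subjoined consonants
ASAT = "\u103A"    # MYANMAR SIGN ASAT: visible killer marking a syllable-final consonant


def is_consonant(ch: str) -> bool:
    # U+1000..U+1021 are the basic Myanmar consonants.
    return "\u1000" <= ch <= "\u1021"


def is_mark(ch: str) -> bool:
    # Myanmar dependent vowels, medials, tone marks, virama and asat all have
    # general category Mn or Mc, so they never start a new unit here.
    return unicodedata.category(ch) in ("Mn", "Mc")


def myanmar_syllables(text: str, final_asat_joins_previous: bool = True) -> list[str]:
    """Hypothetical, simplified orthographic-syllable segmentation (a sketch only)."""
    syllables: list[str] = []
    for i, ch in enumerate(text):
        prev = text[i - 1] if i > 0 else ""
        nxt = text[i + 1] if i + 1 < len(text) else ""
        after = text[i + 2] if i + 2 < len(text) else ""
        if not syllables:
            # The first character always opens the first unit.
            syllables.append(ch)
        elif is_mark(ch):
            # Marks attach to whatever unit is currently open.
            syllables[-1] += ch
        elif is_consonant(ch) and prev == VIRAMA:
            # Consonant after the stacker is subjoined to the previous consonant.
            syllables[-1] += ch
        elif (final_asat_joins_previous and is_consonant(ch) and nxt == ASAT
              and after != VIRAMA and is_consonant(syllables[-1][0])):
            # Assumed rule: consonant + asat closes the preceding syllable.
            # The `after != VIRAMA` check merely keeps kinzi sequences out of it.
            syllables[-1] += ch
        else:
            # Anything else starts a new unit.
            syllables.append(ch)
    return syllables
```

Under these assumptions, `myanmar_syllables("\u1019\u103C\u1014\u103A\u1019\u102C")` (the word "မြန်မာ") returns `["မြန်", "မာ"]`, so the first element would be the candidate first-letter unit. With `final_asat_joins_previous=False` it returns `["မြ", "န်", "မာ"]`, which shows how a single per-language rule changes what the first syllable is.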
Received on Wednesday, 10 July 2013 22:08:50 UTC