On 12/9/2013 1:48 AM, Somnath Chandra wrote:
> We also need to study the Unicode Line Breaking (UAX #14) and
> Text Segmentation (UAX #29) algorithms, which may require some
> modifications to suit the Indian Languages requirements.
They certainly do.
They are intentionally designed as default algorithms; that is, the
rules are not based on the language of the text. But if the kind of
regular expression for syllable structure that you cite from the W3C
document is in fact generic enough to be useful in a wide range of
contexts, I don't see why it wouldn't be possible to lobby Unicode to
add it to these algorithms.
That would be easier for UAX#29, because it's the more general
algorithm. UAX#14 grew out of work for East Asia, and initially did not
assume that implementations would have the full power of regular
expressions.
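
To make the syllable-structure idea concrete, here is a minimal sketch in
Python of what such a regular expression might look like for Devanagari.
This is not the expression from the W3C document, and it is not what
UAX#29 or UAX#14 specify; the character ranges, the pattern, and the
name split_aksharas are my own simplifying assumptions (no ZWJ/ZWNJ,
no Vedic signs, one script only).

```python
import re

# Assumed, simplified Devanagari character classes (illustration only).
CONSONANT = "[\u0915-\u0939]"   # KA..HA
NUKTA = "\u093C"                # nukta, optional after a consonant
VIRAMA = "\u094D"               # virama (halant), joins consonants
VOWEL = "[\u0904-\u0914]"       # independent vowels
MATRA = "[\u093E-\u094C]"       # dependent vowel signs
MODIFIER = "[\u0901-\u0903]"    # candrabindu, anusvara, visarga

# Orthographic syllable, roughly: (C N? H)* C N? (H | M)? D?  |  V D?
# The final "|." keeps any other character as its own segment.
AKSHARA = re.compile(
    f"(?:{CONSONANT}{NUKTA}?{VIRAMA})*{CONSONANT}{NUKTA}?"
    f"(?:{VIRAMA}|{MATRA})?{MODIFIER}?"
    f"|{VOWEL}{MODIFIER}?"
    "|.",
    re.DOTALL,
)


def split_aksharas(text):
    """Split text into segments, keeping conjunct clusters together."""
    return AKSHARA.findall(text)


if __name__ == "__main__":
    # KA, VIRAMA, SSA, TA, VIRAMA, RA, vowel sign I, YA
    word = "\u0915\u094D\u0937\u0924\u094D\u0930\u093F\u092F"
    print(split_aksharas(word))
```

The point of the sketch is only that a conjunct such as KA + VIRAMA + SSA
stays in one segment, whereas a purely code-point-based split would break
it apart. Whether a rule of this shape is generic enough across the
Indic scripts is exactly the question to settle before proposing it.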
A./