Re: Follow up on regular expressions in ITS 2.0

On Nov 20, 2012, at 0:46, Jirka Kosek wrote:

> On 20.11.2012 7:33, Felix Sasaki wrote:
> 
>> I have no opinion on that. Others in the MLW-LT group: what do you
>> think? Note that if we want to change the regex definition, we should do
>> this within the next two weeks, since at the "last call" stage such a change
>> would force us to go back to a normal working draft.
> 
> Actually, all the features mentioned by Norbert can be "simplified" to
> regular expressions that do not use those constructs. However, as those
> character classes are very handy, I think that we want to keep them in.
> 
> For example, imagine you would like to simplify \p{IsGreek}. If you have
> access to the Unicode database, you can simply turn this into [αβγ...]. But I
> think that such simplification should be done by the application, not by the
> end user, and thus we should keep the RE syntax as it is.
> 
>     Jirka
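The application-level simplification Jirka describes could look roughly like the following minimal sketch. It assumes a hypothetical lookup table (BLOCK_RANGES) holding only two blocks, and a hypothetical helper, simplify_blocks, that rewrites \p{IsX} and \P{IsX} escapes as explicit code point ranges, so \p{IsGreek} becomes [\u0370-\u03FF] rather than an enumerated [αβγ...]. It only handles escapes that appear outside bracket expressions. A real layer would generate the table from the Unicode Blocks.txt data file.

```python
import re

# Hypothetical lookup table: XML Schema block escape names mapped to the
# code point range of the corresponding Unicode block. Only two entries
# are shown; a real layer would load all blocks from Blocks.txt.
BLOCK_RANGES = {
    "IsGreek": (0x0370, 0x03FF),
    "IsCyrillic": (0x0400, 0x04FF),
}

# Matches \p{IsName} or \P{IsName} (the negated form).
BLOCK_ESCAPE = re.compile(r"\\([pP])\{(Is[A-Za-z0-9-]+)\}")


def simplify_blocks(pattern: str) -> str:
    """Rewrite block escapes as explicit character-class ranges.

    Assumption: escapes appear outside [...] bracket expressions;
    unknown block names raise KeyError.
    """

    def expand(match: re.Match) -> str:
        negate = match.group(1) == "P"
        lo, hi = BLOCK_RANGES[match.group(2)]
        body = f"\\u{lo:04X}-\\u{hi:04X}"
        return f"[^{body}]" if negate else f"[{body}]"

    return BLOCK_ESCAPE.sub(expand, pattern)
```

For example, simplify_blocks(r"\p{IsGreek}+") yields a pattern that Python's re module (which has no \p{...} support) can compile and match against "αβγ".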

Actually, my question came more from the ECMAScript point of view: Which of these features would the regular expressions in ECMAScript have to support in order to make a "simplification" layer unnecessary for most applications? E.g., do you anticipate that character blocks will be commonly used, or only in rare situations? If developers using ITS were given a choice between character blocks and scripts [1, 2], which ones would they choose? Do ITS developers really need the XML-specific escapes \i, \I, \c, \C?

[1] http://unicode.org/reports/tr18/#Blocks
[2] http://unicode.org/reports/tr18/#Script_Property
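The practical difference between blocks and scripts can be seen in a minimal sketch that tests membership in the range XML Schema calls "Greek" (U+0370 to U+03FF, the Unicode "Greek and Coptic" block). The helper in_greek_block is hypothetical. The block contains U+03E2 COPTIC CAPITAL LETTER SHEI, whose Script property is Coptic, while U+1F00 GREEK SMALL LETTER ALPHA WITH PSILI has Script=Greek but lies in the separate Greek Extended block (U+1F00 to U+1FFF). A block test therefore both over- and under-matches relative to a script test.

```python
import unicodedata

# The range XML Schema's \p{IsGreek} refers to ("Greek and Coptic" block).
GREEK_BLOCK = range(0x0370, 0x0400)


def in_greek_block(ch: str) -> bool:
    """Return True if ch lies in U+0370..U+03FF, i.e. what \\p{IsGreek} tests."""
    return ord(ch) in GREEK_BLOCK


# U+03B1: Greek letter inside the block.
# U+03E2: inside the block, but its Script property is Coptic.
# U+1F00: Script=Greek, but outside the block (Greek Extended).
for ch in ("\u03b1", "\u03e2", "\u1f00"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: in block = {in_greek_block(ch)}")
```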

Norbert

Received on Tuesday, 20 November 2012 15:13:11 UTC