Re: Follow up on regular expressions in ITS 2.0 from Norbert Lindenberg on 2012-11-20 (www-international@w3.org from October to December 2012)

From: Norbert Lindenberg <w3@norbertlindenberg.com>
Date: Tue, 20 Nov 2012 07:12:40 -0800
To: Jirka Kosek <jirka@kosek.cz>
Cc: Norbert Lindenberg <w3@norbertlindenberg.com>, Felix Sasaki <fsasaki@w3.org>, public-multilingualweb-lt@w3.org, www-international <www-international@w3.org>
Message-Id: <39EAE13D-AC54-45B5-BB90-AAF5094AED37@norbertlindenberg.com>

On Nov 20, 2012, at 0:46 , Jirka Kosek wrote:

> On 20.11.2012 7:33, Felix Sasaki wrote:
> 
>> I have no opinion on that. Others in the MLW-LT group: what do you
>> think? Note that if we want to change the regex definition, we should do
>> this within the next two weeks, since in the "last call" stage such a change
>> would force us to go back to a normal working draft.
> 
> Actually, all the features mentioned by Norbert can be "simplified" to a
> regular expression that does not use those constructs. However, those
> character classes are very handy, so I think we want to keep them in.
> 
> For example, imagine you would like to simplify \p{IsGreek}. If you have
> access to the Unicode database, you can simply turn this into [αβγ...]. But I
> think such simplification should be done by the application, not by the end
> user, and thus we should keep the RE syntax as it is.
> 
>     Jirka

Actually, my question came more from the ECMAScript point of view: Which of these features would the regular expressions in ECMAScript have to support in order to make a "simplification" layer unnecessary for most applications? E.g., do you anticipate that character blocks will be commonly used, or only in rare situations? If developers using ITS were given a choice between character blocks and scripts [1, 2], which ones would they choose? Do ITS developers really need the XML-specific escapes \i, \I, \c, \C?

[1] http://unicode.org/reports/tr18/#Blocks
[2] http://unicode.org/reports/tr18/#Script_Property
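
To make the "simplification" layer concrete, here is a rough sketch of what an application might do when the regex engine lacks block escapes. It rewrites \p{IsXxx} and \P{IsXxx} into explicit code point ranges. The function name simplifyBlockEscapes and the table BLOCK_RANGES are hypothetical, the table holds only three sample blocks (a real layer would presumably load the full list from the Unicode database), and the sketch assumes the escape appears outside a character class and is not preceded by an escaped backslash.

```typescript
// Hypothetical sample table: block name (as used after "Is") -> inclusive
// code point range. Only three BMP blocks are listed for illustration.
const BLOCK_RANGES: Record<string, [number, number]> = {
  BasicLatin: [0x0000, 0x007f],
  Greek: [0x0370, 0x03ff],
  Cyrillic: [0x0400, 0x04ff],
};

// Format a BMP code point as a \uXXXX escape usable inside a RegExp source.
function toEscape(codePoint: number): string {
  const hex = codePoint.toString(16).toUpperCase();
  return "\\u" + ("0000" + hex).slice(-4);
}

// Rewrite \p{IsXxx} to [lo-hi] and \P{IsXxx} to [^lo-hi].
// Assumption: the escape is not nested inside an existing [...] class.
// Unknown block names are left untouched so the caller can detect them.
function simplifyBlockEscapes(pattern: string): string {
  return pattern.replace(
    /\\([pP])\{Is([A-Za-z0-9-]+)\}/g,
    (match: string, kind: string, name: string): string => {
      const range = BLOCK_RANGES[name];
      if (range === undefined) {
        return match;
      }
      const negation = kind === "P" ? "^" : "";
      return "[" + negation + toEscape(range[0]) + "-" + toEscape(range[1]) + "]";
    }
  );
}

// Usage: the rewritten source can be handed to the ECMAScript RegExp constructor.
const greekWord = new RegExp("^" + simplifyBlockEscapes("\\p{IsGreek}+") + "$");
console.log(greekWord.test("αβγ"));
```

A block maps to a single contiguous range, which is why this rewrite stays small; a script would need a list of ranges per name, so the table and the generated classes would be larger.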

Norbert

Received on Tuesday, 20 November 2012 15:13:11 UTC