- From: Norbert Lindenberg <w3@norbertlindenberg.com>
- Date: Tue, 20 Nov 2012 07:12:40 -0800
- To: Jirka Kosek <jirka@kosek.cz>
- Cc: Norbert Lindenberg <w3@norbertlindenberg.com>, Felix Sasaki <fsasaki@w3.org>, public-multilingualweb-lt@w3.org, www-international <www-international@w3.org>
On Nov 20, 2012, at 0:46, Jirka Kosek wrote:

> On 20.11.2012 7:33, Felix Sasaki wrote:
>
>> I have no opinion on that. Others in the MLW-LT group: what do you
>> think? Note that if we want to change the regex definition we should do
>> this within the next two weeks, since in the "last call" stage such a
>> change would force us to go back to a normal working draft.
>
> Actually, all features mentioned by Norbert can be "simplified" to a
> regular expression that does not use those constructs. However, those
> character classes are very handy, so I think we want to keep them in.
>
> For example, imagine you would like to simplify \p{IsGreek}. If you have
> access to the Unicode database, you can simply turn this into [αβγ...].
> But I think that such simplification should be done by the application,
> not by the end user, and thus we should keep the RE syntax as it is.
>
> Jirka

Actually, my question came more from the ECMAScript point of view: Which of these features would regular expressions in ECMAScript have to support in order to make a "simplification" layer unnecessary for most applications? E.g., do you anticipate that character blocks will be commonly used, or only in rare situations? If developers using ITS were given a choice between character blocks and scripts [1, 2], which would they choose? Do ITS developers really need the XML-specific escapes \i, \I, \c, \C?

[1] http://unicode.org/reports/tr18/#Blocks
[2] http://unicode.org/reports/tr18/#Script_Property

Norbert
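
To make the "simplification" layer discussed above concrete, here is a minimal sketch of one possible approach. It rewrites XML Schema block escapes such as \p{IsGreek} into explicit code point ranges that a plain ECMAScript regular expression accepts. The names BLOCK_RANGES, hex4 and simplifyBlockEscapes are hypothetical, and the table holds only three blocks (ranges as listed in the Unicode block data); a real layer would need the full list. The sketch also assumes BMP-only blocks and does not handle escapes that appear inside an existing character class or after an escaped backslash.

```typescript
// Hypothetical, deliberately tiny block table: [first, last] code point.
// A real implementation would load every block from the Unicode database.
const BLOCK_RANGES: Record<string, [number, number]> = {
  BasicLatin: [0x0000, 0x007f],
  Greek: [0x0370, 0x03ff], // the block that \p{IsGreek} refers to
  Cyrillic: [0x0400, 0x04ff],
};

// Format a BMP code point as a \uXXXX regex escape.
function hex4(cp: number): string {
  return "\\u" + ("0000" + cp.toString(16).toUpperCase()).slice(-4);
}

// Replace \p{IsName} with [\uXXXX-\uYYYY] and \P{IsName} with the negated
// class. Unknown block names are left untouched so the caller can spot them.
function simplifyBlockEscapes(pattern: string): string {
  return pattern.replace(
    /\\([pP])\{Is([A-Za-z0-9-]+)\}/g,
    (match: string, kind: string, name: string): string => {
      const range = BLOCK_RANGES[name];
      if (range === undefined) {
        return match;
      }
      const negate = kind === "P" ? "^" : "";
      return "[" + negate + hex4(range[0]) + "-" + hex4(range[1]) + "]";
    }
  );
}

// Usage: the simplified pattern needs no block support from the engine.
const greekOnly = new RegExp(simplifyBlockEscapes("^\\p{IsGreek}+$"));
const matchesGreek: boolean = greekOnly.test("αβγ");
const matchesLatin: boolean = greekOnly.test("abc");
```

Doing the rewrite in the application keeps the ITS syntax unchanged for authors, which is the division of labour Jirka suggests; whether such a layer is needed at all depends on which escapes the target regex engine supports natively.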
Received on Tuesday, 20 November 2012 15:13:11 UTC