RE: [ACTION 496] Allowed Characters regex...

From: Yves Savourel <ysavourel@enlaso.com>
Date: Thu, 25 Apr 2013 14:44:29 -0600
To: <public-multilingualweb-lt@w3.org>
Message-ID: <006801ce41f5$af59e460$0e0dad20$@com>

Hi Felix, all,

The last example of regex for Allowed Characters is:

"[a-&#x00ff;-[\s]]" : allows all characters between U+0061 and U+00FF except the characters SPACE (U+0020), TABULATION (U+0009), CARRIAGE RETURN (U+000D) and LINE FEED (U+000F).

It makes no sense: none of the characters to exclude is between U+0061 and U+00FF, so the subtraction has no effect.
I'll just drop that example.
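
A quick way to see this is to compare the two code point sets directly. The following is only a sketch in Python; the variable names are illustrative, and the excluded set is taken to be the four characters listed in the example (with LINE FEED as U+000A), not whatever a given engine's \s happens to cover:

```python
# Code points allowed by the range a-ÿ (U+0061 through U+00FF inclusive).
allowed = set(range(0x61, 0x100))

# The four characters the example claims to exclude:
# SPACE, TABULATION, CARRIAGE RETURN, LINE FEED.
excluded = {0x20, 0x09, 0x0D, 0x0A}

# If the subtraction did anything, this intersection would be non-empty.
overlap = allowed & excluded
print(sorted(overlap))  # prints: []
```

Since the overlap is empty, the set with the subtraction is the same as the set without it.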

-ys

-----Original Message-----
From: Yves Savourel [mailto:ysavourel@enlaso.com] 
Sent: Thursday, April 25, 2013 1:20 PM
To: 'public-multilingualweb-lt@w3.org'
Subject: [ACTION 496] Allowed Characters regex...

Hi Pablo, all,

As I was working on implementing the changes for Allowed Characters in the specification, I noticed that MultiCharEsc (i.e. '\' [dD]) does not behave the same way across regex engines. For example, in Java \d covers only ASCII digits by default, while in ICU it covers the same as \p{Nd}.

So I think we should remove it.
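
The same kind of divergence can be reproduced inside a single engine. The sketch below uses Python's re module, which is not one of the engines discussed in this message; it is used here only because it can switch between the two interpretations of \d with a flag. The test character is an arbitrary choice of a non-ASCII digit:

```python
import re

# ARABIC-INDIC DIGIT THREE: a decimal digit (general category Nd) outside ASCII.
ARABIC_INDIC_THREE = "\u0663"

# For str patterns, Python's \d is Unicode-aware by default.
unicode_digit = re.compile(r"\d")
# With re.ASCII, \d is restricted to [0-9].
ascii_digit = re.compile(r"\d", re.ASCII)

matches_unicode = unicode_digit.fullmatch(ARABIC_INDIC_THREE) is not None
matches_ascii = ascii_digit.fullmatch(ARABIC_INDIC_THREE) is not None
print(matches_unicode, matches_ascii)  # prints: True False
```

An Allowed Characters value containing \d would therefore accept or reject this character depending on which interpretation the consuming tool applies.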

If we do that, then the ABNF would probably be:

[1] charClass ::= SingleCharEsc | charClassExpr | WildcardEsc
[2] SingleCharEsc ::= '\' [nrt\|.?*+(){}#x2D#x5B#x5D#x5E]
[3] charClassExpr ::= '[' charGroup ']'
[4] charGroup ::= posCharGroup | negCharGroup
[5] posCharGroup ::= ( charRange | SingleCharEsc )+
[6] charRange ::= seRange | XmlCharIncDash
[7] seRange ::= charOrEsc '-' charOrEsc
[8] charOrEsc ::= XmlChar | SingleCharEsc
[9] XmlChar ::= [^\#x2D#x5B#x5D]
[10] XmlCharIncDash ::= [^\#x5B#x5D]
[11] negCharGroup ::= '^' posCharGroup
[12] WildcardEsc ::= '.'
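
Because the productions above no longer contain any recursion, they can be transliterated into an ordinary regular expression for experimenting with candidate values. The following Python sketch is one reading of the grammar, not a normative implementation: the names CHAR_CLASS_RE and is_valid_char_class are hypothetical, and it assumes that the leading '\' in productions [9] and [10] means a literal backslash is excluded.

```python
import re

# [2] SingleCharEsc: backslash followed by one of n r t \ | . ? * + ( ) { } - [ ] ^
SINGLE_ESC = r"\\[nrt\\|.?*+(){}\-\[\]^]"
# [9] XmlChar: any character except backslash, hyphen, '[' and ']'
XML_CHAR = r"[^\\\-\[\]]"
# [10] XmlCharIncDash: any character except backslash, '[' and ']'
XML_CHAR_INC_DASH = r"[^\\\[\]]"
# [8] charOrEsc
CHAR_OR_ESC = f"(?:{XML_CHAR}|{SINGLE_ESC})"
# [7] seRange
SE_RANGE = f"{CHAR_OR_ESC}-{CHAR_OR_ESC}"
# [6] charRange
CHAR_RANGE = f"(?:{SE_RANGE}|{XML_CHAR_INC_DASH})"
# [5] posCharGroup and [11] negCharGroup
POS_GROUP = f"(?:{CHAR_RANGE}|{SINGLE_ESC})+"
NEG_GROUP = f"\\^{POS_GROUP}"
# [3]/[4] charClassExpr wrapping a charGroup
CHAR_CLASS_EXPR = f"\\[(?:{POS_GROUP}|{NEG_GROUP})\\]"
# [1] charClass, with [12] WildcardEsc as the last alternative
CHAR_CLASS_RE = re.compile(f"(?:{SINGLE_ESC}|{CHAR_CLASS_EXPR}|\\.)")


def is_valid_char_class(expr: str) -> bool:
    """Return True if expr matches the charClass production as read above."""
    return CHAR_CLASS_RE.fullmatch(expr) is not None


for candidate in ["[a-z]", "[^a-z]", ".", r"\n", r"\d", r"[\d]", r"[a-z-[\s]]"]:
    print(candidate, is_valid_char_class(candidate))
```

Under this reading, values using \d or the subtraction form [a-z-[\s]] are rejected, while plain ranges, negated groups, single-character escapes and the wildcard are accepted.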

What do you all think?
-yves
Received on Thursday, 25 April 2013 21:01:18 UTC