[ACTION 496] Allowed Characters regex...

Hi Pablo, all,

As I was working on implementing the changes for Allowed Characters in the specification, I noticed that MultiCharEsc (i.e. '\' [dD]) is not interpreted the same way by all regex engines. For example, in Java \d covers only ASCII digits by default, while in ICU it covers the same as \p{Nd}.
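
To make the mismatch concrete, here is a small sketch on the Java side only (the class and method names are just for illustration, and I have not included an ICU run). It tests U+0663 ARABIC-INDIC DIGIT THREE, which has general category Nd, against \d in Java's default mode, against \d with the UNICODE_CHARACTER_CLASS flag, and against \p{Nd}:

```java
import java.util.regex.Pattern;

// Illustrative only: shows how Java treats \d for a non-ASCII decimal digit.
final class DigitEscapeDemo {
    // U+0663 ARABIC-INDIC DIGIT THREE (general category Nd)
    static final String ARABIC_INDIC_THREE = "\u0663";

    // \d in Java's default mode, which is equivalent to [0-9]
    static boolean matchesDefault(String s) {
        return Pattern.compile("\\d").matcher(s).matches();
    }

    // \d with UNICODE_CHARACTER_CLASS (Java 7+), which widens it to Unicode digits
    static boolean matchesUnicodeMode(String s) {
        return Pattern.compile("\\d", Pattern.UNICODE_CHARACTER_CLASS).matcher(s).matches();
    }

    // Explicit Unicode category Nd, independent of any flag
    static boolean matchesNd(String s) {
        return Pattern.compile("\\p{Nd}").matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println("default \\d:      " + matchesDefault(ARABIC_INDIC_THREE));
        System.out.println("unicode-mode \\d: " + matchesUnicodeMode(ARABIC_INDIC_THREE));
        System.out.println("\\p{Nd}:          " + matchesNd(ARABIC_INDIC_THREE));
    }
}
```

So even within Java the meaning of \d depends on a flag, which is one more reason the same Allowed Characters value could validate differently from one tool to the next.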

So I think we should remove it.

If we do that, then the ABNF would probably be:

[1] charClass ::= SingleCharEsc | charClassExpr | WildcardEsc
[2] SingleCharEsc ::= '\' [nrt\|.?*+(){}#x2D#x5B#x5D#x5E]
[3] charClassExpr ::= '[' charGroup ']'
[4] charGroup ::= posCharGroup | negCharGroup
[5] posCharGroup ::= ( charRange | SingleCharEsc )+
[6] charRange ::= seRange | XmlCharIncDash
[7] seRange ::= charOrEsc '-' charOrEsc
[8] charOrEsc ::= XmlChar | SingleCharEsc
[9] XmlChar ::= [^\#x2D#x5B#x5D]
[10] XmlCharIncDash ::= [^\#x5B#x5D]
[11] negCharGroup ::= '^' posCharGroup
[12] WildcardEsc ::= '.'
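
As a rough sketch of what this would mean for an implementation (again in Java, with hypothetical helper names, and assuming the expression stays within the subset above, where Java's character-class syntax behaves the same way), a tool could reject expressions that still use '\' [dD] and check text one code point at a time. Authors who want digits would write an explicit range such as [0-9]:

```java
import java.util.regex.Pattern;

// Illustrative sketch only; not taken from any existing ITS implementation.
final class AllowedCharsSketch {

    // True when the expression contains the removed MultiCharEsc ('\' followed by d or D).
    // An escaped backslash ("\\") is skipped as a unit, so it is not mistaken for one.
    static boolean usesMultiCharEsc(String charClass) {
        for (int i = 0; i + 1 < charClass.length(); i++) {
            if (charClass.charAt(i) == '\\') {
                char next = charClass.charAt(i + 1);
                if (next == 'd' || next == 'D') {
                    return true;
                }
                i++; // skip the escaped character
            }
        }
        return false;
    }

    // True when every code point of text matches the character class expression.
    static boolean allAllowed(String charClass, String text) {
        Pattern p = Pattern.compile(charClass);
        return text.codePoints()
                   .allMatch(cp -> p.matcher(new String(Character.toChars(cp))).matches());
    }

    public static void main(String[] args) {
        String allowed = "[a-zA-Z0-9]";
        System.out.println(usesMultiCharEsc(allowed));       // no \d or \D in the expression
        System.out.println(allAllowed(allowed, "abc123"));   // ASCII letters and digits only
        System.out.println(allAllowed(allowed, "abc\u0663")); // contains a non-ASCII digit
    }
}
```

With an explicit range, the set of accepted digits is spelled out in the expression itself instead of depending on how each engine defines \d.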

What do you all think?
-yves

Received on Thursday, 25 April 2013 19:20:42 UTC