RE: [ACTION-385] Common regular expression syntax

Hi Dave,

Yes, the subset described by Shaun would be fine.
A note on best practices for Unicode normalization would be fine too.


-----Original Message-----
From: Dave Lewis
Sent: Sunday, February 03, 2013 2:01 PM
Subject: Re: [ACTION-385] Common regular expression syntax

Hi Yves,
In relation to ISSUE-67, as the original commenter, can you confirm that you are satisfied with the resolution? The resolution is that we use the limited regex subset Shaun identified in response to ACTION-385 in the Allowed Characters data category, together with an accompanying note on best practice for Unicode normalisation that Shaun is addressing under ACTION-430 (Draft text explaining importance of Unicode normalization and best practices on ISSUE-67).
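
To illustrate why a normalisation note matters for an allowed-characters check, here is a minimal Python sketch. The pattern, the sample strings, and the helper name all_allowed are made up for illustration and are not taken from the ITS 2.0 draft. It shows the same visible text passing or failing a character class depending on its normalisation form.

```python
import re
import unicodedata

# Hypothetical allowed-characters class: ASCII lowercase letters plus
# the precomposed character U+00E9 (e with acute accent).
allowed = re.compile("[a-z\u00e9]")

composed = "caf\u00e9"      # accent stored as one code point (NFC form)
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT (NFD form)


def all_allowed(text: str) -> bool:
    """Return True if every code point in text matches the allowed class."""
    return all(allowed.fullmatch(ch) for ch in text)


print(all_allowed(composed))                                   # True
print(all_allowed(decomposed))                                 # False
print(all_allowed(unicodedata.normalize("NFC", decomposed)))   # True
```

The decomposed string fails only because the combining accent is a separate code point that the class does not list, so normalising the text to a single form before checking avoids this kind of mismatch.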


On 27/01/2013 22:31, Yves Savourel wrote:
> Hi Shaun,
> Thanks for the thorough analysis.
> That should be enough for the goals of the data category.
> Cheers,
> -yves
> -----Original Message-----
> From: Shaun McCance
> Sent: Sunday, January 27, 2013 10:30 AM
> To:
> Subject: [ACTION-385] Common regular expression syntax
> I've investigated features in six different regular expression dialects to try to find a safe common subset for the allowed characters data category. I tested Java, .Net, XSD, JavaScript, Perl, and Python. I still want to test POSIX EREs, and PHP may be good to test as well, given the focus on CMSs in 2.0. But I think the subset from the six I tested is going to be safe in general.
> So what I think this leaves us with is character classes [abc], ranges [a-c], and negations [^abc], where "^" and "]" must never appear unless backslash-escaped, and "-" may be backslash-escaped or put at the beginning or end. The escape sequences "\n", "\r", "\t", "\d", and "\D" may be used, and a literal "\" is escaped as "\\".
> Importantly, you must never have an unescaped backslash, because some dialects may treat it as the beginning of an escape sequence that means something special.
> This is a very limited subset, but I think it's what we have to use. I'm now going to try to make a portable RE that matches these portable RE character classes.
> Comments?
> --
> Shaun
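
The subset described in the quoted message can be sketched as a small checker. This is only one possible reading of those rules, written as a hand-rolled scanner in Python, and it is not the portable RE Shaun mentions. The names is_portable_char_class, ALLOWED_ESCAPES, and CLASS_ESCAPES are hypothetical. It assumes that the escaped forms of "^", "]", and "-" are written "\^", "\]", and "\-", that characters the message does not mention are ordinary literals, that a bare "-" in the middle is only valid as a range operator, and that "\d" and "\D" cannot be range endpoints. It does not check that range endpoints are in order.

```python
# Characters that may follow a backslash, per the subset in the message:
# \n \r \t \d \D, plus backslash-escaped \ ^ ] and - (assumed spellings).
ALLOWED_ESCAPES = set("nrtdD\\^]-")

# Escapes that stand for a set of characters, so they cannot bound a range.
CLASS_ESCAPES = {"\\d", "\\D"}


def is_portable_char_class(expr: str) -> bool:
    """Return True if expr looks like a character class in the limited subset."""
    if len(expr) < 3 or expr[0] != "[" or expr[-1] != "]":
        return False
    body = expr[1:-1]
    if body.startswith("^"):  # a leading ^ marks a negated class
        body = body[1:]
    if not body:
        return False

    # Pass 1: split the body into tokens (one literal or one two-char escape).
    tokens = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            # Every backslash must begin one of the allowed escapes.
            if i + 1 >= len(body) or body[i + 1] not in ALLOWED_ESCAPES:
                return False
            tokens.append(body[i:i + 2])
            i += 2
        elif ch in "^]":
            # ^ and ] are only accepted when backslash-escaped.
            return False
        else:
            tokens.append(ch)
            i += 1

    # Pass 2: check where bare hyphens appear.
    n = len(tokens)
    k = 0
    while k < n:
        tok = tokens[k]
        if tok == "-":
            # A bare hyphen outside a range must be the first or last item.
            if k != 0 and k != n - 1:
                return False
            k += 1
        elif k + 2 < n and tokens[k + 1] == "-":
            # lo-hi range: neither end may be \d, \D, or a bare hyphen.
            hi = tokens[k + 2]
            if tok in CLASS_ESCAPES or hi in CLASS_ESCAPES or hi == "-":
                return False
            k += 3
        else:
            k += 1
    return True


print(is_portable_char_class("[a-c]"))    # True
print(is_portable_char_class("[^abc-]"))  # True
print(is_portable_char_class(r"[\w]"))    # False: \w is outside the subset
print(is_portable_char_class("[a^b]"))    # False: unescaped ^ inside the class
```

The scanner is deliberately conservative: anything the message does not clearly allow, such as "[a-c-e]", is rejected rather than guessed at.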

Received on Sunday, 3 February 2013 21:33:08 UTC