Re: [Issue-67] [Action-385] Work on regex for validating regex subset proposal

On 08.04.13 19:33, Jirka Kosek wrote:
> On 8.4.2013 18:48, Felix Sasaki wrote:
>
>>> I'm not sure whether this ABNF does what it should do. For example this
>>> grammar allows ^ almost anywhere but I think that in most RE engines ^
>>> should directly follow [ if it's meant as a negation.
>> Agree - you could resolve that by removing neg from
>> char = [neg] BMP+escapes
>> and change
>> allowedCharacters = start 1*range end ["+"]
>> to
>> allowedCharacters = start [neg] 1*range end ["+"]
> Yes, this resolves this one particular issue. I haven't been tracking RE
> discussion that closely, but I'm not sure whether it's clear what should
> be supported. My recollection that what people asking for change wanted
> was something like:
>
> [11]    charClass (http://www.w3.org/TR/xmlschema-2/#nt-charClass)
>
> just without allowing following productions:  MultiCharEsc, catEsc  and
> complEsc.


Looks similar - though in Shaun's subset at
http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Jan/0180.html
"So what I think this leaves us with is character classes
[abc], ranges [a-c], and negations [^abc], there "^" and
"]" must never appear unless backslash-escaped, "-" may
be backslash-escaped or put at the beginning or end, the
escape sequences "\n", "\r", "\t", "\d", and "\D" may be
used, and literal "\" is escaped as "\\"."

We had a part of
http://www.w3.org/TR/xmlschema-2/#nt-MultiCharEsc
in there, namely "\d" and "\D".
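
To make the quoted subset easier to discuss, here is a rough Python sketch
of a checker for it. It is only my reading of Shaun's wording, not spec
text: the names SINGLE, MULTI, ITEM, SUBSET and is_in_subset are made up
for illustration, I assume that any character other than "\", "]", "^"
and "-" may appear unescaped, and I assume that "\d" and "\D" cannot be
range endpoints. It does not check that range endpoints are in order.

```python
import re

# One character: an unescaped literal, or one of the single-character
# escapes \n \r \t \\ \] \^ \- (assumption: everything except \ ] ^ - may
# appear unescaped).
SINGLE = r"(?:[^\\\]^-]|\\[nrt\\\]^-])"

# The two multi-character escapes kept from XML Schema's MultiCharEsc.
MULTI = r"\\[dD]"

# A class member is a range, a single character, or \d / \D.
ITEM = rf"(?:{SINGLE}-{SINGLE}|{SINGLE}|{MULTI})"

# "^" is only accepted directly after "[", and an unescaped "-" only at the
# beginning or end. This sketch requires at least one member besides a bare
# "-", so "[-]" is rejected while "[\-]" is accepted.
SUBSET = re.compile(rf"\[\^?-?(?:{ITEM})+-?\]")


def is_in_subset(pattern: str) -> bool:
    """Return True if pattern is one character class in the sketched subset."""
    return SUBSET.fullmatch(pattern) is not None
```

With this reading, "[abc]", "[a-c]", "[^abc]" and "[-a]" are accepted,
while "[a^b]", "[\w]" and "[ab]c]" are rejected.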


>
> This is very different from what your simple ABNF allows.

Agree.

>   Might be
> implementers know what subset of REs should be supported

Adding Karl, as one of the implementers, to the loop: what do you think
about the subset?

> but this has to
> be explicitly written down somewhere in the spec if we want to reach
> interoperability.

Agree. I also agree that we should not use my subset proposal but rather
an XML Schema based subset - we just have to specify it.

Best,

Felix

Received on Monday, 8 April 2013 17:43:17 UTC