RE: [Issue-67] [Action-385] Work on regex for validating regex subset proposal from Yves Savourel on 2013-04-08 (public-multilingualweb-lt@w3.org from April 2013)

From: Yves Savourel <ysavourel@enlaso.com>
Date: Mon, 8 Apr 2013 11:33:31 -0600
To: "'Pablo Nieto Caride'" <pablo.nieto@linguaserve.com>, "'Felix Sasaki'" <fsasaki@w3.org>, "'Jirka Kosek'" <jirka@kosek.cz>
CC: <public-multilingualweb-lt@w3.org>
Message-ID: <004a01ce347f$30e5caa0$92b15fe0$@com>

Hi Felix, Pablo, Jirka,

 

The ABNF description is probably something we really have to have in the specification: it’s human readable and formal.

 

Having a corresponding regex in the schema to check the values would be a big plus. But I don’t think that not having it working yet should stop us from updating the specification.

 

-yves

 

 

 

From: Pablo Nieto Caride [mailto:pablo.nieto@linguaserve.com] 
Sent: Monday, April 08, 2013 11:21 AM
To: 'Felix Sasaki'; 'Jirka Kosek'
Cc: public-multilingualweb-lt@w3.org
Subject: RE: [Issue-67] [Action-385] Work on regex for validating regex subset proposal

 

Hi Felix, Jirka, all,

 

As I said, I think the ABNF approach is not bad, but I also think that having a list of allowed items and the regex in the schema is fine too. I don’t know what the implementers of the data category think about this.

 

Thanks, Jirka, the new library works.

 

Cheers,

Pablo.

-------------------------------------------------------

On 08.04.13 18:28, Jirka Kosek wrote:

On 8.4.2013 18:15, Felix Sasaki wrote:
 

Trying to move this forward:
Would this ABNF make sense to you
http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Apr/0027.html
 
("BMP+escapes" still needs to be defined)

 
I'm not sure whether this ABNF does what it should do. For example this
grammar allows ^ almost anywhere but I think that in most RE engines ^
should directly follow [ if it's meant as a negation.


Agreed - you could resolve that by removing neg from
char = [neg] BMP+escapes
and changing
allowedCharacters = start 1*range end ["+"]
to
allowedCharacters = start [neg] 1*range end ["+"]
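
As a rough illustration of the revised rule (not part of the proposal itself), here is a minimal Python sketch of a checker. It assumes start is "[", end is "]", neg is "^" and range is char ["-" char]. Since "BMP+escapes" is still undefined, it also assumes that char is either a backslash followed by one character, or any single character other than [ ] ^ - and backslash. The function name is hypothetical.

```python
def is_allowed_characters(value: str) -> bool:
    """Check value against: start [neg] 1*range end ["+"].

    Assumptions (not from the ABNF in the thread): start = "[",
    end = "]", neg = "^", range = char ["-" char], and char is a
    backslash escape or any character other than [ ] ^ - backslash.
    """
    n = len(value)

    def read_char(i: int) -> int:
        # Return the index just past one char starting at i, or -1.
        if i >= n:
            return -1
        if value[i] == "\\":
            return i + 2 if i + 1 < n else -1
        if value[i] in "[]^-":
            return -1
        return i + 1

    # start
    if n == 0 or value[0] != "[":
        return False
    pos = 1

    # [neg]: only permitted directly after start
    if pos < n and value[pos] == "^":
        pos += 1

    # 1*range
    ranges = 0
    while True:
        nxt = read_char(pos)
        if nxt == -1:
            break
        pos = nxt
        if pos < n and value[pos] == "-":
            after = read_char(pos + 1)
            if after == -1:
                return False  # "-" must be followed by a char
            pos = after
        ranges += 1
    if ranges == 0:
        return False

    # end
    if pos >= n or value[pos] != "]":
        return False
    pos += 1

    # ["+"]
    if pos < n and value[pos] == "+":
        pos += 1
    return pos == n
```

Under these assumptions "[^a-z]+" is accepted, while "[a^z]" is rejected because neg can no longer appear as part of char.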



 
 
Maybe starting with the grammar in the W3C XML Schema spec and forbidding some
rules would be easier.


Currently in the spec
http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#allowedchars-definition
we reference the XML Schema grammar
http://www.w3.org/TR/xmlschema-2/#charcter-classes
but not a specific production in the grammar. Which one would you choose, e.g.
http://www.w3.org/TR/xmlschema-2/#nt-charClassExpr
?

I'm fine with the "XML Schema disallowing" approach. But ending up with a means to validate the regex, and not leaving that to the regex engine, seems crucial as part of resolving the issue. From previous discussions it seems that pointing people to XML Schema with some additional information (e.g. "assume that this is not allowed") won't help - implementers will just use their (non-XML Schema) engine.
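
To illustrate the gap between "the engine accepts it" and "it is in the subset", here is a small Python sketch. Python's re module stands in for a non-XML Schema engine. The SUBSET meta-pattern and both function names are hypothetical, and the pattern encodes an assumed reading of the subset (a single bracketed class, optional leading "^", optional trailing "+"), not the agreed definition.

```python
import re

# Hypothetical meta-pattern for the subset. Assumption: a value is one
# bracketed class, "^" only directly after "[", an optional trailing "+",
# and each char is a backslash escape or anything except [ ] ^ - backslash.
SUBSET = re.compile(
    r"\[\^?(?:(?:\\.|[^\[\]\^\\-])(?:-(?:\\.|[^\[\]\^\\-]))?)+\]\+?"
)


def engine_accepts(value: str) -> bool:
    # What an implementer gets by only asking the regex engine.
    try:
        re.compile(value)
        return True
    except re.error:
        return False


def subset_accepts(value: str) -> bool:
    # Explicit validation against the restricted subset.
    return SUBSET.fullmatch(value) is not None


for value in ["[a-z]+", "(foo|bar)*", "[a^z]"]:
    print(value, engine_accepts(value), subset_accepts(value))
```

The engine compiles all three values, but only the first one passes the subset check, which is why relying on the engine alone does not validate the restriction.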



 
 

P.S.: different topic - I had the same issues as Pablo when validating
against the test suite: I had to use my local copy of Jing; the
one on GitHub didn't work.

 
It works for me. Anyway I synced versions of Jing, so you can give it
another try.


Thanks, will do.

Best,

Felix

Received on Monday, 8 April 2013 17:34:09 UTC