Re: forbiddenCharacters data category - related to [ACTION-189]

Now with the right sender address ...

On Tuesday, 28 August 2012, Felix Sasaki wrote:

> Hi Yves, all,
> let's try to foresee the future for a moment, assuming we take the
> "subset of XML Schema regex" approach.
> 1) Users of forbidden characters will read in the spec: "You can use regular
> expressions. By the way, you should restrict yourself to the following subset:
> [...]"
> 2) Users will be happy, saying "Great, I can use my own engine and don't
> have to care about regex details", that is, [...] above.
> 3) In the end there will be a lot of regexes that exceed the subset, and no
> producer or consumer will have a means to check that.
> Why do I expect 3)? In your initial proposal you had, e.g., "<>", and I found
> out that this does not conform to XML Schema just by checking the regex in an
> XML Schema editor.
> If we go for the "you can use any engine ..." approach, my prediction is
> that there will be even less interop among people using forbidden characters
> than with SRX. People will use forbiddenCharacters as "forbidden strings"
> and do what they want.
> With XML Schema regex, we can easily build test cases containing, e.g.,
> "<>". The XML Schema regex engine will tell people that they are doing
> something wrong, as Jirka pointed out. With our hand-written "subset
> regex", we would need to build our own regex conformance test suite. I
> don't want to go down that path.
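The gap behind point 3 can be illustrated with a short Python sketch (an illustration, not part of the original message). It compiles a few patterns with Python's built-in `re` module; each uses a feature that, as far as we know, XML Schema 1.0 regular expressions do not define (lookahead, back-references, lazy quantifiers, `\b`, `(?:...)` groups). The platform engine accepts all of them without error, so a producer who tests only against it gets no warning. `NON_XSD_PATTERNS` and `accepted_by_platform` are hypothetical names.

```python
import re

# Hypothetical test cases: patterns using features of Python's re module
# that, to our knowledge, are not part of XML Schema 1.0 regular expressions.
NON_XSD_PATTERNS = {
    "lookahead": r"a(?=b)",
    "backreference": r"(a)\1",
    "lazy quantifier": r"a+?",
    "word boundary": r"\bword\b",
    "non-capturing group": r"(?:ab)+",
}


def accepted_by_platform(patterns):
    """Return the names of the patterns that re compiles without error."""
    accepted = []
    for name, pattern in patterns.items():
        try:
            re.compile(pattern)
        except re.error:
            continue  # rejected by the platform engine
        accepted.append(name)
    return accepted


if __name__ == "__main__":
    # Every entry is reported: re raises no error for any of them.
    for name in accepted_by_platform(NON_XSD_PATTERNS):
        print(f"re silently accepts {name}: {NON_XSD_PATTERNS[name]}")
```

Running the same list through an XML Schema regex engine, which should reject each entry, is the kind of check a shared test suite could automate.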
> I understand your argument about adding dependencies. For JavaScript, I'm
> aware of Saxon CE, so you can use XML Schema regex there. For Ruby and
> Python, I don't know. But the XML Schema regex we are talking about has been
> around since 2001, so I think it is reasonable to say: if you want to use
> ITS 2.0, you need to make the effort to resolve the dependencies. The effort
> is not zero, but I think it's the only way to ensure long-term interop
> between users of forbidden characters.
> Best,
> Felix
> On Tuesday, 28 August 2012, Yves Savourel wrote:
>> Hi Jirka, all,
>> > Of course implementations are also important. But as there are
>> > open-source implementations of XML Schema regexps for all major
>> > platforms -- for example Saxon for Java/.NET and libxml2 for C/C++
>> > -- I don't see any problem here. You will simply reuse existing
>> > code instead of relying on the default platform regexp engine.
>> I think there is a vast difference between using the platform's regex and
>> a third-party library: adding dependencies may be difficult or impossible
>> in real-life scenarios. Also, can we be absolutely sure that all major
>> programming languages will have a free and working implementation of XML
>> Schema's regex (including Ruby, Python, client-side JavaScript, etc.)?
>> I've seen a similar story with SRX: its regex syntax is based on ICU's.
>> The idea was that applications could easily use the C, C++ or Java
>> implementations. The result wasn't that rosy. To cut a long story short,
>> today almost every application uses the platform's regex engine instead of
>> ICU's, and neither supports SRX properly nor provides true interoperability.
>> I hope we can avoid such an outcome for ITS. We are lucky that a subset of
>> XML Schema's regex would be enough to do the work and would be
>> interoperable with all other engines (as far as I know). I'd say it's an
>> attractive solution. We are mostly thinking about the users here: trying to
>> prevent them from ending up with interoperability issues.
>> Cheers,
>> -yves
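For comparison, here is a minimal Python sketch of the subset approach Yves describes. The subset itself is elided in the thread, so the rules below (a single bracketed character class made of literal characters, ranges such as `a-z`, and XML Schema's single-character escapes) are a hypothetical stand-in, and `in_subset` and `find_forbidden` are illustrative names. A checker like `in_subset` is the piece Felix expects each implementer would have to write and test; patterns that pass it are handed to the platform's own `re` engine.

```python
import re

# Hypothetical subset, for illustration only (not the ITS 2.0 text):
# one bracketed character class with literal characters, ranges like a-z,
# and the single-character escapes \n \r \t \\ \| \. \? \* \+ \( \) \{ \}
# \- \[ \] \^. Negation, nested classes and \d-style escapes are rejected.
_ESCAPES = set("nrt\\|.?*+(){}-[]^")
_CONTROL = {"n": "\n", "r": "\r", "t": "\t"}


def _tokens(body):
    """Split a class body into ("char", c) / ("dash", "-") tokens, or None."""
    out, i = [], 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            if i + 1 >= len(body) or body[i + 1] not in _ESCAPES:
                return None  # unknown or dangling escape
            nxt = body[i + 1]
            out.append(("char", _CONTROL.get(nxt, nxt)))
            i += 2
        elif ch == "-":
            out.append(("dash", "-"))
            i += 1
        elif ch in "[]^":
            return None  # unescaped metacharacter: kept out of the subset
        else:
            out.append(("char", ch))
            i += 1
    return out


def in_subset(pattern):
    """Return True if pattern stays inside the hypothetical subset."""
    if len(pattern) < 3 or pattern[0] != "[" or pattern[-1] != "]":
        return False  # not a single bracketed class (e.g. plain "<>")
    toks = _tokens(pattern[1:-1])
    if not toks:
        return False
    i = 0
    while i < len(toks):
        kind, start = toks[i]
        if kind == "dash":
            return False  # dash without a start character
        if i + 1 < len(toks) and toks[i + 1][0] == "dash":
            # Range: needs an end character that is not below the start.
            if i + 2 >= len(toks) or toks[i + 2][0] != "char":
                return False
            if toks[i + 2][1] < start:
                return False
            i += 3
        else:
            i += 1
    return True


def find_forbidden(text, pattern):
    """Return (offset, char) pairs for each forbidden character in text."""
    if not in_subset(pattern):
        raise ValueError(f"pattern outside the subset: {pattern!r}")
    return [(m.start(), m.group()) for m in re.finditer(pattern, text)]
```

For example, `find_forbidden("a<b>c", "[<>]")` yields `[(1, "<"), (3, ">")]`, while the bare pattern `"<>"` is rejected by this hypothetical subset because it is not a bracketed class. Whether such a hand-written checker stays in step with every platform engine is exactly the conformance question raised above.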

Felix Sasaki
DFKI / W3C Fellow

Received on Tuesday, 28 August 2012 06:54:10 UTC