- From: Felix Sasaki <fsasaki@w3.org>
- Date: Tue, 28 Aug 2012 08:53:42 +0200
- To: Yves Savourel <ysavourel@enlaso.com>
- Cc: "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>
- Message-ID: <CAL58czqCOL9=U+KviYQQXkAP7P2NAd1GNi0C1mRFNTpd_-T+dA@mail.gmail.com>
Now with the right sender address ...

On Tuesday, 28 August 2012, Felix Sasaki wrote:

> Hi Yves, all,
>
> let's try to foresee the future for the moment, assuming we go with the
> "sub-set of XML Schema regex" approach.
>
> 1) Users of forbidden characters will read in the spec "you can use
> regular expressions. Btw., you should restrict yourself to the following
> subset: [...]"
> 2) Users will be happy, saying "great, I can use my own engine and don't
> have to care about regex details", that is, the [...] above.
> 3) In the end there will be a lot of regexes that exceed the subset, and
> no producer or consumer will have a means to check that.
>
> Why 3)? In your initial proposal you had e.g. "<>", and I found out
> that this does not conform to XML Schema just by checking the regex in an
> XML Schema editor.
>
> If we go for the "you can use any engine ..." approach, my prediction is
> that there will be even less interop among people using forbidden
> characters than with SRX. People will use forbidden characters as
> "forbidden strings" and do what they want.
>
> With XML Schema regex, we can easily build test cases containing e.g.
> "<>". The XML Schema regex engine will tell people that they are doing
> something wrong, as Jirka pointed out. With our hand-written "subset
> regex", we would need to build our own regex conformance test suite. I
> don't want to go down that path.
>
> I understand your argument about adding dependencies. For JavaScript, I'm
> aware of Saxon-CE, so you can use XML Schema regex there. For Ruby and
> Python: I don't know. But the XML Schema we are talking about has been
> around since 2001 ... so I think it is reasonable to say: if you want to
> use ITS 2.0, you need to make the effort to resolve the dependencies. The
> effort is not zero, but I think it's the only way to ensure long-term
> interop between users of forbidden characters.
>
> Best,
>
> Felix
>
>
> On Tuesday, 28 August 2012, Yves Savourel wrote:
>
>> Hi Jirka, all,
>>
>> > Of course implementations are also important. But as there are
>> > open-source implementations of XML Schema regexps for all major
>> > platforms -- for example Saxon for Java/.NET and libxml2 for C/C++
>> > -- I don't see any problem here. You will simply reuse existing
>> > code instead of relying on default platform regexp engine.
>>
>> I think there is a vast difference between using the platform's regex
>> and using a third-party library: adding dependencies may be difficult or
>> not possible in real-life scenarios. Also, can we be absolutely sure that
>> all major programming languages will have a free and working
>> implementation of XML Schema's regex (including Ruby, Python, client-side
>> JavaScript, etc.)?
>>
>> I've seen a similar story for SRX: the regex syntax is based on ICU's.
>> The idea was that applications could easily use either the C, C++ or Java
>> implementations. The result wasn't that rosy. To cut the story short,
>> today almost every application uses the platform's regex engine instead
>> of ICU's, and neither supports SRX properly nor provides true
>> interoperability.
>>
>> I hope we can avoid such an outcome for ITS. We have the chance that a
>> sub-set of XML Schema's regex would be enough to do the work and be
>> interoperable with all (as far as I know) other engines... I'd say it's
>> an attractive solution. We are thinking mostly about the users here:
>> trying to keep them from ending up with interoperability issues.
>>
>> Cheers,
>> -yves

--
Felix Sasaki
DFKI / W3C Fellow
Received on Tuesday, 28 August 2012 06:54:10 UTC