Re: forbiddenCharacters data category - related to [ACTION-189]

From: Jirka Kosek <jirka@kosek.cz>
Date: Tue, 28 Aug 2012 11:14:51 +0200
Message-ID: <503C8C0B.2050007@kosek.cz>
To: Yves Savourel <ysavourel@enlaso.com>
CC: public-multilingualweb-lt@w3.org

On 28.8.2012 4:51, Yves Savourel wrote:

> I think there is a vast difference between using the platform's regex and a third-party library: adding dependencies may be difficult or not possible in real-life scenarios. Also, can we be absolutely sure that all major programming languages will have a free and working implementation of XML Schema's regex (including Ruby, Python, client-side JavaScript, etc.)?
> I've seen a similar story for SRX: the regex syntax is based on ICU's. The idea was that applications could easily use either the C, C++ or Java implementations. The result wasn't that rosy. To cut the story short, today almost every application uses the platform's regex engine instead of ICU's, and neither supports SRX properly nor provides true interoperability.

Actually, every platform that has built-in support for XML Schema 1.0
can implement this check without any extra dependencies. Your
application can build an ad hoc XSD schema whose datatypes are
restricted by the regular expressions taken from its:allowedCharacters,
and then validate your XML document (or a subset containing just the
strings to check) against that schema.
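
A minimal sketch of that approach in Java, using only the JDK's
javax.xml.validation API. The class and method names are hypothetical,
and the sketch assumes that the its:allowedCharacters value is a
character class such as [a-zA-Z ] that every character of the text must
match; the draft may end up defining the value differently.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class AllowedCharsCheck {

    // Escape text so it can be embedded in XML content or attribute values.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("'", "&apos;").replace("\"", "&quot;");
    }

    // Build a throwaway schema with one element <s> whose string content
    // is restricted by a pattern facet derived from the allowed characters.
    // Assumption: allowedCharacters matches ONE permitted character, so it
    // is wrapped as (...)* to cover the whole string. XSD patterns are
    // implicitly anchored at both ends.
    static Schema buildSchema(String allowedCharacters) {
        String pattern = escapeXml("(" + allowedCharacters + ")*");
        String xsd =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='s'>"
          + "<xs:simpleType>"
          + "<xs:restriction base='xs:string'>"
          + "<xs:pattern value='" + pattern + "'/>"
          + "</xs:restriction>"
          + "</xs:simpleType>"
          + "</xs:element>"
          + "</xs:schema>";
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            // The expression is not a valid XML Schema regular expression.
            throw new IllegalArgumentException("Invalid pattern: " + allowedCharacters, e);
        }
    }

    // Wrap the string in <s>...</s> and validate it against the schema.
    // A validation error means the text contains a disallowed character.
    static boolean isAllowed(Schema schema, String text) {
        String doc = "<s>" + escapeXml(text) + "</s>";
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(doc)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Schema schema = buildSchema("[a-zA-Z ]");
        System.out.println(isAllowed(schema, "Hello world"));
        System.out.println(isAllowed(schema, "Hello, world!"));
    }
}
```

Note that text containing characters that are not legal in XML at all
cannot be wrapped this way; in this sketch it is simply reported as not
allowed, because the parser rejects the wrapper document.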

Personally, I would go with XML Schema regexps in our draft. If there is
pushback from more implementers, we can adjust the draft later, before
producing the final spec.


  Jirka Kosek      e-mail: jirka@kosek.cz      http://xmlguru.cz
       Professional XML consulting and training services
  DocBook customization, custom XSLT/XSL-FO document processing
 OASIS DocBook TC member, W3C Invited Expert, ISO JTC1/SC34 member

Received on Tuesday, 28 August 2012 09:15:23 UTC
