
Re: forbiddenCharacters data category - related to [ACTION-189]

From: Jirka Kosek <jirka@kosek.cz>
Date: Mon, 27 Aug 2012 21:40:52 +0200
Message-ID: <503BCD44.7080606@kosek.cz>
To: Yves Savourel <ysavourel@enlaso.com>
CC: public-multilingualweb-lt@w3.org

On 27.8.2012 17:52, Yves Savourel wrote:

>> Why can't you use something like
>> allowedCharacters="[&#x20;-&#x1ffff;-[&lt;>:&quot;\\/|\?*]]"
> I have only one last reservation: currently the syntax we define does
> not allow for nested character class subtraction (trying to keep things
> interoperable). How would you specify [\u0000-\u001F<>:"\\/|\?*]
> without such a construct?

If you want to avoid constructs like [A-[B]], you can in this case
split the range &#x20;-&#x1ffff; into several smaller ranges, each
ending just before and resuming just after a disallowed character.
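
To illustrate the splitting, here is a rough sketch in Java that derives
the flat list of ranges mechanically. The class and method names
(FlatRanges, flatClass) are made up for this example, and the \x{...}
escapes are Java's notation, not the character references an ITS
attribute value would use.

```java
import java.util.TreeSet;
import java.util.regex.Pattern;

class FlatRanges {
    // Builds a character class covering lo..hi minus the excluded
    // characters, using only plain ranges (no nested subtraction).
    static String flatClass(int lo, int hi, String excluded) {
        TreeSet<Integer> holes = new TreeSet<>();
        for (int cp : excluded.codePoints().toArray()) {
            holes.add(cp);
        }
        StringBuilder sb = new StringBuilder("[");
        int start = lo;
        for (int hole : holes) {
            if (hole < start || hole > hi) {
                continue; // outside the part of the range still to emit
            }
            appendRange(sb, start, hole - 1); // end just before the hole
            start = hole + 1;                 // resume just after it
        }
        appendRange(sb, start, hi);
        return sb.append("]").toString();
    }

    // Appends from-to (or a single character) as \x{HEX} escapes;
    // emits nothing for an empty range.
    static void appendRange(StringBuilder sb, int from, int to) {
        if (from > to) {
            return;
        }
        sb.append(String.format("\\x{%X}", from));
        if (to > from) {
            sb.append('-').append(String.format("\\x{%X}", to));
        }
    }

    public static void main(String[] args) {
        String cls = flatClass(0x20, 0x1FFFF, "<>:\"\\/|?*");
        System.out.println(cls);
        System.out.println(Pattern.matches(cls + "*", "report.txt"));
        System.out.println(Pattern.matches(cls + "*", "a:b"));
    }
}
```

For the nine excluded characters this yields nine smaller ranges, which
is workable but noticeably harder to read than the subtraction form.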

But I don't think we should disallow [A-[B]], as this syntax is available
in XML Schema and XPath 2.0/XQuery 1.0, so there are plenty of existing
implementations around.

Moreover, other languages offer similar syntax. For example, in Java you
can map this to [A&&[^B]], if I'm not mistaken.
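
A minimal sketch of that mapping with java.util.regex, assuming the same
range and excluded characters as in the allowedCharacters example quoted
above. The class and method names (SubtractionDemo, isAllowed) are
invented for the illustration.

```java
import java.util.regex.Pattern;

class SubtractionDemo {
    // Java spelling of [A-[B]]: intersect A with the negation of B.
    // A = U+0020..U+1FFFF, B = < > : " \ / | ? *
    static final Pattern ALLOWED =
        Pattern.compile("[\\x{20}-\\x{1FFFF}&&[^<>:\"\\\\/|?*]]*");

    // True when every character of text is in A but not in B.
    static boolean isAllowed(String text) {
        return ALLOWED.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("report.txt"));
        System.out.println(isAllowed("a/b"));
    }
}
```

So a processor built on Java can translate the subtraction form with a
simple textual rewrite rather than having to expand the ranges.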

> Overall, I'd much rather go with allowedCharacter with regex than
> forbiddenCharacters without regex.



  Jirka Kosek      e-mail: jirka@kosek.cz      http://xmlguru.cz
       Professional XML consulting and training services
  DocBook customization, custom XSLT/XSL-FO document processing
 OASIS DocBook TC member, W3C Invited Expert, ISO JTC1/SC34 member

Received on Monday, 27 August 2012 19:41:26 UTC
