Re: forbiddenCharacters data category - related to [ACTION-189] from Jirka Kosek on 2012-08-27 (public-multilingualweb-lt@w3.org from August 2012)

From: Jirka Kosek <jirka@kosek.cz>
Date: Mon, 27 Aug 2012 21:40:52 +0200
To: Yves Savourel <ysavourel@enlaso.com>
CC: public-multilingualweb-lt@w3.org
Message-ID: <503BCD44.7080606@kosek.cz>

On 27.8.2012 17:52, Yves Savourel wrote:

>> Why can't you use something like
>> allowedCharacters="[&#x20;-&#x1ffff;-[&lt;>:&quot;\\/|\?*]]"
> 
> I have only one remaining reservation: the syntax we currently define
> does not allow nested character class subtraction (to keep things
> interoperable), so how would you specify [\u0000-\u001F<>:"\\/|\?*]
> without such a construct?

If you want to avoid constructs like [A-[B]], then in this case you can
split the range &#x20;-&#x1ffff; into several smaller ranges that end
just before and start just after each disallowed character.
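The range-splitting workaround can be computed mechanically. Below is a
minimal Java sketch, not a reference implementation: the class and method
names (SplitRanges, split, toCharClass) are hypothetical, and it prints the
result in java.util.regex notation (\x{...}) rather than the XML character
references (&#x...;) that an ITS attribute value would use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical helper: rewrites "range minus some characters" as a
// character class that uses only plain ranges (no [A-[B]] subtraction).
public class SplitRanges {

    /** Returns the sub-ranges of [lo, hi] left after removing every excluded code point. */
    static List<int[]> split(int lo, int hi, String excluded) {
        // Sort and de-duplicate the excluded code points.
        TreeSet<Integer> cut = new TreeSet<>();
        excluded.codePoints().forEach(cut::add);

        List<int[]> ranges = new ArrayList<>();
        int start = lo;
        for (int cp : cut) {
            if (cp < start || cp > hi) continue;   // not inside the remaining range
            if (cp > start) {
                ranges.add(new int[] {start, cp - 1}); // range ends just before the excluded char
            }
            start = cp + 1;                          // next range starts just after it
        }
        if (start <= hi) ranges.add(new int[] {start, hi});
        return ranges;
    }

    /** Builds a character class in Java regex notation from the ranges. */
    static String toCharClass(List<int[]> ranges) {
        StringBuilder sb = new StringBuilder("[");
        for (int[] r : ranges) {
            sb.append(String.format("\\x{%X}", r[0]));
            if (r[1] != r[0]) sb.append(String.format("-\\x{%X}", r[1]));
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        String forbidden = "<>:\"\\/|?*";
        String cls = toCharClass(split(0x20, 0x1FFFF, forbidden));
        System.out.println(cls);

        Pattern p = Pattern.compile(cls + "*");
        System.out.println(p.matcher("report-2012.xml").matches()); // true
        System.out.println(p.matcher("a:b").matches());             // false
    }
}
```

The output is longer and harder to read than [A-[B]], which is the
practical cost of avoiding subtraction.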

But I don't think we should disallow [A-[B]], as this syntax is available
in XML Schema and XPath 2.0/XQuery 1.0; there are plenty of existing
implementations around.

Moreover, other languages offer similar syntax. For example, in Java you
can map this to [A&&[^B]], if I'm not mistaken.
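A small sketch to check that mapping with java.util.regex.Pattern, whose
character classes support intersection with "&&". The class name
ClassSubtraction and the helper isAllowed are hypothetical; the pattern is
the Java spelling of the allowedCharacters example quoted at the top of
this message, with A-[B] rewritten as A&&[^B].

```java
import java.util.regex.Pattern;

public class ClassSubtraction {

    // XSD/XPath:  [&#x20;-&#x1ffff;-[<>:"\/|?*]]
    // Java:       [\x{20}-\x{1FFFF}&&[^<>:"\\/|?*]]
    // i.e. the range intersected with the negation of the forbidden set.
    static final Pattern ALLOWED =
        Pattern.compile("[\\x{20}-\\x{1FFFF}&&[^<>:\"\\\\/|?*]]*");

    /** True if every character of s is in the allowed class. */
    static boolean isAllowed(String s) {
        return ALLOWED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("report-2012.xml")); // true
        System.out.println(isAllowed("a|b"));             // false: '|' is forbidden
        System.out.println(isAllowed("tab\there"));       // false: U+0009 is below U+0020
    }
}
```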

> Overall, I'd much rather go with allowedCharacters with regex than
> forbiddenCharacters without regex.

+1

   Jirka

-- 
------------------------------------------------------------------
  Jirka Kosek      e-mail: jirka@kosek.cz      http://xmlguru.cz
------------------------------------------------------------------
       Professional XML consulting and training services
  DocBook customization, custom XSLT/XSL-FO document processing
------------------------------------------------------------------
 OASIS DocBook TC member, W3C Invited Expert, ISO JTC1/SC34 member
------------------------------------------------------------------

Received on Monday, 27 August 2012 19:41:26 UTC