"charClassSub" in restricted character set derivation

From: <pub@upokecenter.com>
Date: Sat, 20 Sep 2008 13:58:22 -0400
Message-Id: <33226400.465401221933502508.JavaMail.servlet@perfora>
To: <public-exi-comments@w3.org>

I want to make a suggestion on the section 'Deriving Character Sets from XML Schema Regular Expressions':

I propose that datatypes whose regular expression contains a "charClassSub" should have no restricted character set. The rest of the regular expression derivation only ever takes a union of characters, which makes it cheap to decide whether the expression yields a restricted character set. Allowing 'charClassSub' in the derivation complicates this, because the program must now subtract from the character set as well as add to it. That can be a problem when the character set contains a large number of characters, as in this example:

[&#x20;-&#xFF00;-[&#x60;-&#xFF00;]]

The regular expression above yields a restricted character set of only 64 characters (#x20 through #x5F). However, a naive implementation may have to store tens of thousands of characters before excluding most of them in the 'charClassSub' portion of the regular expression. Another problem is nested 'charClassSub' sets. For example, the following regular expression is allowed:

[A-Z-[B-Z-[C-Z-[D-Z-[E-Z-[...]]]]]]
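
To make the cost concrete, here is a minimal Python sketch of the naive approach described above. It is only an illustration, not how any EXI processor is known to work. The helper names (expand, evaluate) and the tuple representation of a character class are assumptions made for this example, and the three-level nested class is a hypothetical finite stand-in, not the elided expression shown above.

```python
def expand(lo, hi):
    """Naively materialize every code point in the inclusive range lo..hi."""
    return set(range(lo, hi + 1))


def evaluate(node):
    """Evaluate a hypothetical (lo, hi, sub) character class.

    `sub` is either None or another (lo, hi, sub) tuple standing for a
    'charClassSub' that must be subtracted from the range lo..hi.
    """
    lo, hi, sub = node
    chars = expand(lo, hi)
    if sub is not None:
        # Subtraction forces the inner class to be fully evaluated first.
        chars -= evaluate(sub)
    return chars


# First case: the whole base range is built before anything is removed.
base = expand(0x20, 0xFF00)
result = evaluate((0x20, 0xFF00, (0x60, 0xFF00, None)))
print(len(base), len(result))  # 65249 64

# Second case: a finite, hypothetical nesting [A-Z-[B-Z-[C-Z]]].
nested = (ord("A"), ord("Z"), (ord("B"), ord("Z"), (ord("C"), ord("Z"), None)))
print("".join(chr(c) for c in sorted(evaluate(nested))))  # ACDEFGHIJKLMNOPQRSTUVWXYZ
```

The first case stores 65249 code points to arrive at a 64-character set, and the second needs one recursive evaluation per nesting level. A union-only derivation needs neither.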

Both issues make 'charClassSub' difficult to support in restricted character set derivation. Thank you for your time.
Received on Sunday, 21 September 2008 10:25:44 GMT
