
Comment on XML Schema Part 2: Datatypes Appendix E Regular Expression

From: Plonus, Fa. Integrata, ITS P, M <U.Plonus@deutschepost.de>
Date: Wed, 18 Oct 2000 08:29:28 +0200
Message-Id: <019E7FD6783AD1119BA0400000363304137188@MNPAH004.DeutschePost.de>
To: "'www-xml-schema-comments@w3.org'" <www-xml-schema-comments@w3.org>
I have a comment on the regular expressions in the named document. I mailed
this comment to Mr. Biron, and he told me to send it to this comments
address.

I'm working on an implementation of regular expressions. Because we want to
use XML Schema once it becomes a Recommendation, I took the XML Schema
Part 2 specification as the basis for the implementation.
In section E.1, Character Classes, I found a passage that can lead to
misinterpretation: the part on character ranges.
It says that the characters '[' and ']' are not valid character ranges. But
in the form 's-e' it is allowed to replace 's' with '[' or ']' and to
replace 'e' with ']'. There are some constructs that cannot be interpreted
unambiguously if these characters are allowed.

First example:
[----[-]]
This example has the following possible interpretations:
1. The first three minus signs are a character range, the fourth minus sign
is a subtraction, and the part '[-]' is a character class expression.
2. The first minus sign is the start of a positive character group, the
second through fourth minus signs are a character range, and the part '[-]'
is also a character range.
These two interpretations are not equivalent, but both are consistent
with the definition.
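
To make the difference concrete, the two readings can be written out by hand
as sets of matched characters. This is only a minimal Python sketch, not a
parser for the grammar in Appendix E; the helper name char_range and the way
each reading is spelled out are assumptions made for illustration.

```python
def char_range(start, end):
    """Return the set of characters from start to end, inclusive (hypothetical helper)."""
    return {chr(code) for code in range(ord(start), ord(end) + 1)}


# Reading 1: the range '---' (from '-' to '-'), then a subtraction of the
# character class expression '[-]', which contains only '-'.
reading_1 = char_range("-", "-") - {"-"}

# Reading 2: a literal '-' at the start of the positive character group,
# then the range '---', then the range '[-]' (from '[' to ']').
reading_2 = {"-"} | char_range("-", "-") | char_range("[", "]")

print(sorted(reading_1))
print(sorted(reading_2))
print(reading_1 == reading_2)
```

Under the first reading the expression matches no character at all. Under
the second it matches '-', '[', '\' and ']', because the range from '[' to
']' also covers the backslash that lies between them.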

Second example:
[A-]]-z(ab|cd)
The following interpretations are possible:
1. The part '[A-]]' is a character class expression and the rest is not
part of a character class expression.
2. After the opening '[' I take the part 'A-]' as a character range and the
following ']-z' as a further character range, and then I have trouble
finding the end of this character class expression.
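
The two readings can be reproduced with a toy left-to-right scanner. The
sketch below is an assumption-laden illustration in Python, not the
algorithm from the specification: the function scan_class and its flag
bracket_may_start_range are hypothetical, and the scanner ignores escapes,
negation and subtraction.

```python
def scan_class(pattern, bracket_may_start_range):
    """Scan a character class that opens at pattern[0] == '['.

    Returns (items, close_index), where items is a list of (start, end)
    pairs and close_index is the position of the closing ']' or None if
    the scanner never finds one. Hypothetical helper for illustration.
    """
    items = []
    i = 1  # skip the opening '['
    while i < len(pattern):
        is_range = (
            i + 2 < len(pattern)
            and pattern[i + 1] == "-"
            and (pattern[i] != "]" or bracket_may_start_range)
        )
        if is_range:
            # Read 's-e'; the end character may be ']' in both readings.
            items.append((pattern[i], pattern[i + 2]))
            i += 3
        elif pattern[i] == "]":
            return items, i
        else:
            items.append((pattern[i], pattern[i]))
            i += 1
    return items, None


example = "[A-]]-z(ab|cd)"
print(scan_class(example, bracket_may_start_range=False))
print(scan_class(example, bracket_may_start_range=True))
```

With the flag set to False the scanner reads the range 'A-]' and closes the
class at the second ']'. With the flag set to True it reads ']-z' as a
second range and runs off the end of the input without finding a closing
bracket.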

A suggestion to avoid these misinterpretations is not to allow the
characters '[' and ']' as the start or end of a character range. The only
exception could be as the first character of a positive character group.

Bye,

Uwe Plonus (Fa. Integrata Unternehmensberatung)
*	Deutsche Post AG
	Höltystr. 8
	81369 München
*	089/74123-596
*	U.Plonus@DeutschePost.de
Received on Wednesday, 18 October 2000 02:30:04 GMT
