Comment on XML Schema Part 2: Datatypes Appendix E Regular Expression

From: Plonus, Fa. Integrata, ITS P, M <U.Plonus@deutschepost.de>
Date: Wed, 18 Oct 2000 08:29:28 +0200
Message-Id: <019E7FD6783AD1119BA0400000363304137188@MNPAH004.DeutschePost.de>
To: "'www-xml-schema-comments@w3.org'" <www-xml-schema-comments@w3.org>
I've a comment on the regular expressions in the named document. I've mailed
this comment to Mr. Biron and he has told me to mail it to this

I'm working on an implementation of regular expressions. Because we want to
use XML Schema once it becomes a Recommendation, I took the specification
of XML Schema Part 2 as the basis for the implementation.
In section E.1 Character Classes I've found a passage that can lead to
misinterpretation. It is the passage on character ranges, which can be
read in more than one way.
It says that the characters '[' and ']' are not valid character ranges. But
in the form 's-e' it is allowed to replace 's' with '[' or ']' and to
replace 'e' with ']'. There are some constructs that cannot be interpreted
unambiguously if these characters are allowed.

First example:
This example has the following possible interpretations:
1. The first three minus signs are a character range, the fourth minus sign
is a subtraction and the part '[-]' is a character class expression.
2. The first minus sign is the start of a positive character group, the
second through fourth minus signs are a character range and the part '[-]'
is also a character range.
These two interpretations are not equivalent, but both are consistent
with the definition.

Second example:
The following interpretations are possible:
1. The part '[A-]]' is a character class expression and the rest is not a
character class expression.
2. After the opening '[' I take the part 'A-]' as a character range and the
following ']-z]' as a concatenated character range, and then I have trouble
finding the end of this character class expression.
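
The core of the difficulty can be seen on the quoted fragment '[A-]]' alone.
The following is only a toy sketch, not the grammar of the specification and
not a literal rendering of the two readings above: the function name
class_end and the flag bracket_may_end_range are hypothetical. It only shows
that the position of the closing ']' depends on whether ']' is accepted as
the end 'e' of a range 's-e'.

```python
def class_end(expr, bracket_may_end_range):
    """Return the index of the ']' that closes the class opened at expr[0].

    Toy scanner (hypothetical): no escapes, no nesting, no subtraction.
    If bracket_may_end_range is True, a sequence 's-]' is consumed as a
    character range whose end is ']'; otherwise the first ']' closes.
    """
    if not expr.startswith("["):
        raise ValueError("expression must start with '['")
    i = 1
    while i < len(expr):
        is_range_to_bracket = (
            bracket_may_end_range
            and i + 2 < len(expr)
            and expr[i + 1] == "-"
            and expr[i + 2] == "]"
        )
        if is_range_to_bracket:
            i += 3  # skip 's', '-' and the ']' used as range end
            continue
        if expr[i] == "]":
            return i
        i += 1
    return None  # no closing bracket found


# ']' accepted as a range end: the whole fragment is one class.
print(class_end("[A-]]", bracket_may_end_range=True))
# ']' never a range end: the class closes one character earlier.
print(class_end("[A-]]", bracket_may_end_range=False))
```

With the flag set, the class spans the whole fragment; without it, the class
ends at the first ']' and one ']' is left over.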

A suggestion to avoid these misinterpretations is not to allow the characters
'[' and ']' as the start or end of a character range. The only exception could
be as the first character of a positive character group.
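
The suggested restriction could be sketched roughly as follows. This is only
an illustration under my own assumptions, not text from the specification:
the function name split_group is hypothetical, the group body is simplified
to single characters, backslash escapes and 's-e' ranges (no subtraction, no
category escapes), and I assume that an unescaped '[' or ']' is tolerated
only as the very first character of the group.

```python
def split_group(body):
    """Split the body of a positive character group (the text between the
    enclosing '[' and ']') into single characters and (start, end) ranges.

    Hypothetical simplification: an unescaped '[' or ']' raises ValueError
    unless it is the first character of the body.
    """
    def read_char(i):
        # Return (character, next index, was_escaped) for position i.
        if body[i] == "\\":
            if i + 1 >= len(body):
                raise ValueError("dangling backslash")
            return body[i + 1], i + 2, True
        return body[i], i + 1, False

    items = []
    i = 0
    while i < len(body):
        start, j, escaped = read_char(i)
        if start in "[]" and not escaped and i != 0:
            raise ValueError(f"unescaped {start!r} at position {i}")
        # A '-' that is followed by another character forms an s-e range.
        if j < len(body) - 1 and body[j] == "-":
            end, k, end_escaped = read_char(j + 1)
            if end in "[]" and not end_escaped:
                raise ValueError(f"unescaped {end!r} as range end")
            if ord(start) > ord(end):
                raise ValueError("range start is greater than range end")
            items.append((start, end))
            i = k
        else:
            items.append(start)
            i = j
    return items


print(split_group("a-z"))     # one range
print(split_group("A-\\]"))   # escaped ']' is accepted as a range end
```

With this rule, a body such as 'A-]' is rejected instead of being read in two
ways, and a writer who really wants ']' as a range end has to escape it.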


Uwe Plonus (Fa. Integrata Unternehmensberatung)
*	Deutsche Post AG
	Höltystr. 8
	81369 München
*	089/74123-596
*	U.Plonus@DeutschePost.de
Received on Wednesday, 18 October 2000 02:30:04 UTC