- From: Liam R E Quin <liam@w3.org>
- Date: Tue, 19 Apr 2011 20:35:25 -0400
- To: "Costello, Roger L." <costello@mitre.org>
- Cc: "xmlschema-dev@w3.org" <xmlschema-dev@w3.org>
On Tue, 2011-04-19 at 16:34 -0400, Costello, Roger L. wrote:
> [a-z]{10} and [a-z]{20}
Taken literally, this denotes the empty set, since no value has both
exactly 10 and exactly 20 characters.
I am guessing you really mean "or" - in which case,
([a-z]{10})|([a-z]{20})
would be one way, and
[a-z]{10}([a-z]{10})?
another, harder to generate automatically. There are many more possible
expressions, but the first one above is the easiest. For a Deeply
Mystical Reason, the Schema WG did not forbid non-deterministic regular
expressions in facets, despite the original claim that the UPA
restriction was there in SGML because non-determinism was hard to
implement...
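As a quick sanity check that the two "or" expressions above accept the same values, here is a minimal sketch. It uses Python's re module as a stand-in for an XSD processor, which is an assumption: the two regex dialects differ in general, although these particular expressions mean the same thing in both. The names accepts and agree_on_lengths are made up for the illustration.

```python
import re

# Two spellings of "exactly 10 or exactly 20 lowercase letters".
ALTERNATION = re.compile(r"([a-z]{10})|([a-z]{20})")
OPTIONAL_TAIL = re.compile(r"[a-z]{10}([a-z]{10})?")


def accepts(pattern, value):
    # An XSD pattern facet must match the whole value, so use fullmatch
    # rather than search or match.
    return pattern.fullmatch(value) is not None


def agree_on_lengths(max_len=25):
    # Compare the two expressions on strings of every length up to max_len.
    return all(
        accepts(ALTERNATION, "a" * n) == accepts(OPTIONAL_TAIL, "a" * n)
        for n in range(max_len + 1)
    )
```

This only samples strings made of the letter "a", so it is a spot check and not a proof of equivalence.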
> Unfortunately, there is no "and" operator in regex. So, any ideas on how to "and" arbitrary regex expressions?
There's no easy way; one hard way might be to convert the two
regular expressions into non-deterministic finite-state automata, merge
them into a single automaton that accepts only the strings both accept,
and then attempt to generate regular expression notation from the merged
automaton.
The most useful part would be that if you detect an empty intersection,
you can short-circuit the process and say that since no value can
satisfy the conjunction, there are no valid instances of the type.
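The emptiness check can be sketched roughly as follows. To keep it short I am assuming the two expressions have already been turned into deterministic automata (not the non-deterministic ones mentioned above), and I build those by hand for the [a-z]{n} case instead of compiling them from a regex. The helper names exact_length_dfa and intersection_is_empty are hypothetical.

```python
from collections import deque
from string import ascii_lowercase


def exact_length_dfa(n):
    """Hand-built DFA for [a-z]{n}: states 0..n count letters, n+1 is dead."""
    def step(state, ch):
        if "a" <= ch <= "z" and state < n:
            return state + 1
        return n + 1  # too many letters, or not a lowercase letter

    return {"start": 0, "accepting": {n}, "step": step}


def intersection_is_empty(dfa_a, dfa_b, alphabet):
    """Search the product automaton for a state that both DFAs accept."""
    start = (dfa_a["start"], dfa_b["start"])
    seen = {start}
    queue = deque([start])
    while queue:
        a, b = queue.popleft()
        if a in dfa_a["accepting"] and b in dfa_b["accepting"]:
            return False  # some string satisfies both expressions
        for ch in alphabet:
            nxt = (dfa_a["step"](a, ch), dfa_b["step"](b, ch))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True  # no reachable state is accepting in both automata


# The case from the question: exactly 10 letters "and" exactly 20 letters.
no_valid_instances = intersection_is_empty(
    exact_length_dfa(10), exact_length_dfa(20), ascii_lowercase
)
```

Turning a non-empty product automaton back into regular expression notation is the hard part and is not attempted here.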
I don't know if the XSD regular expression language is closed under a
(putative) "and" operation. There may not always be a single expression
to represent the result.
Liam
--
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Received on Wednesday, 20 April 2011 00:35:28 UTC