
RE: Algorithm for merging the pattern facets in a base simpleType with a subtype?

From: Liam R E Quin <liam@w3.org>
Date: Tue, 19 Apr 2011 20:35:25 -0400
To: "Costello, Roger L." <costello@mitre.org>
Cc: "xmlschema-dev@w3.org" <xmlschema-dev@w3.org>
Message-ID: <1303259725.15605.54.camel@desktop.barefootcomputing.com>

On Tue, 2011-04-19 at 16:34 -0400, Costello, Roger L. wrote:

>     [a-z]{10} and [a-z]{20}

Read as a logical "and", this is the empty set, since no value has both
exactly 10 and exactly 20 characters.

I am guessing you really mean "or" - in which case,
    ([a-z]{10})|([a-z]{20})
would be one way, and
    [a-z]{10}([a-z]{10})?
another, though harder to generate automatically. There are many more
possible expressions, but the first given above is the easiest. It is
also permitted: for a Deeply Mystical Reason, the Schema WG did not
forbid non-deterministic regular expressions in facets, despite the
original claim that the UPA restriction was there in SGML because
non-determinism was hard to implement...
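
As a quick sanity check, the two "or" expressions above can be compared
on sample values. This is only a sketch: it uses Python's re module as a
stand-in for an XSD regex engine (an assumption; these two particular
patterns happen to mean the same thing in both), and fullmatch() to
mimic the fact that an XSD pattern must match the whole value. The names
ALTERNATION, OPTIONAL_TAIL and accepts are made up for the example.

```python
import re

# Two ways to write "exactly 10 or exactly 20 lowercase letters".
ALTERNATION = re.compile(r"([a-z]{10})|([a-z]{20})")
OPTIONAL_TAIL = re.compile(r"[a-z]{10}([a-z]{10})?")


def accepts(pattern, value):
    """Return True if the pattern matches the entire value (XSD-style)."""
    return pattern.fullmatch(value) is not None


# Probe strings of length 0..30 made of a single lowercase letter.
samples = ["a" * n for n in range(31)]

# Lengths accepted by the alternation form.
lengths_accepted = [len(s) for s in samples if accepts(ALTERNATION, s)]

# Do both forms agree on every sample?
agree = all(accepts(ALTERNATION, s) == accepts(OPTIONAL_TAIL, s) for s in samples)

print(lengths_accepted, agree)
```

Sampling like this shows agreement only on the values tried; it is not
a proof that the two expressions are equivalent.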

> Unfortunately, there is no "and" operator in regex. So, any ideas on how to "and" arbitrary regex expressions?

There's no easy way; one hard way might be to convert the two regular
expressions into finite-state automata, build the product automaton
that accepts only what both accept, and then attempt to generate
regular expression notation from that merged automaton.

The most useful part would be that if you detect an empty intersection,
you can short-circuit the process and say that since no value can
satisfy the conjunction, there are no valid instances of the type.
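
The empty-intersection check can be sketched with the usual product
construction: walk pairs of states, one from each automaton, and see
whether any reachable pair is accepting in both. The sketch below makes
several assumptions. The automata are written by hand rather than
compiled from XSD patterns, the alphabet is collapsed to a single symbol
class "lower" standing for [a-z], and the helper names length_range_dfa
and intersection_is_empty are hypothetical. Turning the product back
into regular expression notation is not attempted here.

```python
from collections import deque


def length_range_dfa(lo, hi):
    """Hand-built DFA for [a-z]{lo,hi}.

    States are 0..hi (the number of letters read so far); states lo..hi
    are accepting. A missing transition means the input is rejected.
    """
    transitions = {(state, "lower"): state + 1 for state in range(hi)}
    return {
        "start": 0,
        "accepting": set(range(lo, hi + 1)),
        "transitions": transitions,
    }


def intersection_is_empty(a, b, alphabet=("lower",)):
    """Breadth-first search over the product of two DFAs.

    Returns False as soon as a reachable state pair is accepting in
    both automata, True if no such pair exists.
    """
    start = (a["start"], b["start"])
    seen = {start}
    queue = deque([start])
    while queue:
        state_a, state_b = queue.popleft()
        if state_a in a["accepting"] and state_b in b["accepting"]:
            return False  # some value satisfies both patterns
        for symbol in alphabet:
            next_a = a["transitions"].get((state_a, symbol))
            next_b = b["transitions"].get((state_b, symbol))
            if next_a is None or next_b is None:
                continue  # one side rejects, so the product has no move
            pair = (next_a, next_b)
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return True


# [a-z]{10} "and" [a-z]{20}: nothing satisfies both.
exact_10_and_20 = intersection_is_empty(
    length_range_dfa(10, 10), length_range_dfa(20, 20)
)

# [a-z]{5,15} "and" [a-z]{10,20}: lengths 10..15 satisfy both.
overlapping = intersection_is_empty(
    length_range_dfa(5, 15), length_range_dfa(10, 20)
)

print(exact_10_and_20, overlapping)
```

A real tool would need a full pattern-to-automaton compiler and a real
alphabet; the search over state pairs would stay the same.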

I don't know if the XSD regular expression language is closed under a
(putative) "and" operation.  There may not always be a single expression
to represent the result.

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Received on Wednesday, 20 April 2011 00:35:28 GMT
