
Re: Algorithm for merging the pattern facets in a base simpleType with a subtype?

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Tue, 19 Apr 2011 17:26:52 -0600
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, "xmlschema-dev@w3.org" <xmlschema-dev@w3.org>
Message-Id: <545049D8-3978-4904-A9FF-F6457C578A00@blackmesatech.com>
To: "Costello, Roger L." <costello@mitre.org>

On Apr 19, 2011, at 2:34 PM, Costello, Roger L. wrote:

> Thanks Michael. Very enlightening. 
> 
> I'd like to confirm my new understanding.
> 
> Here simpleType "A" is the base of simpleType "B":
> 
>    <xs:simpleType name="A">
>        <xs:restriction base="xs:string">
>            <xs:pattern value="[a-z]{10}" />
>        </xs:restriction>
>    </xs:simpleType>
> 
>    <xs:simpleType name="B">
>        <xs:restriction base="A">
>            <xs:pattern value="[a-z]{20}" />
>        </xs:restriction>
>    </xs:simpleType>
> 
> Here I declare an element, Test, to be of type "B":
> 
>  <xs:element name="Test" type="B" />
> 
> The value of "B" must consist of the letters a-z and the length must be exactly 10 characters AND exactly 20 characters. Clearly that is impossible, so B has no valid value.
> 
> Is that correct thus far?

I have not checked to see what any validators say, so I may 
be missing a step somewhere, but I believe this is correct 
so far.

> 
> Compare the above two simpleTypes against this simpleType:
> 
>    <xs:simpleType name="C">
>        <xs:restriction base="xs:string">
>            <xs:pattern value="[a-z]{10}" />
>            <xs:pattern value="[a-z]{20}" />
>        </xs:restriction>
>    </xs:simpleType>
> 
> I declare an element, Test, to be of type "C":
> 
>  <xs:element name="Test" type="C" />
> 
> The value of "C" must consist of the letters a-z and the length must be exactly 10 characters OR exactly 20 characters. So either of these is valid:
> 
>    <Test>abcdefghijabcdefghij</Test>
> 
>    <Test>abcdefghij</Test>
> 
> Is that correct?

That's what I believe the spec says, yes.
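A small Python model may make the rule concrete. It is a sketch, not a validator. The spec notes that pattern facets given in the same derivation step are ORed together, while pattern facets from different steps of the derivation are ANDed. The TYPE_STEPS table and is_valid function are hypothetical names invented for this illustration. re.fullmatch stands in for the implicit anchoring of XSD patterns, and Python's regex dialect happens to agree with XSD's for these two patterns.

```python
import re

# Hypothetical encoding of the types A, B and C above: each inner list holds
# the pattern facets declared in ONE restriction step.
TYPE_STEPS = {
    "A": [["[a-z]{10}"]],
    "B": [["[a-z]{10}"], ["[a-z]{20}"]],  # B restricts A: two steps
    "C": [["[a-z]{10}", "[a-z]{20}"]],    # one step with two patterns
}

def is_valid(value, steps):
    """Patterns within a step are ORed; the steps themselves are ANDed.
    re.fullmatch mimics XSD's implicit anchoring at both ends."""
    return all(any(re.fullmatch(p, value) for p in step) for step in steps)

ten, twenty = "abcdefghij", "abcdefghij" * 2
for name, steps in TYPE_STEPS.items():
    print(name, is_valid(ten, steps), is_valid(twenty, steps))
```

Run as is, it reports that A accepts only the 10-letter string, B accepts neither, and C accepts both.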

> Let's return to the "A" and "B" example. I would like to merge them into a single simpleType.

What kind of merger do you have in mind?  If you want
a value space containing all the values of A and also all
the values of B, use a union and specify A and B as its
member types.

If you want a value space containing all the values that are
values of A and also values of B, then you want an empty
value space, a type with no instances.

> I have learned that simply merging the pattern facets of "A" into "B" to yield "C" is not correct.  It would be so nice if I could simply write:
> 
>    [a-z]{10} and [a-z]{20}

If you want to define a type with no instances, the simplest way is
probably to declare a union with no member types.  That works in
XSD 1.1, though it's disallowed in 1.0 (because the WG felt it made
no sense).

If you aren't trying to define a type with no instances, then I
don't think I understand what you want the expression just
given to mean.

> 
> Unfortunately, there is no "and" operator in regex. So, any ideas on how to "and" arbitrary regex expressions?

The simplest way to get the AND of two patterns E1 and E2 is to
declare a simple type with pattern E1 and then define a second
type restricting the first by adding E2.

The 1.1 declaration of yearMonthDuration provides a real-world
example.  The restriction adds the pattern [^DT]* to say that each
instance of yearMonthDuration must match the patterns of duration
and must also be a string of characters none of which is a D or a T
(which rules out any occurrence of the day, hour, minute, and second
fields).
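A rough Python sketch of this AND-by-restriction follows. Only the [^DT]* facet comes from the yearMonthDuration definition. The DURATION pattern is a simplified stand-in written for this illustration, not the pattern in the spec, and it uses Python lookaheads, which XSD regexes do not have, to reject empty forms such as "P" and "PT".

```python
import re

# Simplified stand-in for the duration lexical space (an assumption, NOT the
# spec's pattern). The (?=...) lookaheads are Python-only.
DURATION = (r"-?P(?=[0-9]|T[0-9])"
            r"(?:[0-9]+Y)?(?:[0-9]+M)?(?:[0-9]+D)?"
            r"(?:T(?=[0-9])(?:[0-9]+H)?(?:[0-9]+M)?(?:[0-9]+(?:\.[0-9]+)?S)?)?")

# The extra pattern facet added by the yearMonthDuration restriction.
YEAR_MONTH_EXTRA = r"[^DT]*"

def is_duration(s):
    return re.fullmatch(DURATION, s) is not None

def is_year_month_duration(s):
    # Both derivation steps must be satisfied: AND across steps.
    return is_duration(s) and re.fullmatch(YEAR_MONTH_EXTRA, s) is not None

for s in ["P1Y2M", "-P3M", "P1D", "PT1H", "P1Y2MT3M"]:
    print(s, is_duration(s), is_year_month_duration(s))
```

P1Y2MT3M passes the duration check but fails [^DT]*. Its 3M is a minute field, and the T, not the M, is what rules it out. That is why the restriction can exclude minutes while still allowing months.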

An alternative method would be to write the regex you want
using AND and any other logical operators you need, then
use Brzozowski derivatives to calculate a finite-state machine
for the language, then use standard methods to calculate a
regular expression for the language recognized by the finite
state machine.  This is a lot of tedious work and the usual
methods of generating regexes from FSMs are apt to produce
verbose results, but it does get you a single regular expression, 
if that's what you really really want.
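The first half of that recipe can be sketched in Python. This is a minimal illustration, not a regex engine. The tuple representation and the names deriv, and_ and to_dfa are hypothetical. Only character sets, concatenation, OR, AND and Kleene star are supported, and the alphabet is fixed at a-z. Each distinct derivative becomes a DFA state. The smart constructors flatten and deduplicate OR and AND operands, in the spirit of Brzozowski's similarity rules, and that is what keeps the state set finite for expressions like these. The sketch stops at the DFA and leaves out the FSM-to-regex step (for example, state elimination). A DFA with no reachable accepting state recognizes the empty language, which is what happens for [a-z]{10} AND [a-z]{20}.

```python
import string

# Regexes are hashable tuples so equal derivatives collapse into one DFA
# state. Tags: empty, eps, set (character class), seq, alt, and, star.
EMPTY = ("empty",)  # matches no string at all
EPS = ("eps",)      # matches only the empty string

def chars(cs):
    return ("set", frozenset(cs)) if cs else EMPTY

def seq(r, s):
    if EMPTY in (r, s):
        return EMPTY
    if r == EPS:
        return s
    if s == EPS:
        return r
    return ("seq", r, s)

def star(r):
    if r in (EMPTY, EPS):
        return EPS
    return r if r[0] == "star" else ("star", r)

def _operands(tag, rs):
    # Flatten nested alt/and nodes; the set drops duplicates and ignores order.
    items = set()
    for r in rs:
        items |= r[1] if r[0] == tag else {r}
    return items

def alt(*rs):  # OR
    items = _operands("alt", rs) - {EMPTY}
    if not items:
        return EMPTY
    return next(iter(items)) if len(items) == 1 else ("alt", frozenset(items))

def and_(*rs):  # AND, the operator XSD regexes lack
    items = _operands("and", rs)
    if EMPTY in items:
        return EMPTY
    return next(iter(items)) if len(items) == 1 else ("and", frozenset(items))

def nullable(r):
    """True if r matches the empty string."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "set"):
        return False
    if tag == "seq":
        return nullable(r[1]) and nullable(r[2])
    if tag == "alt":
        return any(nullable(x) for x in r[1])
    return all(nullable(x) for x in r[1])  # "and"

def deriv(r, c):
    """Brzozowski derivative: a regex for the strings w such that c+w matches r."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "set":
        return EPS if c in r[1] else EMPTY
    if tag == "seq":
        head = seq(deriv(r[1], c), r[2])
        return alt(head, deriv(r[2], c)) if nullable(r[1]) else head
    if tag == "star":
        return seq(deriv(r[1], c), r)
    parts = [deriv(x, c) for x in r[1]]
    return alt(*parts) if tag == "alt" else and_(*parts)

def repeat(r, n):  # r{n}
    out = EPS
    for _ in range(n):
        out = seq(r, out)
    return out

def matches(r, text):
    for c in text:
        r = deriv(r, c)
    return nullable(r)

def to_dfa(r, alphabet):
    """Explore derivatives; returns (state index, transitions, accepting)."""
    index, todo, delta = {r: 0}, [r], {}
    while todo:
        q = todo.pop()
        for c in alphabet:
            nxt = deriv(q, c)
            if nxt not in index:
                index[nxt] = len(index)
                todo.append(nxt)
            delta[index[q], c] = index[nxt]
    accepting = {i for q, i in index.items() if nullable(q)}
    return index, delta, accepting

lower = chars(string.ascii_lowercase)
a_and_b = and_(repeat(lower, 10), repeat(lower, 20))  # [a-z]{10} AND [a-z]{20}
a_or_b = alt(repeat(lower, 10), repeat(lower, 20))    # [a-z]{10} OR [a-z]{20}
states, delta, accepting = to_dfa(a_and_b, string.ascii_lowercase)
print(len(states), accepting)  # 12 set(): no accepting state, empty language
```

The AND machine has eleven states counting letters plus one dead state, and none of them accepts. The OR machine built the same way does have accepting states.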

I hope this helps.

-- 
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com 
* http://cmsmcq.com/mib                 
* http://balisage.net
****************************************************************
Received on Tuesday, 19 April 2011 23:27:18 GMT
