
RE: Algorithm for merging the pattern facets in a base simpleType with a subtype?

From: Costello, Roger L. <costello@mitre.org>
Date: Tue, 19 Apr 2011 16:34:07 -0400
To: "xmlschema-dev@w3.org" <xmlschema-dev@w3.org>
Message-ID: <9E51F88D5247B648908850C35A3BBB500538DD79EC@IMCMBX3.MITRE.ORG>

Thanks, Michael. Very enlightening.

I'd like to confirm my new understanding.

Here simpleType "A" is the base of simpleType "B":

    <xs:simpleType name="A">
        <xs:restriction base="xs:string">
            <xs:pattern value="[a-z]{10}" />
        </xs:restriction>
    </xs:simpleType>
    
    <xs:simpleType name="B">
        <xs:restriction base="A">
            <xs:pattern value="[a-z]{20}" />
        </xs:restriction>
    </xs:simpleType>

Here I declare an element, Test, to be of type "B":

  <xs:element name="Test" type="B" />

The value of "B" must consist of the letters a-z and the length must be exactly 10 characters AND exactly 20 characters. Clearly that is impossible, so B has no valid value.

Is that correct thus far?
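
One rough way to check this reading is to treat each derivation step as a whole-string regex test and require every step to pass. The sketch below uses Python's re.fullmatch as a stand-in for XSD pattern matching; that is an assumption that holds for simple patterns like these two, not for XSD regular expressions in general. The helper name valid_for_b is hypothetical.

```python
import re

PATTERN_A = r"[a-z]{10}"  # pattern declared on A
PATTERN_B = r"[a-z]{20}"  # pattern declared on B, one derivation step later


def valid_for_b(value: str) -> bool:
    """Patterns from different derivation steps are ANDed together."""
    return (re.fullmatch(PATTERN_A, value) is not None
            and re.fullmatch(PATTERN_B, value) is not None)


ten = "abcdefghij"
twenty = ten * 2
print(valid_for_b(ten), valid_for_b(twenty))  # prints: False False
```

A 10-letter value fails B's own pattern and a 20-letter value fails the pattern inherited from A, so under this model nothing is accepted.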

Compare the above two simpleTypes against this simpleType:

    <xs:simpleType name="C">
        <xs:restriction base="xs:string">
            <xs:pattern value="[a-z]{10}" />
            <xs:pattern value="[a-z]{20}" />
        </xs:restriction>
    </xs:simpleType>

I declare an element, Test, to be of type "C":

  <xs:element name="Test" type="C" />

The value of "C" must consist of the letters a-z and the length must be exactly 10 characters OR exactly 20 characters. So either of these is valid:

    <Test>abcdefghijabcdefghij</Test>

    <Test>abcdefghij</Test>

Is that correct?
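
The same kind of sketch for "C", where both patterns sit on one restriction step and a value only has to match one of them. Again Python's re.fullmatch stands in for XSD pattern matching (an assumption that is safe only for simple patterns like these), and valid_for_c is a hypothetical helper name.

```python
import re

# Both patterns are written on the same restriction step of C.
STEP_C = [r"[a-z]{10}", r"[a-z]{20}"]


def valid_for_c(value: str) -> bool:
    """Patterns on the same derivation step are ORed together."""
    return any(re.fullmatch(p, value) is not None for p in STEP_C)


print(valid_for_c("abcdefghij"))            # 10 letters
print(valid_for_c("abcdefghijabcdefghij"))  # 20 letters
print(valid_for_c("abcdefghijabcde"))       # 15 letters
```

Because "C" restricts xs:string, whitespace is preserved, so a value with a leading or trailing space would match neither pattern.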

Let's return to the "A" and "B" example. I would like to merge them into a single simpleType. I have learned that simply merging the pattern facets of "A" into "B" to yield "C" is not correct.  It would be so nice if I could simply write:

    [a-z]{10} and [a-z]{20}

Unfortunately, XSD regular expressions have no "and" (intersection) operator. So, any ideas on how to "and" two arbitrary regular expressions?

/Roger
     



-----Original Message-----
From: C. M. Sperberg-McQueen [mailto:cmsmcq@blackmesatech.com] 
Sent: Tuesday, April 19, 2011 12:39 PM
To: Costello, Roger L.
Cc: C. M. Sperberg-McQueen; xmlschema-dev@w3.org
Subject: Re: Algorithm for merging the pattern facets in a base simpleType with a subtype?


On Apr 19, 2011, at 7:15 AM, Costello, Roger L. wrote:

> Hi Folks,
> 
> Suppose that simpleType "A" is the base of simpleType "B":
> ...
> 
> Suppose that "A" contains one or more pattern facets:
> ...
> 
> What patterns apply to "B"?

In general, and informally, for any facet at all, the facet-based
constraints on B are the union of those specified on the declaration
of B and those B inherits from A.  

In the XSD spec, the explanation is not quite so simple, because the
spec attempts to ensure that the {facets} value 'makes sense'.  So
the spec is full of extra ad hoc rules which make the story more
complicated; some of these unneeded complications affect the pattern
facet.

In XSD 1.0, section 4.3.4.3 of the Datatypes spec has the following
note:

    Note: It is a consequence of the schema representation
    constraint Multiple patterns (§4.3.4.3) and of the rules for
    ·restriction· that ·pattern· facets specified on the same step
    in a type derivation are ORed together, while ·pattern· facets
    specified on different steps of a type derivation are ANDed
    together.

    Thus, to impose two ·pattern· constraints simultaneously, schema
    authors may either write a single ·pattern· which expresses the
    intersection of the two ·pattern·s they wish to impose, or
    define each ·pattern· on a separate type derivation step.

The rules for restriction referred to in the note are laid out
explicitly in the Structures spec in Schema Component Constraint:
Simple Type Restriction (Facets) in section 3.14.6, which specifies
that when the facets specified on B are merged into the set of
facets inherited from A, multiple patterns are allowed.  

So the patterns inherited from A and those specified on B must all
be satisfied.
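
The rule in the note can be sketched as a small model: a type's pattern constraints are a list of derivation steps, each step holding the patterns written on that step, and a value is acceptable when every step has at least one matching pattern. This is only an illustration. Python's re.fullmatch is assumed to behave like XSD pattern matching for the simple patterns used here, and the names Step and satisfies are hypothetical.

```python
import re
from typing import List

Step = List[str]  # the patterns written on one restriction step


def satisfies(value: str, steps: List[Step]) -> bool:
    """AND across derivation steps, OR within a single step."""
    return all(
        any(re.fullmatch(p, value) is not None for p in step)
        for step in steps
    )


# Step 1 allows all letters OR all digits; step 2 (a later restriction)
# additionally requires exactly three characters.
steps = [[r"[a-z]+", r"[0-9]+"], [r".{3}"]]

for candidate in ["abc", "123", "ab1", "abcd"]:
    print(candidate, satisfies(candidate, steps))
```

Here "abc" and "123" pass both steps, "ab1" fails the first step, and "abcd" fails the second.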

In XSD 1.1, the pattern facet is redefined to have as its value a
set of regular expressions, instead of a single regular expression,
and the XML mapping specified in Datatypes 4.3.4.2 for the pattern
facet's {value} property is modified to take any new patterns
specified on restrictions and add them to the set inherited from the
base type.  The facet overlay process defined in Structures 3.16.6.4
is correspondingly simpler.


> 
> I believe there are only two cases to consider:
> 
> CASE 1: "B" does not have any pattern facets.
> 
> Therefore, the patterns that apply to "B" are the pattern(s) contained in "A".

Yes, assuming that by "pattern(s) contained in 'A'" you mean 
"the pattern facet(s) in simple type definition A".  If you mean
just those lexically present in the source declaration of A, you 
have inadvertently left out any patterns inherited by A from its
base type.
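
The distinction between lexically present and inherited patterns can be sketched by walking the base-type chain and collecting the patterns from every step. The SimpleType record, the type names, and effective_steps are all hypothetical; a real schema processor works on schema components, not on a dictionary like this.

```python
from typing import Dict, List, NamedTuple, Optional


class SimpleType(NamedTuple):
    base: Optional[str]   # name of the base type, None at the root
    patterns: List[str]   # patterns written on this type's own restriction


def effective_steps(name: str, types: Dict[str, SimpleType]) -> List[List[str]]:
    """Collect pattern steps from the root of the derivation down to `name`."""
    steps: List[List[str]] = []
    current: Optional[str] = name
    while current is not None:
        t = types[current]
        if t.patterns:
            steps.append(t.patterns)
        current = t.base
    steps.reverse()  # root-most step first
    return steps


types = {
    "string": SimpleType(None, []),
    "Root": SimpleType("string", [r"[a-z]*"]),
    "A": SimpleType("Root", [r".{10}"]),
    "B": SimpleType("A", []),  # CASE 1: B declares no patterns of its own
}

print(effective_steps("B", types))
```

In this model "B" ends up with two pattern steps even though it declares none: the one written on "A" and the one "A" inherited from its own base.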

> 
> CASE 2: "B" has one or more pattern facets.
> 
> The patterns in "B" must be a restriction of the patterns in "A".

No, there is no constraint in XSD 1.0 or 1.1 that requires a 
pattern specified in a restriction to have any relation at all to
the lexical space of the base type.  (This contrasts with the
rules for content models, which do impose such requirements.)

The rules that new patterns and inherited patterns are ANDed
together and that the new pattern does not need to recapitulate
constraints already expressed are exploited in the definitions of 
yearMonthDuration and dayTimeDuration in XSD 1.1, as explained 
in sections 3.4.26.1 and 3.4.27.1 of XSD 1.1 Datatypes.
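
A sketch of how that works for yearMonthDuration. Two things here are assumptions: DURATION is a deliberately simplified stand-in for the lexical form of duration (the spec's regex is stricter), and the pattern [^DT]* is quoted from memory of section 3.4.26.1 and should be checked against the spec. The function name is hypothetical.

```python
import re

# Simplified stand-in for the duration lexical form (an assumption for
# illustration; the real XSD regex is stricter, e.g. it rejects a bare "P").
DURATION = (r"-?P([0-9]+Y)?([0-9]+M)?([0-9]+D)?"
            r"(T([0-9]+H)?([0-9]+M)?([0-9]+(\.[0-9]+)?S)?)?")

# Pattern added on the restriction step (believed to match the spec; verify).
YEAR_MONTH = r"[^DT]*"


def is_year_month_duration(value: str) -> bool:
    """The inherited constraint AND the new pattern must both hold."""
    return (re.fullmatch(DURATION, value) is not None
            and re.fullmatch(YEAR_MONTH, value) is not None)


for candidate in ["P1Y2M", "P3D", "PT5M", "1Y2M"]:
    print(candidate, is_year_month_duration(candidate))
```

The new pattern only says "no D and no T"; on its own it would accept "1Y2M". It is the AND with the inherited duration constraint that rejects that value, which is why the new pattern does not need to restate the base type's rules.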

> Therefore, the patterns that apply to "B" are just the patterns contained in "B". Effectively the patterns in "A" may be ignored. Do you agree?


No, sorry, there is nothing in the spec to justify that conclusion.

-- 
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com 
* http://cmsmcq.com/mib                 
* http://balisage.net
****************************************************************
Received on Tuesday, 19 April 2011 20:34:35 GMT
