Re: multiple pattern facet conjunction from C. M. Sperberg-McQueen on 2006-12-30 (public-schemata-users@w3.org from December 2006)

From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
Date: Sat, 30 Dec 2006 10:55:22 -0700
To: Syd_Bauman@Brown.edu
Cc: "C. M. Sperberg-McQueen" <cmsmcq@acm.org>, public-schemata-users@w3.org
Message-Id: <6DECCD18-A8D8-43AE-A6E5-1EFFFC058486@acm.org>

On 30 Dec 2006, at 04:46, Syd Bauman wrote:

> The text of 4.3.4.3 seems problematic.
>
>    If multiple <pattern> element information items appear as
>    [children] of a <simpleType>, the [value]s should be combined as
>    if they appeared in a single regular expression as separate
>    branches.
>
> First, I am under the (perhaps erroneous) impression that a <pattern>
> element can not be the child of a <simpleType> element.

I think that's true; Schema 1.0 had a typo here ('simpleType' where
'restriction' was meant). The word 'children' is right, though, and
should not be read as 'descendant': simple type definitions can nest,
and the patterns of a nested type belong to a different derivation
step. That may be one reason that the paragraph in question has been
deleted from the current draft of XML Schema 1.1 and the rule has
been reworded.

> Second, the idea seems unhelpful. If I wanted two regular expressions
> R1 and R2 to appear in a single regular expression as separate
> branches, I could have just written "R1|R2", no?

Yes.  But not if you wished to annotate the two branches
separately, either for a human reader or for a machine.

> So my gut instinct
> is that this rule isn't useful, but I may be missing something.

No: as regards validation, it doesn't enlarge the expressive power
of the language.
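A rough illustration of that equivalence, as a sketch only: Python's re
module stands in for the XSD regex dialect (which it only approximates),
re.fullmatch stands in for the implicit whole-string anchoring of XSD
patterns, and the values chosen for R1 and R2 and the helper name
matches_step are hypothetical.

```python
import re

def matches_step(value: str, patterns: list[str]) -> bool:
    """True if value matches at least one pattern given on one derivation step.

    XSD patterns must match the whole lexical form, so re.fullmatch stands in
    for that implicit anchoring. Python's regex syntax only approximates the
    XSD dialect; this is an illustration, not a validator.
    """
    return any(re.fullmatch(p, value) is not None for p in patterns)

# Hypothetical stand-ins for R1 and R2.
R1 = "[0-9]{3}"   # three digits
R2 = "[A-Z]{2}"   # two capital letters

for s in ["123", "AB", "12", "A1", "ABC"]:
    separate = matches_step(s, [R1, R2])                          # two <pattern>s
    combined = re.fullmatch(f"(?:{R1})|(?:{R2})", s) is not None  # one "R1|R2"
    assert separate == combined
    print(s, separate)
```

Both forms accept and reject exactly the same strings; the only thing
the two separate patterns buy you is a place to hang separate
annotations.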

> The note attached to 4.3.4.3 says
>
>    ... pattern facets specified on the same step in a type derivation
>    are ORed together, while pattern facets specified on different
>    steps of a type derivation are ANDed together.
>
> but I have yet to really figure out what a "step" is.

A step is one derivation in a derivation chain.

When one defines type T1 as a restriction of some primitive
type, T2 as a restriction of T1, and T3 as a restriction of
T2, one has a derivation chain with three steps. If patterns
P1 and P2 are specified as part of the definition of T1, and
P3 and P4 as part of the definitions of T2 and T3 respectively,
then the lexical space of T3 contains only character
sequences which match P1|P2, and also match P3, and also
match P4.
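The OR-within-a-step, AND-across-steps rule can be sketched in a few
lines. This is an illustration under stated assumptions, not the spec's
algorithm: the patterns P1 to P4 are hypothetical, Python's re with
fullmatch approximates XSD regex matching, the helper in_lexical_space
is made up for the example, and the primitive type's own lexical space
is ignored.

```python
import re

def in_lexical_space(value: str, steps: list[list[str]]) -> bool:
    """Check value against the pattern facets of a whole derivation chain.

    Each inner list holds the patterns given on one derivation step. Within a
    step the patterns are ORed; the per-step results are ANDed. A step with no
    pattern facets imposes no pattern constraint, so it is skipped.
    """
    return all(
        any(re.fullmatch(p, value) is not None for p in step)
        for step in steps
        if step
    )

# Hypothetical patterns; re.fullmatch approximates XSD's implicit anchoring.
P1 = "[a-z]+"    # lowercase letters
P2 = "[0-9]+"    # digits
P3 = ".{2,4}"    # two to four characters
P4 = "[^x]*"     # no letter x

T3_chain = [
    [P1, P2],  # step 1: T1 restricts a primitive type
    [P3],      # step 2: T2 restricts T1
    [P4],      # step 3: T3 restricts T2
]

for s in ["abc", "42", "12345", "ax", "a1"]:
    print(s, in_lexical_space(s, T3_chain))
```

Here "abc" and "42" pass every step; "12345" fails P3, "ax" fails P4,
and "a1" fails step 1 because it matches neither P1 nor P2.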

>   <xs:element name="duck">
>     <xs:simpleType>
>       <xs:restriction>
>         <xs:simpleType>
>           <xs:restriction base="xs:token">
>             <xs:pattern value="R1"/>
>             <xs:pattern value="R2"/>
>           </xs:restriction>
>         </xs:simpleType>
>       </xs:restriction>
>     </xs:simpleType>
>   </xs:element>
>
> My instinct is that this could be simplified to
>
>   <xs:element name="duck">
>     <xs:simpleType>
>       <xs:restriction base="xs:token">
>         <xs:pattern value="R1"/>
>         <xs:pattern value="R2"/>
>       </xs:restriction>
>     </xs:simpleType>
>   </xs:element>
>
> without any change to the set of documents that would be considered
> valid.

Yes. In the second formulation, the type of 'duck' is a restriction of
token; in the first formulation, it is a vacuous restriction of an
anonymous type which is a restriction of token. Either way the same
two patterns apply, so the same documents are valid.
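Modelled the same way, the vacuous outer restriction is just a step
with no patterns, so it changes nothing. Again a sketch only: R1 and R2
are hypothetical values, the helper in_lexical_space is made up for the
example, Python's re.fullmatch approximates XSD regex matching, and the
whitespace collapsing that xs:token performs is left out (it would
apply identically to both formulations anyway).

```python
import re

def in_lexical_space(value: str, steps: list[list[str]]) -> bool:
    """Patterns ORed within a step, ANDed across steps; empty steps add nothing."""
    return all(
        any(re.fullmatch(p, value) is not None for p in step)
        for step in steps
        if step
    )

# Hypothetical stand-ins for R1 and R2.
R1 = "quack"
R2 = "[0-9]+"

nested = [[R1, R2], []]  # anonymous type with R1 and R2, then a vacuous restriction
flat = [[R1, R2]]        # a single restriction of xs:token

for s in ["quack", "2006", "honk", "quack2006"]:
    # The extra, pattern-free step never changes the outcome.
    assert in_lexical_space(s, nested) == in_lexical_space(s, flat)
```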

I hope this helps.

--C. M. Sperberg-McQueen

Received on Saturday, 30 December 2006 17:55:37 UTC