multiple pattern facet conjunction

Previous-subject: "Re: [oXygen-user] wrong conjunction for multiple pattern facets?"

This is a follow-up to a discussion that has been taking place on the
oxygen-user mailing list
(http://oxygenxml.com/mailman/listinfo/oxygen-user/).


It seems pretty clear that in RelaxNG, multiple occurrences of
<param name="pattern"> inside a single <data> element (whose type=
must be a W3C datatype that allows the pattern facet) must all be
met, i.e., they are ANDed together. The following is from section 2
of "Guidelines for using W3C XML Schema Datatypes with RELAX NG"[1]:

   If the 'pattern' parameter is specified more than once for a
   single 'data' element, then a string matches the 'data' element
   only if it matches all of the patterns. 

I think this means that if I have 

        <rng:element name="duck" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
          <rng:data type="token">
            <rng:param name="pattern">R1</rng:param>
            <rng:param name="pattern">R2</rng:param>
          </rng:data>
        </rng:element>

then the content of <duck> must match both R1 and R2 in order to be
valid. This seems to make a lot of sense. After all, if I had wanted
a string to be a valid <duck> if it matched R1 *or* R2, I could have
written

          <rng:data type="token">
            <rng:param name="pattern">(R1)|(R2)</rng:param>
          </rng:data>
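
To make the difference concrete, here is a small Python sketch. It is
not taken from either spec. R1 and R2 are hypothetical stand-in
patterns I made up, and the helper names are mine. Python's `re`
syntax is not identical to the XML Schema regex syntax, but these two
simple patterns read the same in both. I use re.fullmatch on the
assumption that a pattern facet has to match the whole string.

```python
import re

# Hypothetical stand-ins for R1 and R2; any two regular expressions would do.
R1 = r"[a-m]+"      # one or more letters from a to m
R2 = r"[a-z]{3}"    # exactly three lowercase letters


def matches_all(patterns, text):
    """AND: the text must match every pattern in full."""
    return all(re.fullmatch(p, text) for p in patterns)


def matches_any(patterns, text):
    """OR: the text must match at least one pattern in full."""
    return any(re.fullmatch(p, text) for p in patterns)


def matches_alternation(text):
    """OR written as one regular expression, as in (R1)|(R2)."""
    return re.fullmatch(f"({R1})|({R2})", text) is not None


if __name__ == "__main__":
    for text in ("abc", "ab", "xyz", "xy"):
        print(text,
              "AND:", matches_all([R1, R2], text),
              "OR:", matches_any([R1, R2], text),
              "(R1)|(R2):", matches_alternation(text))
```

With these stand-ins, "abc" satisfies both patterns, "ab" satisfies
only R1, "xyz" satisfies only R2, and "xy" satisfies neither, so only
"abc" would be a valid <duck> under the ANDed reading.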


But in W3C XML Schema things seem a lot less clear, although this may
be because I am close to the furthest thing there is from an expert.
I was referred to section 4.3.4.3 of the spec[2]. I had never heard
of, let alone read, 4.3.4.3 before today. But upon reading it, I have
to admit I don't quite understand what it means, and whether or not
it has any significance with respect to RelaxNG validation. (I
suspect not.)

The text of 4.3.4.3 seems problematic:

   If multiple <pattern> element information items appear as
   [children] of a <simpleType>, the [value]s should be combined as
   if they appeared in a single regular expression as separate
   branches.

First, I am under the (perhaps erroneous) impression that a <pattern>
element cannot be the child of a <simpleType> element. Or perhaps the
infoset definition of "children" includes descendants? (I don't think
it does -- I had thought "appearing immediately within the current
element" meant child, not descendant.)

Second, the idea seems unhelpful. If I wanted two regular expressions
R1 and R2 to appear in a single regular expression as separate
branches, I could have just written "R1|R2", no? So my gut instinct
is that this rule isn't useful, but I may be missing something.
(E.g., perhaps this is a general idea which, although not very useful
with regular expressions, is expected to be useful with some future
structures not yet devised?)

The note attached to 4.3.4.3 says:

   ... pattern facets specified on the same step in a type derivation
   are ORed together, while pattern facets specified on different
   steps of a type derivation are ANDed together.

but I have yet to really figure out what a "step" is. However,
playing around a bit with the output of `trang`[3] is potentially
very instructive.

The following is the above RelaxNG schema fragment transformed into
W3C Schema; in the one test I performed (using xmllint) it validated
as I wanted: the contents of <duck> must match both R1 and R2.

  <xs:element name="duck">
    <xs:simpleType>
      <xs:restriction>
        <xs:simpleType>
          <xs:restriction base="xs:token">
            <xs:pattern value="R1"/>
          </xs:restriction>
        </xs:simpleType>
        <xs:pattern value="R2"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>

A minor change, as follows, caused strings matching either R1 or R2
to be considered valid.

  <xs:element name="duck">
    <xs:simpleType>
      <xs:restriction>
        <xs:simpleType>
          <xs:restriction base="xs:token">
            <xs:pattern value="R1"/>
            <xs:pattern value="R2"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>

My instinct is that this could be simplified to

  <xs:element name="duck">
    <xs:simpleType>
      <xs:restriction base="xs:token">
        <xs:pattern value="R1"/>
        <xs:pattern value="R2"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>

without any change to the set of documents that would be considered
valid. 
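
If I read a "step" as one <xs:restriction> element (an assumption on
my part), then my current understanding of the three schemas above
can be written down as the following Python sketch. Each schema
becomes a list of steps, and each step is the list of patterns on
that one restriction. R1 and R2 are hypothetical stand-in patterns,
the function and variable names are mine, and Python's `re` only
approximates the XML Schema regex syntax.

```python
import re

# Hypothetical stand-ins for R1 and R2.
R1 = r"[a-m]+"      # one or more letters from a to m
R2 = r"[a-z]{3}"    # exactly three lowercase letters


def valid(steps, text):
    """Model of the note in 4.3.4.3 under my reading of "step".

    steps is a list of derivation steps, each a list of patterns.
    Patterns within one step are ORed; the steps are ANDed.
    A step with no patterns adds no constraint.
    """
    return all(
        not step or any(re.fullmatch(p, text) for p in step)
        for step in steps
    )


# trang output: R1 on the inner restriction, R2 on the outer one.
nested = [[R1], [R2]]
# Both patterns on the inner restriction, empty outer restriction.
same_step_wrapped = [[R1, R2], []]
# Both patterns on a single restriction of xs:token.
same_step_flat = [[R1, R2]]

if __name__ == "__main__":
    for text in ("abc", "ab", "xyz", "xy"):
        print(text,
              valid(nested, text),
              valid(same_step_wrapped, text),
              valid(same_step_flat, text))
```

In this model the second and third schemas accept exactly the same
strings, which matches my instinct about the simplification, and only
the first one requires both patterns to hold.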

Have I got any of this right?

Notes
-----
[1] Which I found at http://relaxng.org/xsd-20010907.html; it is
    linked to from the main RelaxNG home page.
[2] http://www.w3.org/TR/xmlschema-2/#src-multiple-patterns
[3] Version 20030619.

Received on Saturday, 30 December 2006 11:46:39 UTC