lax wildcards [was Re: variable element names] from Morris Matsa on 2002-11-19 (xmlschema-dev@w3.org from November 2002)

From: Morris Matsa <mmatsa@us.ibm.com>
Date: Tue, 19 Nov 2002 18:12:00 -0500
To: ht@cogsci.ed.ac.uk (Henry S. Thompson)
Cc: "Bob Schloss" <rschloss@us.ibm.com>, "'xmlschema-dev@w3.org'" <xmlschema-dev@w3.org>
Message-ID: <OF7BF12004.905C6125-ON85256C76.007240C3@pok.ibm.com>
Henry, it seems that you might have a different understanding of lax
wildcards than we read from the spec, given your assumption at the end of
your mail.  We've always been unsure about this, so I'd like to use this
occasion to ask.  First, I'd like to be specific about the example, so
here's a schema:

<xs:element name="parent">
  <xs:complexType>
    <xs:sequence>
      <xs:any processContents="lax"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

<xs:element name="x" type="xs:positiveInteger"/>
<!-- There is no declaration for element "var" -->


A valid instance looks like this:

<parent>
  <var>
    <x>123</x>
  </var>
</parent>

The question comes up with this instance:

<parent>
  <var>
    <x>-123</x>
  </var>
</parent>


Is the "parent" element valid?

Your note seems to indicate that it is not valid (x not a positiveInteger),
and XSV agrees with you (surprise); however, we read the spec as saying that
it is valid.  Here's why we read it that way:

There are three validity values in the PSVI: for the "parent" element, the
"var" element, and the "x" element.

First let's make an assumption which we'll actually question a bit at the
end:  The element "x" is invalid.  Given that the type is positiveInteger
and the value negative, it seems pretty clear that if assessed the validity
will be invalid.

Next let's make another assumption:  The "var" element, since it is
matching a lax wildcard, is laxly assessed.  Again, we'll question this
assumption at the end.  [The key here is that the context-determined
declaration is empty because of the wildcard validation rule.  Complicated
details aside, it might seem natural that this is the outcome:  lax
wildcards are assessed laxly.]

Given our assumptions, the next goal is to figure out the validity value in
the PSVI for the element "var".  The rule for filling in this PSVI value is
at [1].  As we assumed that the "var" element was laxly assessed, clearly
it was not strictly assessed (and this we won't question later), so clause
1 does not apply.  Thus, clause 2 sets the value to "notKnown".  (Note that
XSV differs in its -r PSVI dump, and lists this as "invalid", so perhaps
we've already made a mistake in our analysis.)

Next, let's move on to the "parent" element.  The only thing that would
keep its validation rule from validating is the wildcard (there are no
ID/IDREF/Identity Constraints/Attributes/etc.), so we can look to the
Wildcard Validation rule.  [2]  This wildcard accepts any namespaces, thus
the rule cannot fail.  In fact, given the "any" nature of the wildcard,
the wildcard rule merely sets the value of "context-determined
declaration", in this case to be absent, but has no way to fail.

Next, we consider the assessment of the "parent" element as a whole.  [3]
1.1.1.3.1 and 1.1.1.3.2 are true since {"","parent"} is declared in the
schema.  This makes 1.1.1.3 true, and thus 1.1.1 true.  1.1.2 and 1.1.3 are
true because the Wildcard rule did not fail, as discussed just above from
[2].  Thus 1.1 is true and clause 1 is true as well, which according to the
definition just below in the spec [4] means that it has been strictly
assessed.

Finally, let's evaluate the validity value of the "parent" element [1].
Since it was strictly assessed, we use clause 1.  1.1.1.1 is true, and we
assume that 1.1.1 means to say that it is an OR and thus 1.1.1 is true.
1.1.2 is true because, as we derived, the element "var" has validity
"notKnown" and not "invalid".  1.1.3 is true because its only child, the
"var" element, has a "context-determined declaration" which is absent (set
in the "parent" element's wildcard validation rule, as mentioned).  Thus,
it is not valued at "mustFind".  Thus, 1.1.3 is true, so 1.1 is true, and
thus the "parent" element's validity is "valid".

Once again, XSV disagrees with us, listing the validity of (parent, var, x)
as (invalid, invalid, invalid) respectively, where we seem to get (valid,
notKnown, invalid) respectively.
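
The reading above can be sketched as a small toy model.  This is only an
illustration of our interpretation of [1], not the spec's normative
algorithm and not how XSV works.  The class name Element, its fields, and
the function validity() are hypothetical names made up for this sketch;
attributes are ignored, and "locally_valid" simply stands in for clause
1.1.1.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Element:
    # All names here are hypothetical; they only model the mail's reading.
    name: str
    strictly_assessed: bool              # clause 1 of [3] held for this item
    locally_valid: bool = True           # stands in for clause 1.1.1 of [1]
    context_decl: Optional[str] = None   # None models "absent"; else "mustFind"/"skip"
    children: List["Element"] = field(default_factory=list)


def validity(e: Element) -> str:
    """[validity] property as read in this mail (attributes ignored)."""
    if not e.strictly_assessed:
        return "notKnown"                                    # clause 2
    kids = [(c, validity(c)) for c in e.children]
    no_invalid_child = all(v != "invalid" for _, v in kids)  # clause 1.1.2
    no_unknown_mustfind = all(                               # clause 1.1.3
        not (c.context_decl == "mustFind" and v == "notKnown")
        for c, v in kids
    )
    if e.locally_valid and no_invalid_child and no_unknown_mustfind:
        return "valid"                                       # clause 1.1
    return "invalid"                                         # clause 1.2


# The second instance: <x>-123</x> inside an undeclared <var>.
x = Element("x", strictly_assessed=True, locally_valid=False)
var = Element("var", strictly_assessed=False, children=[x])  # lax wildcard, no declaration
parent = Element("parent", strictly_assessed=True, children=[var])

print(validity(parent), validity(var), validity(x))
```

Under these assumptions the model reproduces our triple for (parent, var,
x).  Setting context_decl="mustFind" on "var", as a strict wildcard would,
makes clause 1.1.3 fail and turns "parent" invalid.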

Postscript:

I mentioned above that we would 'at the end' question our two assumptions.
While it seems clear that the "var" element is not strictly assessed, as
explained above, must it be laxly assessed?

The relevant quote from the spec seems to be the line just under [4], where
the spec says:  If the item cannot be "strictly assessed", because neither
clause 1.1 nor clause 1.2 above are satisfied, [Definition:]  an element
information item's schema validity may be "laxly assessed" if its
"context-determined declaration" is not "skip" by "validating" with respect
to the "ur-type definition" as per "Element Locally Valid (Type) (3.3.4)"

The key word in this sentence for us is "may" in "may be laxly assessed";
it does not say "must".  Thus, it seems that a schema-aware parser could
choose not to laxly assess the "var" element at all.  This would not affect
the validity of the "var" element, which by our above analysis would still
be "notKnown":  [1] gives the same result as long as the element was not
strictly assessed, so laxly assessed and not assessed are the same as far
as validity is concerned.  It would also not affect the validity of the
"parent" element, which is "valid" for the same reasons.  However, if the
"var" element is not even laxly assessed, then the "x" element would not
be recursively assessed either, and thus would not end up with validity
"invalid".  If we're reading this right, a processor may, but need not,
identify "x" as invalid, and either way this does not affect the validity
of the "parent" element, so it's really a separate clarification
question.
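
The two outcomes discussed in this postscript can be put side by side in a
few lines.  Again this is just a sketch of our reading; the function name
and its flag are hypothetical.  It also assumes that an "x" which is never
assessed falls under clause 2 of [1] and so gets "notKnown", a step the
text above only implies.

```python
def validities(var_is_laxly_assessed: bool):
    """Return (parent, var, x) validity under the reading in this mail."""
    # "x" is only reached if the processor chooses to laxly assess "var".
    x = "invalid" if var_is_laxly_assessed else "notKnown"
    # "var" is never strictly assessed, so clause 2 of [1] applies either way.
    var = "notKnown"
    # "parent" fails only if a child is "invalid"; the context-determined
    # declaration of "var" is absent, so clause 1.1.3 cannot fail here.
    parent = "invalid" if var == "invalid" else "valid"
    return parent, var, x


print(validities(True))
print(validities(False))
```

Only the last component changes between the two calls, which is why we
consider it a separate question from the validity of "parent".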

I think that's it for now.  We're ready for you to point out the constraint
that we're missing that avoids all these problems.  Depending on your
answer we might ask a certain follow-up question.

[1]
http://www.w3.org/TR/xmlschema-1/#section-Element-Declaration-Information-Set-Contributions
[2] http://www.w3.org/TR/xmlschema-1/#cvc-wildcard
[3] http://www.w3.org/TR/xmlschema-1/#cvc-assess-elt
[4] http://www.w3.org/TR/xmlschema-1/#key-sva


ht@cogsci.ed.ac.uk (Henry S. Thompson)@w3.org on 11/19/2002 02:13:13 PM

Sent by:    xmlschema-dev-request@w3.org


To:    Bob Schloss/Watson/IBM@IBMUS
cc:    "'xmlschema-dev@w3.org'" <xmlschema-dev@w3.org>
Subject:    Re: variable element names




I think you've identified a bug in Noah's solution.

The processContents attribute of wildcards is not inherited.  So
given

        <parent>
                <var>
                        <x>123</x>
                        <w><a/></w>
                        <z/>
                </var>
        </parent>

and writing

<xs:element name="parent">
 <xs:complexType>
  <xs:sequence>
   <xs:any processContents="strict"/>
  </xs:sequence>
 </xs:complexType>
</xs:element>

in its schema, we would have a requirement that a declaration for the
_var_ element be available.  There's no way to make that requirement
inherited.

So what you actually want is

<xs:element name="parent">
 <xs:complexType>
  <xs:sequence>
   <xs:any processContents="lax"/>
  </xs:sequence>
 </xs:complexType>
</xs:element>

<xs:element name="x" type="..."/>
<xs:element name="y" type="..."/>
<xs:element name="z" type="..."/>

By using 'lax' you avoid the (undesirable for your example)
requirement for a declaration for 'var', but because lax validation
_is_ recursive, and declarations for x, y and z _are_ present,
requires that they conform to those declarations.

ht
--
  Henry S. Thompson, HCRC Language Technology Group, University of
          Edinburgh
          W3C Fellow 1999--2002, part-time member of W3C Team
     2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
     Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
       URL: http://www.ltg.ed.ac.uk/~ht/
 [mail really from me _always_ has this .sig -- mail without it is forged
 spam]
Received on Tuesday, 19 November 2002 18:21:31 UTC