RE: WSD WG requests to XML Schema WG from C. M. Sperberg-McQueen on 2004-02-26 (www-ws-desc@w3.org from February 2004)

From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
Date: 25 Feb 2004 17:04:40 -0700
To: Matthew Fuchs <matt@westbridgetech.com>
Cc: 'David Orchard' <dorchard@bea.com>, w3c-xml-schema-ig@w3.org, www-ws-desc@w3.org
Message-Id: <1077753879.2489.58.camel@localhost>
On Fri, 2004-02-20 at 17:38, Matthew Fuchs wrote:
...

> However, one can consider the fact that nameType1 is not final as an
> assertion that it is "suffix open".  The xsi:type attribute says
> that the content is _exactly_ the content allowed by the named type,
> which includes a suffix of the defined type.  Alternatively, from
> your perspective, xsi:type is a claim that:
> 1) the named type is a subtype of the defined type.
> 2) the content is prefixed by the content of the defined type.
> 
> Certainly you can verify 2, and if you know the named type, then you
> can validate its relationship with the defined type and validate
> the full content.
> 3) if you know the named type you can validate it.
> 4) if you don't know the named type you can validate a prefix.
 
There is one point that may need attention here.  It's been on my mind
for a long time but I haven't gotten around to bringing it up.

Let us consider the following case.  We have

   type T, defined as a sequence of elements A, B, C. 
   element F, defined as having type T

and the instance document has an

   element E, which is an instance of F, with an xsi:type value of T2.

We know that for any clean schema (I use the term 'clean' as it is
defined in [1]) which obeys the constraints on schema components and
against which the document is fully valid, T2 will be derived from T
either by restriction or by extension.  If we are validating with a
schema that contains T but not T2, then we can, as Matthew points out,
optimistically process the instance of F.  Either T2 is a restriction
of T, in which case the element instance will be valid against T, or
else (unless it's a botched derivation), T2 is an extension of T, in
which case we can parse a prefix of the instance.  In practice, here
we'd expect to be able to parse an A, a B, and a C at the beginning of
E's children.  Allow me to use the names A, B, and C for the children
of E in the document instance in spite of the risk of ambiguity; if
anyone is confused, I promise to reformulate with different names.
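
The optimistic prefix parse described above can be sketched in Python.
This is only an illustration, not what any schema processor does: the
names INSTANCE, T_SEQUENCE, and parse_prefix are hypothetical, the
trailing child X is an assumed stand-in for whatever T2 might add by
extension, and the content model of T is reduced to a flat list of
element names.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: E is declared with type T (sequence A, B, C)
# but carries xsi:type="T2", and T2 is not in the schema we have.
INSTANCE = """
<E xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="T2">
  <A/><B/><C/><X/>
</E>
"""

# Simplified content model of T: a sequence of A, B, C.
T_SEQUENCE = ["A", "B", "C"]


def parse_prefix(element, sequence):
    """Split the child names of `element` into (matched, rest).

    `matched` is the leading run of children that matches `sequence`
    exactly; it is empty if the children do not start with `sequence`.
    """
    children = [child.tag for child in element]
    n = len(sequence)
    if children[:n] != sequence:
        return [], children
    return children[:n], children[n:]


e = ET.fromstring(INSTANCE)
matched, rest = parse_prefix(e, T_SEQUENCE)
print("prefix matched against T:", matched)
print("left over (possibly from T2):", rest)
```

If T2 were a restriction of T, rest would be empty and E would simply
be valid against T; a non-empty rest is what the extension case looks
like from the point of view of a processor that knows only T.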

So far, so good, and I think I have done nothing more than paraphrase
what Matthew said above with some extra letters to help me below.
(Correct me if I've botched something.)

The question is: which of the following pieces of information does the
PSVI provide to the downstream app, and under what circumstances?

  (a) E was declared as having type T
  (b) T2 is derived from T
  (c) E was validated against T
  (d) the A in the document instance is an instance of element type A
  (e) the B in the document instance is an instance of element type B
  (f) the C in the document instance is an instance of element type C

I believe that (a) is in the PSVI if and only if the PSVI exposes the
entire component set.  

I believe that (b) will never be in the PSVI (if we assume the PSVI
has only the information we define for it -- obviously nothing can
prevent an implementation from adding more information), though a
smart application may make the inference.

I believe that (c) will never be in the PSVI for a 1.0 processor, though
this should probably change in 1.1.  A 1.0 processor isn't
allowed to validate E against T because clause 4 of Validation Rule:
Element Locally Valid (Element) requires that the type named in the
xsi:type attribute be used instead.

[I suppose that a processor could say "if you use the '--fallback-mode
= dogged' flag, then if I can't find the type named in xsi:type, I'll
start a new validation episode for that element, using the declared
type as the 'stipulated type' mentioned in section 5.2 and in clause
1.2.1.1 of Validation Rule: Schema-Validity Assessment (Element)", but
I haven't seen this documented as an option on most schema processors
I've looked at.]

If the processor doesn't try to validate E against T, then (d), (e),
and (f) will be present in the PSVI if and only if:

  - the parser exercised its option to perform lax assessment of E by
    validating E against the ur-type definition, and
  - element types A, B, and C are top-level element types, not local
    to complex type T.  If only some of A, B, and C are top-level,
    then only some of (d), (e), and (f) will be present in the PSVI.

I don't think this will surprise anyone.
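
The two conditions above can be written down as a small Python
predicate.  It is a sketch of my reading only; the function name
psvi_element_info and the dictionary of top-level flags are
hypothetical, and nothing here corresponds to a real processor API.

```python
def psvi_element_info(lax_assessed, top_level):
    """Return the names of children whose element-declaration
    information would be present in the PSVI when E is NOT validated
    against T.

    lax_assessed: True if the processor chose to laxly assess E
                  against the ur-type definition.
    top_level:    maps each child name to True if its declaration is
                  top-level, False if it is local to complex type T.
    """
    if not lax_assessed:
        # No lax assessment: none of (d), (e), (f) is available.
        return set()
    # Lax assessment can only find declarations that are top-level.
    return {name for name, is_global in top_level.items() if is_global}


# Assumed example: A and C are top-level, B is local to T.
print(sorted(psvi_element_info(True, {"A": True, "B": False, "C": True})))
```

With B local to T, only the facts about A and C survive, which is the
"only some of (d), (e), and (f)" case.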

What I did find surprising when I first encountered it is the idea
that if we did ask a conforming processor to validate E against T, and
E turned out to contain the sequence A, B, C, X, then (d), (e), and
(f) would be present in the PSVI if and only if the same two conditions
listed above apply.  That is:

  Element declarations local to a type T are used only to validate
  children of a valid instance of T.  

If E is not a valid instance of T, then elements local to T are not
used to validate any children of E.  I infer this from the fact that a
local element declaration D is used to validate an element instance I
only when D is the context-determined declaration for I.  And
context-determined declarations are only specified in cases of
successful matches of a document instance against the particles of a
content model.  If the sequence doesn't match the content model, then
(I believe) our spec doesn't say how to calculate a context-determined
declaration, and the only way to validate A, B, and C is by falling
back to lax assessment, which means local declarations will never be
used to validate A, B, or C.
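
To make the inference concrete, here is a toy Python model of how I
read the rule.  It assumes a content model that is a plain sequence of
names and treats "matching" as list equality, which is far cruder than
the spec's particle matching; CONTENT_MODEL_T and
declarations_for_children are hypothetical names, and the declaration
values are just descriptive strings.

```python
# Simplified content model of T: a sequence of A, B, C.
CONTENT_MODEL_T = ["A", "B", "C"]


def declarations_for_children(children, local_decls, global_decls):
    """Map each child name to the declaration used to validate it,
    or None if no declaration is found.

    children:     child element names of E, in document order.
    local_decls:  declarations local to T, keyed by name.
    global_decls: top-level declarations, keyed by name.
    """
    if children == CONTENT_MODEL_T:
        # Successful match: each child has a context-determined
        # declaration, which may be local to T.
        return {
            name: local_decls.get(name) or global_decls.get(name)
            for name in children
        }
    # No match: no context-determined declarations, so the only route
    # is lax assessment, which sees top-level declarations only.
    return {name: global_decls.get(name) for name in children}


local = {"A": "local A in T", "B": "local B in T", "C": "local C in T"}

print(declarations_for_children(["A", "B", "C"], local, {}))
print(declarations_for_children(["A", "B", "C", "X"], local, {}))
```

In the second call the extra X makes the match fail, so A, B, and C
all come back with no declaration even though T declares them locally.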

If my interpretation is correct, it seems to mean that if the schema
author made the declarations of A, B, and C local to type T, then
we'll never get the type information we'd like for A, B, and C unless
E turns out to be valid against T.

This may be something we should revisit in the context of fallback
behavior when the schema in hand and the document are at loggerheads.

-Michael


[1]
http://lists.w3.org/Archives/Member/w3c-xml-schema-ig/2003Jun/att-0093/terminology.proposal.20030411.html
Received on Wednesday, 25 February 2004 19:06:34 UTC