Re: XSD validation, ambiguous root XML instance element from C. M. Sperberg-McQueen on 2020-11-05 (xmlschema-dev@w3.org from November 2020)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Thu, 5 Nov 2020 08:00:58 -0700
To: Mukul Gandhi <gandhi.mukul@gmail.com>
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, "xmlschema-dev@w3.org" <xmlschema-dev@w3.org>
Message-Id: <7D1FED07-FFCF-41A9-AA0F-22E984E92A4B@blackmesatech.com>
> On 4,Nov2020, at 10:53 PM, Mukul Gandhi <gandhi.mukul@gmail.com> wrote:
> 
> 
> Hi all,
>     With respect to the discussion currently going on within this thread, I'd like to share following thoughts about XSD (broken into 2 cases A & B),
> 
> (A)
> 
> Let's say that, following is a given XSD document (and this is the only XSD document, that available for an XML document validation),
> 
> <?xml version="1.0"?>
> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
>     
>      <xs:element name="m"/>
>    
>      <xs:element name="n"/>
>   
> </xs:schema>
> 
> According to above mentioned XSD document, following two XML instance documents would be valid,
> 
> 1)
> <m/>
> 
> 2)
> <n/>

Yes and no.  That is, under some circumstances what you say is true, and under some circumstances it is false.

It is a consequence of the way schema-validity assessment is defined in the spec that the validation result on the two documents you give depends not just on the document and the schema but also on the node chosen as the validation root, on the validation approach, and (for some approaches) the declaration or definition stipulated as applicable to the validation root.

For simplicity, I assume in what follows that the outermost element in the document is chosen as the validation root.

For type-driven validation with xsd:anyType or with xsd:string or with any simple type for which the empty string is in the lexical space of the type, both 1 and 2 have [validity] = valid and [validation attempted] = full.

For type-driven validation with other types in the schema (e.g. xsd:date), both 1 and 2 have [validity] = invalid and [validation attempted] = full.

For element-driven validation using the element declaration for element ‘m’, both 1 and 2 have [validation attempted] = full, 1 has [validity] = valid, and 2 has [validity] = invalid.  For element-driven validation using the element declaration for element ‘n’, 1 is invalid and 2 is valid, and in both cases [validation attempted] = full.

For attribute-driven validation using any of the built-in declarations for xsi:type, xsi:nil, xsi:schemaLocation, or xsi:noNamespaceSchemaLocation, neither 1 nor 2 is valid (no element is valid against an attribute declaration).  Both elements will have [validation attempted] = full and [validity] = invalid.

For lax wildcard validation and strict wildcard validation, both documents will have [validity] = valid, [validation attempted] = full.

You don’t consider the case of a document of the form 

    <p/>

but it may be worth considering.  Its behavior for type-driven and attribute-driven validation is as for <m/> and <n/>, and it will be invalid against either element declaration in the schema.  For lax wildcard validation it will have [validation attempted] = none and [validity] = notKnown.  For strict wildcard validation, the results will be the same as for lax wildcard validation; the difference in terminology expresses only a difference in the expectations of the person or process invoking validation: the term ‘lax wildcard validation’ suggests that a [validity] value of valid or notKnown is acceptable and a value of invalid is not, while ‘strict wildcard validation’ suggests that [validity] = notKnown is not acceptable to the caller.
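
The outcomes listed above for schema (A) can be summarized as a small lookup.  What follows is only a toy sketch in Python, not an XSD processor: it looks at nothing but the name of an empty outermost element, and the names assess, DECLARED, and EMPTY_OK are hypothetical, introduced purely for illustration.  The [validation attempted] = full value it reports for <p/> under element-driven validation is an assumption by analogy with the <m/> and <n/> cases.

```python
# Toy model of the outcomes described for schema (A), which declares the
# top-level elements 'm' and 'n'.  NOT a real validator: it only inspects
# the name of an empty outermost element chosen as the validation root.

DECLARED = {"m", "n"}  # top-level element declarations in schema (A)

# Hypothetical sample of types: True when the empty content of <m/>, <n/>,
# or <p/> is acceptable to the type, False otherwise.
EMPTY_OK = {"xsd:anyType": True, "xsd:string": True, "xsd:date": False}


def assess(root_name, approach, stipulated=None):
    """Return ([validity], [validation attempted]) for an empty root element."""
    if approach == "type":
        # Type-driven: only the stipulated type matters, not the element name.
        return ("valid" if EMPTY_OK[stipulated] else "invalid", "full")
    if approach == "element":
        # Element-driven: the root must match the stipulated declaration.
        # (For <p/>, 'full' is assumed by analogy with <m/> and <n/>.)
        return ("valid" if root_name == stipulated else "invalid", "full")
    if approach == "attribute":
        # No element is valid against an attribute declaration.
        return ("invalid", "full")
    if approach in ("lax", "strict"):
        # Wildcard validation: lax and strict give the same results; only
        # the caller's expectations about notKnown differ.
        if root_name in DECLARED:
            return ("valid", "full")
        return ("notKnown", "none")
    raise ValueError("unknown validation approach: " + approach)


if __name__ == "__main__":
    for name in ("m", "n", "p"):
        print(name, assess(name, "element", "m"), assess(name, "lax"))
```

A caller asking for ‘strict wildcard validation’ would then treat the notKnown result for <p/> as a failure, while a caller asking for ‘lax wildcard validation’ would accept it; the assessment result itself is the same in both cases.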

> 
> (B)
> 
> And, if following is an XSD document (and this is the only XSD document, that available for an XML document validation),
> 
> <?xml version="1.0"?>
> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
>     
>      <xs:element name="m"/>
>   
> </xs:schema>
> 
> following is the only XML instance document (its my personal choice, to ignore XML child elements and attributes), that would be valid according to above mentioned XSD document,
> 
> <m/>
> 
> The XSD case (B) above, effectively makes a constraint that, XML element name "m" can be the only XML instance document root element name.
> 
> The XSD case (A) mentioned above, allows for both "m" and "n" to be valid XML instance document root element names.

You seem to be assuming that the processor performs lax (or strict) wildcard validation starting at the outermost element of a document. There are processors which do so by default; there may be processors which cannot be invoked any other way.  But that is a property of the processor, not of XSD schemas or XSD schema validity assessment.

My apologies for the profusion of technical detail here.  I think a case could be made that XSD would be more usable if the spec had provided a default meaning for the phrase “document D is valid/invalid/undecided against schema S”, since that is what many (perhaps most) users of any schema technology will want to say about their documents and schemas.  It has not been helpful to require, as the XSD spec does, that any meaningful statement about the schema validity of a document specify the validation approach, the validation root, and in some cases also the stipulated type definition, element declaration, or attribute declaration.  But that, as they say, is blood under the bridge.


********************************************
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
cmsmcq@blackmesatech.com
http://www.blackmesatech.com
********************************************
Received on Thursday, 5 November 2020 15:01:18 UTC