Re: Issue: SOAP Data Model Schema language may be necessary (related to #231) from Pete Hendry on 2002-07-17 (xml-dist-app@w3.org from July 2002)

From: Pete Hendry <peter.hendry@capeclear.com>
Date: Wed, 17 Jul 2002 14:59:43 +1200
To: Ray Whitmer <rayw@netscape.com>
CC: xml-dist-app@w3.org
Message-ID: <3D34DD9F.5000309@capeclear.com>
Ray Whitmer wrote:

> Pete Hendry wrote:
>
>>> The issue then moved to asking whether all the array member EII 
>>> names have to be the same - that's currently the issue #231.
>>>
>> Why is it the case that these are not the same? Using schema munging 
>> it is generally possible to generate a schema for soap-enc that can 
>> be used in a validating parser, *except* for arrays. If the name was 
>> fixed then this would resolve that problem. Currently there is no way 
>> to munge a soap-enc schema into a validatable schema because of the 
>> array element names being allowed to differ from the schema definition.
>
>
> A schema can be made to describe a legal subset of a SOAP encoding.
>
> This is no different with arrays than with other types.  You are 
> permitted to make a schema that describes an array that has all 
> elements the same.  But you may need to create arrays that have 
> schemas where all elements are not the same name or type.  This is 
> very obvious with non-hierarchical object graphs, to name one case.


In a non-hierarchical model graph the names are either significant, in 
which case the value is a struct or generic, or they are not (the members 
are indexed), in which case there is no reason not to make them all the 
same. I don't see why this means that array items should not have the 
same name.

>
> To try and make them match perfectly (not one a subset of the other), 
> even if all children of an array were required to have the same name, 
> the soap specification would have to dictate what name they were all 
> required to have, or a legal encoding could still just as easily 
> violate your schema.


Yes, that is what I am suggesting. The SOAP encoding should mandate the 
name of these items. Whatever the name is, its scope does not extend 
beyond the schema definition of a particular array so there is no chance 
of conflict in the soap encoding (since the array element itself wraps 
item elements in its scope).

>
> You cannot expect the SOAP and Schema to be identical in what they 
> describe, so you have to take the one which is the subset as the 
> authority if you want both sets of rules to be followed.
>
>> I feel that soap implementors should not have to be concerned with 
>> validating a message that can be described by the schema language 
>> when a validating parser could do it instead. Most toolkits are still 
>> lacking full validation simply because they cannot delegate it to a 
>> parser.
>
>
> Fine, if you restrict encoders to the schema-described subset, as you 
> have to do anyway in a number of instances.


I see little point in an encoding which has a schema binding if 
instance documents using the encoding do not conform to the schema 
(except within a limited subset, which again raises the question of why 
bother defining a Schema binding for the encoding?).

>
> If you force all children of an array to have the same prescribed 
> name, it prevents using tag names that are associated with proper 
> schema types.  Relying on the array item type is outside of schemas 
> and thwarts validation because validation is not aware of it.  Relying 
> on xsi:type is something that should be optional, but this would make 
> it mandatory. Relying on xsi:type thwarts validation because it 
> permits the sender to put whatever he wants into the array, and it 
> will pass validation even if it is totally at odds with the array type.


The name of the actual array element (not the items) is enough to 
identify the element for the parser to validate it. The xsi:type is not 
required and the parser doesn't need to look at itemType (which it can't 
of course). I am suggesting the soap-enc description of an array be 
something like

<xs:group name="Array">
  <xs:sequence>
    <xs:element name="soaparrayitem" form="unqualified" minOccurs="0"
                maxOccurs="unbounded" type="xs:anyType" />
  </xs:sequence>
</xs:group>

and a specific definition be

            <xsd:complexType name="ArrayOfstring">
                <xsd:complexContent>
                    <xsd:restriction base="SOAP-ENC:Array">
                        <xsd:sequence>
                            <xsd:element
                                name="soaparrayitem"
                                type="xsd:string"
                                form="unqualified"
                                minOccurs="0"
                                maxOccurs="unbounded" />
                        </xsd:sequence>
                        <xsd:attributeGroup ref="SOAP-ENC:arrayAttributes" />
                        <xsd:attributeGroup ref="SOAP-ENC:commonAttributes" />
                    </xsd:restriction>
                </xsd:complexContent>
            </xsd:complexType>

The name "tns:ArrayOfstring" in an instance document is all the parser 
needs to know how to validate an instance of this definition and its 
children. This enforces the message format, unlike the current 
definition of arrays which is a free-for-all with item names.
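
As a rough sketch of what that would look like on the wire, assume a 
hypothetical element "names" declared with type tns:ArrayOfstring and a 
made-up namespace urn:example:names for tns (neither comes from the 
spec, and the array attributes are omitted for brevity):

<tns:names xmlns:tns="urn:example:names">
  <soaparrayitem>alpha</soaparrayitem>
  <soaparrayitem>beta</soaparrayitem>
</tns:names>

Every child has to be an unqualified soaparrayitem holding an 
xsd:string, so a validating parser should be able to check the whole 
array from the declaration of the wrapper element alone.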

>
> Validation is safer, in my experience, using names than xsi:type, 
> because the content model of the particular array can then describe 
> which types are permitted.  


Yes, exactly my point. With the current schema definition, there is no 
way to mandate what the names should be. An implementation may write any 
names it wants without in any way relating them to the schema defining 
the message and this completely stops parser validation in its tracks.
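
To illustrate with made-up names (the element names and the namespace 
here are hypothetical), an array of two strings could be written today as

<tns:names xmlns:tns="urn:example:names">
  <first>alpha</first>
  <anythingAtAll>beta</anythingAtAll>
</tns:names>

and, as I read the current rules, still be a legal encoding. No content 
model written in advance can name those children, so their types cannot 
be checked without xsi:type or the array's item type.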

In fact the current Array encoding is exactly the opposite of what you 
say you want because the names can't be used for validation (because 
they can be anything). Only in the (so far rare) case where the array 
items are written with an element name that resolves to an element 
definition that can be validated can the current approach allow 
validation without looking at the itemType or using xsi:type.

> If you hard-code the names, you eliminate the possibility of being 
> able to validate the types of the children of an array.  This is just 
> fine, when you are making a specific schema and know the array types, 
> but it is wrong to do it inside the encoding which is supposed to 
> permit non-homogeneous arrays.


Not so, given what I suggest above. Because the item name is defined in 
the scope of the array schema definition, as long as the array instance 
has a name (which it has of course), the parser knows the type of the 
child elements.

Perhaps I am missing something around how validation proceeds?

Currently we can use a validating parser to validate Literal encoding 
but not soap encoding even though soap encoding has a schema "defining" 
it. This doesn't make sense to me at all.

Pete
Received on Tuesday, 16 July 2002 22:54:33 UTC