Re: Issue: SOAP Data Model Schema language may be necessary (related to #231) from Ray Whitmer on 2002-07-17 (xml-dist-app@w3.org from July 2002)

From: Ray Whitmer <rayw@netscape.com>
Date: Wed, 17 Jul 2002 09:06:42 -0700
To: Pete Hendry <peter.hendry@capeclear.com>
CC: xml-dist-app@w3.org
Message-ID: <3D359612.200@netscape.com>

Pete Hendry wrote:

> Ray Whitmer wrote:
>
>> Pete Hendry wrote:
>>
>>>> The issue then moved to asking whether all the array member EII 
>>>> names have to be the same - that's currently the issue #231.
>>>>
>>> Why is it the case that these are not the same? Using schema munging 
>>> it is generally possible to generate a schema for soap-enc that can 
>>> be used in a validating parser, *except* for arrays. If the name was 
>>> fixed then this would resolve that problem. Currently there is no 
>>> way to munge a soap-enc schema into a validatable schema because of 
>>> the array element names being allowed to differ from the schema 
>>> definition.
>>
They may be the same, but they are not forced to be the same.  What 
benefit is there in forcing them all to be the same?  I think validation 
issues become far less workable if there is no flexibility in naming 
them differently when they represent different types.

Likewise, there is no rule in SOAP that fixes the names of struct 
children.  If the application chooses to fix them, then it must do so in 
the schema.

As with structs, you can easily declare a schema to fix the names of 
array children in any way that is appropriate.  But it is quite likely 
that it is NOT appropriate for them to all have children of the same 
name, because the types of objects in the arrays are likely to be quite 
different.  In one array, I may wish to only allow enc:int elements as 
children.  In another, I may wish to allow myfoo:birds and 
myfoo:insects.  That is easy to accomplish as long as the schema is free 
to specify what types of children a specifically-occurring array is 
permitted to have, just as the schema is free to specify which accessors 
and types a specifically-occurring struct has even though the encoding 
leaves it wide open.  If, on the other hand, you force all children of 
all arrays to be a single tag, then you prevent the schema from 
validating what types of children it has.  These single, fixed 
child-of-array tags would have to permit AnyType, which is NO benefit, 
from a validation perspective, over just specifying that the children of 
an array are ANY content model.  Please show me a use case where this 
improves validation.
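
To make the contrast concrete, here is a minimal sketch of two such 
array declarations.  It assumes the SOAP 1.1 encoding namespace for 
enc:, a made-up urn:example:myfoo namespace for myfoo:, and string-typed 
birds and insects elements.  The type names ArrayOfInt and 
ArrayOfCreatures are hypothetical, and for brevity neither type is 
derived from the encoding's Array type.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:enc="http://schemas.xmlsoap.org/soap/encoding/"
            xmlns:myfoo="urn:example:myfoo"
            targetNamespace="urn:example:myfoo">

  <xsd:import namespace="http://schemas.xmlsoap.org/soap/encoding/"/>

  <!-- Hypothetical global elements; the tag name identifies the type -->
  <xsd:element name="birds" type="xsd:string"/>
  <xsd:element name="insects" type="xsd:string"/>

  <!-- One array permits only enc:int children -->
  <xsd:complexType name="ArrayOfInt">
    <xsd:sequence>
      <xsd:element ref="enc:int" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- Another permits birds and insects, mixed in any order -->
  <xsd:complexType name="ArrayOfCreatures">
    <xsd:choice minOccurs="0" maxOccurs="unbounded">
      <xsd:element ref="myfoo:birds"/>
      <xsd:element ref="myfoo:insects"/>
    </xsd:choice>
  </xsd:complexType>
</xsd:schema>
```

With a single mandated item tag, the second content model could not 
tell a bird from an insect by name alone.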

>> A schema can be made to describe a legal subset of a SOAP encoding.
>>
>> This is no different with arrays than with other types.  You are 
>> permitted to make a schema that describes an array that has all 
>> elements the same.  But you may need to create arrays that have 
>> schemas where all elements are not the same name or type.  This is 
>> very obvious with non-hierarchical object graphs, to name one case.
>
>
>
> In a non-hierarchical model graph the names are either significant in 
> which case it is a struct or generic, or they are not (indexed) in 
> which case there is no reason not to make them all the same. I don't 
> see why this means that array items should not have the same name.

I deleted part of my response here because I had shortened it and the 
result did not make much sense.  I believe that in a non-hierarchical 
representation, there is more than one way to serialize the same graph, 
and you cannot validate in the first place without making choices and 
narrowing the freedoms of the serializer, just as you have to do with 
arrays and structs to make them make sense.  This is just another case 
where the sender needs to know the subset that will be accepted by the 
recipient.

> Yes, that is what I am suggesting. The SOAP encoding should mandate 
> the name of these items. Whatever the name is, its scope does not 
> extend beyond the schema definition of a particular array so there is 
> no chance of conflict in the soap encoding (since the array element 
> itself wraps item elements in its scope).

By this logic, the SOAP encoding should mandate the names of the 
accessors of a struct or you have the same problem.  You have to narrow 
each of these in the Schema, and make what the schema permits a subset 
of what the encoding permits.

> I see little point in an encoding which has a schema binding but 
> instance documents using the encoding do not conform to the schema 
> (except with a limited subset which again begs the question of why 
> bother defining a Schema binding for the encoding?).

I disagree strongly.  Without a schema to describe the encoding, you 
will not have successful exchanges, except by freakish luck.  This means 
that each end will have to be aware of the schema and what limitations 
it places on structs, arrays, non-hierarchical graphs, etc.

The encoding is still very useful because it makes it easier to 
standardize the binding between the language objects and the elements of 
the message, and the objects of the SOAP processing model.

>
>>
>> If you force all children of an array to have the same prescribed 
>> name, it prevents using tag names that are associated with proper 
>> schema types.  Relying on the array item type is outside of schemas 
>> and thwarts validation because validation is not aware of it.  
>> Relying on xsi:type is something that should be optional, but this 
>> would make it mandatory. Relying on xsi:type thwarts validation 
>> because it permits the sender to put whatever he wants into the 
>> array, and it will pass validation even if it is totally at odds with 
>> the array type.
>
>
>
> The name of the actual array element (not the items) is enough to 
> identify the element for the parser to validate it. The xsi:type is 
> not required and the parser doesn't need to look at itemType (which it 
> can't of course). I am suggesting the soap-enc description of an array 
> be something like
>
> <xs:group name="Array">
>  <xs:sequence>
>    <xs:element name="soaparrayitem" form="unqualified" minOccurs="0" 
> maxOccurs="unbounded" type="xsd:anyType" />
>  </xs:sequence>
> </xs:group>
>
> and a specific definition be
>
>            <xsd:complexType name="ArrayOfstring">
>                <xsd:complexContent>
>                    <xsd:restriction base="SOAP-ENC:Array">
>                        <xsd:sequence>
>                            <xsd:element
>                                name="soaparrayitem"
>                                type="xsd:string"
>                                form="unqualified"
>                                minOccurs="0"
>                                maxOccurs="unbounded" />
>                        </xsd:sequence>
>                        <xsd:attributeGroup ref="SOAP-ENC:arrayAttributes" />
>                        <xsd:attributeGroup ref="SOAP-ENC:commonAttributes" />
>                    </xsd:restriction>
>                </xsd:complexContent>
>            </xsd:complexType>
>
> The name "tns:ArrayOfstring" in an instance document is all the parser 
> needs to know how to validate an instance of this definition and its 
> children. This enforces the message format, unlike the current 
> definition of arrays which is a free-for-all with item names.

I do not understand why you believe that your above-specified 
declaration of Array is required to produce ArrayOfString.  I think if 
we leave the children of Array unspecified, it is easier to create the 
restriction ArrayOfString you specified above, because we can also use 
global names that have meaning, such as enc:int.
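
A minimal sketch of what I mean, assuming the base Array type is 
declared the way the SOAP 1.1 encoding schema declares it, with wildcard 
children (any namespace, zero or more) rather than a fixed item name.  
The urn:example:arrays target namespace is made up for illustration.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:enc="http://schemas.xmlsoap.org/soap/encoding/"
            targetNamespace="urn:example:arrays">

  <xsd:import namespace="http://schemas.xmlsoap.org/soap/encoding/"/>

  <!-- Narrows the open content of enc:Array to enc:string children.
       The attributes of enc:Array are inherited by the restriction. -->
  <xsd:complexType name="ArrayOfString">
    <xsd:complexContent>
      <xsd:restriction base="enc:Array">
        <xsd:sequence>
          <xsd:element ref="enc:string"
                       minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence>
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:schema>
```

The restriction names its children itself; nothing in the supertype 
had to name them first.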

>
>>
>> Validation is safer, in my experience, using names than xsi:type, 
>> because the content model of the particular array can then describe 
>> which types are permitted.  
>
>
>
> Yes, exactly my point. With the current schema definition, there is no 
> way to mandate what the names should be. An implementation may write 
> any names it wants without in any way relating them to the schema 
> defining the message and this completely stops parser validation in 
> its tracks.

Are you saying that with XML Schema it is impossible to take a type 
which permits ANY children and restrict it so that it only accepts 
particular children?

> In fact the current Array encoding is exactly the opposite of what you 
> say you want because the names can't be used for validation (because 
> they can be anything). Only in the (so far rare) case where the array 
> items are written with an element name that resolves to an element 
> definition that can be validated can the current approach allow 
> validation without looking at the itemType or using xsi:type.

That is far from a rare case.  When I have exchanged messages with 
services that return arrays of primitive types, they are generally 
represented using the names provided for that purpose in SOAP 1.1.
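
For illustration, such an array of integers typically looks something 
like the following on the wire.  This is a sketch in the style of the 
SOAP 1.1 encoding; the values are made up.

```xml
<enc:Array xmlns:enc="http://schemas.xmlsoap.org/soap/encoding/"
           xmlns:xsd="http://www.w3.org/2001/XMLSchema"
           enc:arrayType="xsd:int[3]">
  <!-- Each item uses the global element name that matches its type -->
  <enc:int>3</enc:int>
  <enc:int>5</enc:int>
  <enc:int>8</enc:int>
</enc:Array>
```

Because enc:int resolves to a global element declaration, a validating 
parser can check each item without consulting arrayType or xsi:type.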

> Perhaps I am missing something around how validation proceeds?
>
> Currently we can use a validating parser to validate Literal encoding 
> but not soap encoding even though soap encoding has a schema 
> "defining" it. This doesn't make sense to me at all.

I think validation proceeds just fine using a definition similar to the 
one you made above, but without hard-coding the names of the children in 
the abstract supertype.  Both the sender and the receiver have to 
examine the schema, but that is true for more reasons than this.

If it were only possible to create a concrete narrow class by 
hard-coding the names of the elements at the abstract superclass, then 
structs would be in very deep trouble when it comes to validation.

In the case you gave above, I saw no advantage derived by fixing the 
names at the superclass.  I believe it should be possible to make a 
suitably-specific subclass definition without doing that.  It does, on 
the other hand, prevent use of any globally-defined tag names if you fix 
them, which seems to me to be a weakness.

Ray Whitmer
rayw@netscape.com
Received on Wednesday, 17 July 2002 12:07:05 UTC