SOAP Encoding multistructs from Jacek Kopecky on 2002-02-20 (xml-dist-app@w3.org from February 2002)

From: Jacek Kopecky <jacek@systinet.com>
Date: Wed, 20 Feb 2002 17:38:07 +0100 (CET)
To: <xml-dist-app@w3.org>
Message-ID: <Pine.LNX.4.33.0202201735540.3513-100000@mail.idoox.com>
 Hi all. 8-)
 This is an issue that was brought up a few times but never 
really discussed.
 In SOAP Encoding (and the Data Model) we have the notion of a 
compound type which contains some members. The members may be 
accessed via their names or ordinal position or both.
 We have three or four ways of accessing the members:
 A) 1) by name, 2) by position, 3) by both;
 B) 1) by name, 2) by position, 3) by name and then by position,
4) by position and then by name.

 I assume B is the case because IMO there are enough differences
between B3 and B4 to make them distinct and I claim that B3
together with B4 cover all the useful cases of A3 and that there
is no overlap between them.  The "useless" cases of A3 not
covered by B3 and B4 are the cases where the application changes
its approach significantly and arbitrarily while processing a
single multistruct.

 In B1, the order of members in the XML serialization is 
completely insignificant; only the names carry information.
 In B2, the names of members in the XML serialization are 
completely insignificant; only the order carries information.
 In B3, the relative order of two elements with the same name is
significant, while the relative order of two elements with
different names is disregarded; the names are significant.
 In B4, the order of all the elements is significant, and so are 
all the names.

 Let's see an example of a serialized multistruct:

   <multistruct>
     <a>1</a>
     <b>2</b>
     <a>3</a>
     <c>4</c>
     <a>5</a>
   </multistruct>

 The difference between B3 and B4 lies in whether the name or the
position is applied first when a service wants a member with the
name 'a' and position 2.
 B3: choose all 'a's, then choose the second one. Result:  
<a>3</a>.
 B4: choose the second member (<b>2</b>) and ensure it's an 'a'. 
(Actually, a more realistic scenario would be to get the second 
member, choose the action based on its name, then process the 
value.)
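
 The two lookups above can be sketched in a few lines of Python. This
is only an illustration, assuming the multistruct is parsed with the
standard xml.etree.ElementTree module; the helper names pick_b3 and
pick_b4 are hypothetical and do not come from SOAP Encoding.

```python
import xml.etree.ElementTree as ET

# The first example multistruct from this message.
DOC = """<multistruct>
  <a>1</a>
  <b>2</b>
  <a>3</a>
  <c>4</c>
  <a>5</a>
</multistruct>"""


def pick_b3(root, name, position):
    """B3 (hypothetical helper): select by name first, then by 1-based position."""
    same_name = root.findall(name)  # direct children with this name, in document order
    return same_name[position - 1]


def pick_b4(root, name, position):
    """B4 (hypothetical helper): select by 1-based position first, then check the name."""
    member = list(root)[position - 1]
    if member.tag != name:
        raise LookupError(f"member {position} is <{member.tag}>, not <{name}>")
    return member


root = ET.fromstring(DOC)
b3_member = pick_b3(root, "a", 2)  # the second of the 'a' members

try:
    b4_member = pick_b4(root, "a", 2)
except LookupError:
    b4_member = None  # the second member overall is a 'b', so the name check fails
```

 The same request (name 'a', position 2) thus yields a value under B3
and a name mismatch under B4, even though the serialization is
identical.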

 My opinion is that in B3, the multistruct would be better 
described as a structure containing an array of 'a's, an array of 
'b's and an array of 'c's.
 In B4, the natural mapping of the data (on the data-model 
level) would be to an array of tuples {name, value}.
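
 As a rough sketch of these two remappings, assuming Python and the
standard xml.etree.ElementTree parser (the function names
as_struct_of_arrays and as_array_of_tuples are made up for this
illustration):

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The first example multistruct from this message.
DOC = """<multistruct>
  <a>1</a>
  <b>2</b>
  <a>3</a>
  <c>4</c>
  <a>5</a>
</multistruct>"""


def as_struct_of_arrays(root):
    """B3 view: one array per member name; order is kept only within a name."""
    grouped = defaultdict(list)
    for member in root:
        grouped[member.tag].append(member.text)
    return dict(grouped)


def as_array_of_tuples(root):
    """B4 view: a single array of (name, value) tuples in document order."""
    return [(member.tag, member.text) for member in root]


root = ET.fromstring(DOC)
b3_view = as_struct_of_arrays(root)
b4_view = as_array_of_tuples(root)
```

 The B3 view discards the interleaving of differently named members,
while the B4 view preserves the complete document order.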

 Both remappings would complicate the XML serialization, but the
data would be much clearer about their meaning. (Ultimately the
same holds for the representation of sparse arrays, partially
transmitted arrays, and references to attachments or other stuff,
since we removed the incomplete arrays and hrefs.)

 Asir's given me the argument that with the current syntax (the
same for B3 and B4) the receiver can choose to approach the data
differently from how the sender approaches them. But this is
hairy: if the sender has data modeled as B3, it will view my
first example and the following one as equal and may emit either
one arbitrarily, whereas a B4 receiver will see them as different
and will possibly treat them differently, with different results.
How's that for interoperability? It is as if an application were
free to treat a struct like an array: the same ordering issues.
 The second example:
   <multistruct>
     <a>1</a>
     <a>3</a>
     <a>5</a>
     <b>2</b>
     <c>4</c>
   </multistruct>
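
 The interoperability problem can be made concrete with a small
Python sketch. It assumes the standard xml.etree.ElementTree parser,
and the comparison functions equal_b3 and equal_b4 are hypothetical
names that simply encode the B3 and B4 rules described earlier.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

FIRST = """<multistruct>
  <a>1</a><b>2</b><a>3</a><c>4</c><a>5</a>
</multistruct>"""

SECOND = """<multistruct>
  <a>1</a><a>3</a><a>5</a><b>2</b><c>4</c>
</multistruct>"""


def b3_view(root):
    """Group values by name; only the order among same-named members is kept."""
    grouped = defaultdict(list)
    for member in root:
        grouped[member.tag].append(member.text)
    return dict(grouped)


def b4_view(root):
    """Keep every (name, value) pair in full document order."""
    return [(member.tag, member.text) for member in root]


def equal_b3(x, y):
    return b3_view(x) == b3_view(y)


def equal_b4(x, y):
    return b4_view(x) == b4_view(y)


first = ET.fromstring(FIRST)
second = ET.fromstring(SECOND)

same_for_b3_sender = equal_b3(first, second)    # a B3 sender sees no difference
same_for_b4_receiver = equal_b4(first, second)  # a B4 receiver does
```

 Under these assumptions a B3 sender considers the two serializations
interchangeable, while a B4 receiver distinguishes them.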

 Here follow my proposals:

 I) remove multistructs completely (keeping structs and arrays
whose combinations can be used to model any B3 and B4
multistructs and arguably any A3 multistructs) and only allow 
accessing members either by name or by position, not both,

 II) distinguish between B3 and B4 multistructs just like we
distinguish between structs and arrays, for example by mandating
that B3 multistructs have type descended from enc:Multistruct,
and B4 multistructs have type descended from
enc:DocumentOrderStruct.

 I strongly favor the first proposal; I cannot live with keeping
the status quo (the "none of the above" option).

 There is a precedent: we faced the same two choices in the
sparse arrays debate (distinguish the different treatments or 
remove them entirely) and we chose the path of simplification: 
removal.

 Best regards,

                   Jacek Kopecky

                   Senior Architect, Systinet (formerly Idoox)
                   http://www.systinet.com/
Received on Wednesday, 20 February 2002 11:38:11 UTC