LC comments on SOAP 1.2 adjuncts: re SOAP Encoding (re URIs, test suite, SOAP Data Model Schemas?)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Hi

Here are my last call review comments on the SOAP Encoding and Data 
Model portion of the 1.2 spec, in document order. I should prefix by 
saying that while I jump straight in with the nit-picking, this document 
is a huge improvement on the previous version. Nice work! :)

Summary:
Reading back, my comments are variations on a theme: the SOAP Encoding 
and its Data Model would benefit from a more explicit account
of the mechanisms by which node and edge types for use in SOAP graphs 
might be defined. There are a few places where the use of URIs might 
make it easier for other successor specs to flesh out such details (eg. 
URIs for kinds of edge label, for nodes, for node types).

URIs:
I specifically request one change to the spec in the light of 
implementation experience with SOAP: please specify a mechanism for 
identifying SOAP graph edge labels with URI/URIref names. Identifying 
nodes and their types would be useful, but identifying and describing 
SOAP edge labels is (in my database-oriented implementation) critical. 
We have a number of other activities at W3C that could support the richer 
description of SOAP graph edge labels: Web Service Description effort, 
RDF and RDF Schema, as well as the Web Ontology work. Providing 
URI/URIref names for graph edges is a very minimalistic hook that will 
make integration with such efforts cheaper and simpler, without adding 
to the implementation burden for SOAP implementations.

Test cases / machine checkable test suite:
I don't comment on the fine-grained detail of the SOAP Encoding itself, 
except to say the following: please seriously consider creating a 
machine-usable test suite for this work. Defining a graph encoding 
syntax in XML is a slippery task, and it is easy to make mistakes, both 
in the specification and in implementations. Apparent interop amongst 
deployed SOAP toolkits may reflect shared understanding in that part of 
the Web community as much as it reflects precision in the formal 
specification of the encoding rules. As this work moves into the wider 
Web community, we'll likely see more unexpected corner cases. Historical 
note: this happened with RDF and the RDF/XML graph encoding syntax. We 
had to clean up RDF's graph encoding rules post-REC. As a result of that 
experience, we have reworked the RDF specs to use a more mathematically 
precise account of the abstract graph model, accompanied by a 
machine-processable set of test cases. See
http://www.w3.org/RDF/ for details. I urge the XMLP WG to gather similar
test cases before proposing SOAP 1.2 goes to REC. It's very easy to get 
bugs in XML graph encoding rules. Having a test suite offers very useful 
protection against this.

Comments follow below, interspersed with excerpts from the spec.

These specific requests, queries and comments aside, this is really 
promising, interesting and worthwhile work, and the spec is much clearer 
than the version I reviewed last year. The effort that XMLP have put 
into this is really showing and is much appreciated by those of us at 
the receiving end of the specs.

cheers,

Dan

(RDF Core WG co-chair; RDF IG chair)




http://www.w3.org/TR/2002/WD-soap12-part2-20020626
- -> http://www.w3.org/TR/soap12-part2/#datamodel
[[
2. SOAP Data Model

The SOAP Data Model represents application-defined data structures and 
values as a directed edge-labeled graph of nodes. Components of this 
graph are described in the following sections.
]]

The concept of 'application-defined' is somewhat unclear: are these data 
structures defined by the producers of the data, or by consumers? What 
form do these definitions take? Might we expect to read a schema 
definition (W3C XML Schema? RELAX-NG?) that made such definitions 
explicit, or are the definitions expected to be implicit? For example, 
if I deploy a 
Java-based SOAP service, my application-defined data structures might be 
described in terms of Java, yet exposed to the world through the SOAP 
Encoding Data Model.

Readers of 1.2p2 could reasonably ask: 'what technology can I use to 
expose my application-defined data structuring conventions?' The SOAP 
1.2p2 Encoding explains how to expose instance data, but gives little 
account of the underlying principles that tell us whether or not a 
particular SOAP graph meets the 'application-defined data structures' 
for a given service. Is there an expectation that technology will evolve 
to fill this gap? (a SOAP Encoding Schema Language has been mentioned in 
some discussions on xml-dist-app and www-ws-desc). If so, please make 
this expectation clearer in the specification (there is an aside later 
in the spec, but it isn't very detailed). If not, please note that SOAP 
1.2 does _not_ specify any mechanisms by which applications which use 
SOAP Encoding can describe the SOAP Encoded data structures they 
understand.


[[
The purpose of the SOAP Data Model is to provide a mapping of non-XML 
based data to some wire representation.
It is important to note that use of the SOAP Data Model, the 
accompanying SOAP Encoding (see 3. SOAP Encoding), and/or the SOAP RPC 
Representation (see 4. SOAP RPC Representation) is OPTIONAL. 
Applications which already model data in XML, for example using W3C XML 
Schema [4],[5], may not need to use the SOAP Data Model.
]]

As an introduction to the role of the SOAP Data Model, this could be 
clearer. It explains that one might take the Schema-based approach, or 
that one might take the SOAP Encoding approach, but offers little to 
motivate either decision. For a fresh application with no existing 
commitment to a Schema-based approach, the specification currently 
offers little advice to help SOAP adopters choose which path to take.
Are there identifiable benefits for using SOAP Encoding over a Schema 
approach? Perhaps (for example) that Web services can be deployed faster 
using object-to-XML encodings than through hand-crafting an XML Schema? 
The current text doesn't really sell us on the utility of SOAP Encoding; 
on the contrary, it has a somewhat wary, cautious tone, yet doesn't 
provide technical details on the tradeoffs. If you could add 2-3 bullet 
points to aid SOAP adopters in making an informed decision here, that 
might help.


[[
2.1 Graph Edges
An edge MAY originate and terminate at the same graph node.
]]

additional clarification / test case:

May a graph contain more than one edge with the same originating and
terminating node? (and can such a thing be serialised? in the current 
Encoding rules? in other hypothetical encodings?)


[[
  The outbound edges of a given graph node MAY be distinguished by label 
or by position, or both. Position is a total order on such edges; thus 
any outbound edge MAY be identified by position.
]]

This is a bit confusing. Whose freedom does the 'MAY' refer to? 
Consumers of the data? Or definers of SOAP Data Model-based 
application data formats? (see above re Schema languages). The notion of 
'position' is introduced with reference to 'such edges'. But which ones? 
All of them, since 'any outbound edge MAY be identified by position'? 
Are there edge types for which position is irrelevant? (does 'position' 
relate to 'document order' in the concrete XML Encoding of the data 
model?). I'm not sure I understand this paragraph well enough to comment 
sensibly.
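
To make the questions above concrete, here is a minimal sketch of one 
possible reading of 2.1, in which every outbound edge has both a label 
and a position (its index in a per-node list). The class and method 
names are my own and hypothetical; nothing here is taken from the spec. 
Under this reading, a self-referencing edge and two parallel edges with 
the same label are both representable in the abstract graph, whatever 
the Encoding rules turn out to permit.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Edge:
    label: str       # edge label, kept as a plain string in this sketch
    target: "Node"   # node at which the edge terminates


@dataclass
class Node:
    value: Optional[str] = None
    # Outbound edges; list order plays the role of "position" (assumption).
    out: List[Edge] = field(default_factory=list)

    def by_position(self, index: int) -> Edge:
        """Identify an outbound edge by its position."""
        return self.out[index]

    def by_label(self, label: str) -> List[Edge]:
        """Return every outbound edge carrying the given label."""
        return [e for e in self.out if e.label == label]


root = Node()
leaf = Node("x")
root.out.append(Edge("item", leaf))
root.out.append(Edge("item", leaf))   # parallel edge: same origin, target, label
root.out.append(Edge("self", root))   # originates and terminates at the same node
```

In this sketch the two "item" edges can only be told apart by position, 
which is exactly the case where it matters whether position is part of 
the data or an accident of serialisation.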


[[
2.1.1 Edge labels

An edge label is an XML Schema Qualified Name (see XML Schema Part 2: 
Datatypes)
]]

Spec change request:
Please specify an algorithm (for example, simple concatenation) by which
SOAP graph edge types (labels) can be named using URI/URIref syntax. 
This will make it much easier for out-of-band metadata, including but 
not limited to RDF/XML metadata, to provide further information about the 
kinds of edges deployed in SOAP Encoding applications.

For example, I have a SOAP encoding application (using SOAP 1.1, being 
upgraded...) in which the serialised objects represent software 
packages. It uses edge labels such as 'ownerMailbox', 'homepage' etc. If 
these had URIs, we could write external RDF/XML descriptions about those 
edge labels, for example mapping to other SOAP Data Model constructs 
from similar applications created elsewhere, or specifying mathematical 
characteristics of the graph (eg. that certain edge labels have an 'at 
most one' semantic, a characteristic that can support graph merging 
algorithms and hence Web Service aggregation).
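
As a rough illustration of the 'simple concatenation' option, the 
following sketch names an edge label by joining the namespace name and 
the local part of its qualified name. The QName type, the function name 
and the example.org namespace are all hypothetical, chosen only for 
this example; the spec would need to say which algorithm is actually 
intended.

```python
from typing import NamedTuple


class QName(NamedTuple):
    """A qualified name after namespace prefix resolution."""
    namespace: str   # namespace name (a URI reference)
    local: str       # local part, e.g. "homepage"


def edge_label_uri(label: QName) -> str:
    """Name an edge label by concatenating namespace name and local part."""
    return label.namespace + label.local


# Hypothetical namespace for the software-package application.
PKG_NS = "http://example.org/packages#"

homepage = edge_label_uri(QName(PKG_NS, "homepage"))
owner = edge_label_uri(QName(PKG_NS, "ownerMailbox"))
```

Note that plain concatenation is not reversible in general, and reads 
best when the namespace name ends in '#' or '/'. Those are among the 
details a specified algorithm would need to settle.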

(more details of this on request... I want to get these comments in 
before last call closes or would provide examples from the implementation)

question:
What is the relationship between node types and edge label types in SOAP 
encoding? Can they be mixed freely? Can I use node types defined 
(somehow...) by one application, with instances of that node using edge 
labels drawn in multiple other schemas? Are there any rules constraining 
the sensible combinations of node and edge types?

Specifically, does the type of a node determine the edges that can be 
attached to it? Does each kind of edge label have node types that it 
can point to and from?

Implementor feedback: I am storing and merging de-serialised SOAP 
Encoding messages in a database system. To implement, I had to assume an 
answer to these questions. I assumed that the SOAP Data Model allowed 
namespace mixing amongst types and edges, and that node types do not 
dictate the edge types for a node.

[[
2.2 Graph Nodes
[...]
Both types of graph node have an optional unique identifier of type ID 
in the namespace named "http://www.w3.org/2001/XMLSchema".
]]

Since the SOAP Data Model is defined in the abstract, separate from any 
specific XML (or non-XML) syntactic encoding, it isn't clear why XML's 
notion of ID is being used here. The spec says the node has a "unique 
identifier", but does not define the scope of this uniqueness. XML IDs 
are unique within some document. Is this an implicit constraint on all 
SOAP Data Model XML encodings, ie. that we have the rule of one Data 
Model graph per XML document? (to avoid unique ID clashes). Is the ID 
unique within the scope of one graph, or one encoding as an entire XML 
document of such a graph?

request: please allow nodes to be identified by URI/URIref

(same goes for node types btw; I won't recycle this comment for that 
part of the spec). so, please allow nodes, and their types, to be 
identified by URI/URIref.

The use of a global unique identifier here (ie. URI/URIref) would remove 
the question of identifier scope, since in a Web (Service) context, URI 
identifiers won't accidentally clash. This might help decouple the 
abstract Data Model from the specifics of its XML encoding. It would 
also support data merging between SOAP graphs that shared node 
identifiers, but that's an added bonus.
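
A minimal sketch of the merging this would enable, assuming (my 
assumption, not the spec's) that nodes and edge labels are both named 
by URI, and that a graph is held as a map from node URI to a set of 
(edge label, target) pairs. The URIs are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

# node URI -> set of (edge label URI, target node URI or literal value)
Graph = Dict[str, Set[Tuple[str, str]]]


def merge(*graphs: Graph) -> Graph:
    """Union the outbound edges of nodes that share a URI identifier."""
    merged: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for graph in graphs:
        for node_uri, edges in graph.items():
            merged[node_uri] |= edges
    return dict(merged)


# Two hypothetical services describing the same software package.
PKG = "http://example.org/pkg/foo"
NS = "http://example.org/packages#"

g1: Graph = {PKG: {(NS + "homepage", "http://example.org/foo/")}}
g2: Graph = {PKG: {(NS + "ownerMailbox", "mailto:owner@example.org")}}

combined = merge(g1, g2)
```

With document-scoped IDs the same merge would first need a way to 
decide that two locally named nodes are in fact the same node.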

[[
2.3 Values
If the labels of a non-terminal graph node's outbound edges are not 
unique (i.e. they can be duplicated), the non-terminal graph node is 
known as a "generic"
]]

Seems odd. How do we know such things about edge labels? No mechanism 
has been described whereby we could acquire such metadata.

[[
Outbound edges of a generic MAY be distinguished by label and/or 
position, according to the needs of the application.
]]

Which application? This is even more confusing, unless I'm missing 
something. The impression I'm left with is that the meaning of a SOAP 
Data Model Graph is rather fluid, and open to competing, rival 
interpretations (eg. multiple consumer apps, or creators of namespaces 
used in the encoding, vs creators of services that use those 
namespaces). If there was a SOAP Data Model schema language, it would 
presumably address constraints such as those described in 2.3. In its 
absence, there appears to be no authoritative account of the rules 
governing each kind of SOAP Data Model edge label. Section 2.3 should 
either be removed or augmented with a description of how (possibly out 
of band) metadata might provide such information in a machine-readable 
format. Without an account of this, word of mouth seems to be the only 
way to acquire such information.
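
To illustrate the kind of machine-readable metadata I have in mind, 
here is a sketch in which an out-of-band table says, per edge label, 
whether that label may be repeated on a single node. The table format, 
the label URIs and the function name are all hypothetical; no such 
mechanism exists in the spec, which is the point of the comment.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical out-of-band metadata: edge label URI -> may it repeat on one node?
REPEATABLE: Dict[str, bool] = {
    "http://example.org/packages#homepage": False,
    "http://example.org/packages#maintainer": True,
}


def violations(outbound_labels: Iterable[str],
               repeatable: Dict[str, bool]) -> List[str]:
    """Return labels duplicated on a node despite being declared non-repeatable.

    Labels with no metadata are left unconstrained in this sketch.
    """
    counts = Counter(outbound_labels)
    return sorted(label for label, n in counts.items()
                  if n > 1 and not repeatable.get(label, True))


node_labels = [
    "http://example.org/packages#homepage",
    "http://example.org/packages#homepage",
    "http://example.org/packages#maintainer",
    "http://example.org/packages#maintainer",
]
problems = violations(node_labels, REPEATABLE)
```

Without some such declaration, a consumer looking at one instance 
graph can see that a label happens to be duplicated, but not whether 
it "can be".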



[[
3.1 Rules for Encoding Graphs in XML
]]

This bit of the spec is much improved from the previous WD; thanks!


[[
3.1.4 Computing the Type Name property

Note:

These rules define how the type name property of a graph node in a graph 
is computed from a serialized encoding. This specification does not 
mandate validation using any particular schema language or type system.
[...]
  However, nothing prohibits development of additional specifications to 
describe the use of SOAP with particular schema languages or type systems.
]]

This aside partly addresses some of my questions above. Perhaps it 
should have more prominence in the spec, since it (?) relates to edge 
types as well as node types, and to the general issue of extensibility and 
further development of the Web service model.

One clarification request: where it says 'the use of SOAP with 
particular schema languages', does this mean 'the use of the SOAP 
Encoding Data Model with particular schema languages'? ie. are you 
leaving open the possibility that Web Services may be able to provide 
additional metadata about their use of the Encoding and associated Data 
Model? (and relating to edge labels and their characteristics, as well 
as node types).

[[
  ... Such additional specifications MAY mandate validation using 
particular schema language, and MAY specify faults to be generated if 
validation fails. Such additional specifications MAY specify 
augmentations to the deserialized graph based on information determined 
from such a validation.
]]

This seems rather challenging from an extensibility and future proofing 
point of view. If I implement a SOAP 1.2 toolkit now, including SOAP 
Encoding support, how would such running code know when it had 
encountered use of such an 'additional specification'? Is this a 
scenario where the SOAP 'mustUnderstand' mechanism should be used? If 
deployed 1.2 clients will be ignorant of 1.2++ services that use such 
mechanisms, this could cause problems.




Misc other comments:

I understand SOAP services can now be deployed with a GET binding.

This means we can expect to see things like HTML documents hyperlinking 
into SOAP services which return SOAP Encoding data graphs.

  - can these be styled with XSLT? eg. a stockticker might
    return XML for SOAP clients, but be XSLT'd into XHTML for humans.
   (ie. is it legal to include stylesheet PIs?)

  - can protocol oriented header information be omitted? for simple 
lookups, we often might want nothing more than the graph data itself.
  Would this be legal? Could we use the SOAP mime type?

  - SOAP Encoding is a useful syntax for dumping programmatic objects 
into XML. Please consider making it easier for non-protocol uses to be 
made of it. I could easily drop object serialisations onto an FTP site, 
for example. Or deploy them on a normal HTTP server using normal HTTP 
content negotiation, so humans got an HTML version of a document, and 
SOAP clients got the graph encoding. The current spec doesn't seem to 
anticipate such re-use.
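
As a sketch of the content negotiation scenario, the following picks 
between an HTML rendering and the graph encoding based on the request's 
Accept header. The function names are my own, the parsing is 
deliberately simplified, and the use of "application/soap+xml" as the 
media type for the graph encoding is an assumption on my part; whether 
a bare SOAP Encoding graph may be served under the SOAP media type is 
one of the questions above.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media type, q-value) pairs (simplified)."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0   # treat a malformed q-value as "not acceptable"
        prefs.append((media_type, q))
    return prefs


def choose(header: str, available: List[str]) -> Optional[str]:
    """Pick the available media type the client prefers, or None."""
    prefs = parse_accept(header)
    best, best_q = None, 0.0
    for candidate in available:
        main = candidate.split("/")[0]
        q = 0.0
        # The most specific matching range decides the candidate's q-value.
        for pattern in (candidate, main + "/*", "*/*"):
            matches = [pq for pt, pq in prefs if pt == pattern]
            if matches:
                q = matches[0]
                break
        if q > best_q:
            best, best_q = candidate, q
    return best


AVAILABLE = ["application/soap+xml", "text/html"]
browser = choose("text/html,application/xhtml+xml;q=0.9,*/*;q=0.1", AVAILABLE)
soap_client = choose("application/soap+xml", AVAILABLE)
```

On a tie, the first entry in the available list wins, so the server's 
own ordering acts as the tie-breaker.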


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQE9OJ+xPhXvL3Mij+QRApCwAKCY/R3zHaToUVcomexViFiSc2nfIgCeLsuA
C0FXYn7CI1gUZADMEgN343U=
=EYCC
-----END PGP SIGNATURE-----

Received on Friday, 19 July 2002 19:04:21 UTC