[Fwd: LC comments on SOAP 1.2 adjuncts: re SOAP Encoding (re URIs, test suite, SOAP Data Model Schemas?)]


FYI, I just sent in the following as last call comment on SOAP 1.2.

I'll carry on with the implementation report anyway, as part of
my SemWeb work, though I'm disappointed that I didn't get to do a more
explicitly implementation-led LC report.

Hope it's useful anyway...

Dan

-------- Original Message --------
Subject: LC comments on SOAP 1.2 adjuncts: re SOAP Encoding (re URIs,
test suite, SOAP Data Model Schemas?)
Date: Fri, 19 Jul 2002 23:24:28 +0000
From: Dan Brickley <danbri@w3.org>
To: xmlp-comments@w3.org
CC: danbri@w3.org


Hi

Here are my last call review comments on the SOAP Encoding and Data
Model portion of the 1.2 spec, in document order. I should preface them by
saying that while I jump straight in with the nit-picking, this document
is a huge improvement on the previous version. Nice work! :)

Summary:
Reading back, my comments are variations on a theme: the SOAP Encoding
and its Data Model would benefit from a more explicit account
of the mechanisms by which node and edge types for use in SOAP graphs
might be defined. There are a few places where the use of URIs might
make it easier for successor specs to flesh out such details (eg.
URIs for kinds of edge label, for nodes, for node types).

URIs:
I specifically request one change to the spec in the light of
implementation experience with SOAP: please specify a mechanism for
identifying SOAP graph edge labels with URI/URIref names. Identifying
nodes and their types would be useful, but identifying and describing
SOAP edge labels is (in my database-oriented implementation) critical.
We have a number of other activities at W3C that could support the richer
description of SOAP graph edge labels: the Web Service Description effort,
RDF and RDF Schema, as well as the Web Ontology work. Providing
URI/URIref names for graph edges is a very minimalistic hook that will
make integration with such efforts cheaper and simpler, without adding
to the burden on SOAP implementations.
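One possible shape for such a hook, as a minimal sketch: it assumes "simple concatenation" of namespace name and local name, which is the rule RDF/XML uses for property elements. SOAP 1.2 defines no such mapping; the `p:` namespace URI and the package document are hypothetical, and the edge labels `ownerMailbox` and `homepage` are taken from the software-package example in my comments on 2.1.1.

```python
import xml.etree.ElementTree as ET

def edge_label_uri(clark_name: str) -> str:
    """Map an edge label, given in ElementTree's '{namespace}local' form,
    to a URI by concatenating namespace name and local name."""
    if not clark_name.startswith("{"):
        # An unqualified label has no namespace name to build a URI from.
        raise ValueError(f"unqualified edge label: {clark_name!r}")
    namespace, local_name = clark_name[1:].split("}", 1)
    return namespace + local_name

# Hypothetical encoded struct whose outbound edges carry qualified labels.
doc = ET.fromstring(
    '<p:package xmlns:p="http://example.org/pkg#">'
    "<p:ownerMailbox>mailto:someone@example.org</p:ownerMailbox>"
    "<p:homepage>http://example.org/</p:homepage>"
    "</p:package>"
)

for child in doc:
    print(edge_label_uri(child.tag))
```

Plain concatenation has a known cost: `{http://example.org/pkg#}homepage` and `{http://example.org/pkg#home}page` map to the same URI, a risk RDF/XML also accepts. A rule with an explicit separator would avoid that; either choice would serve, as long as the spec fixes one.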

Test cases / machine checkable test suite:
I don't comment on the fine-grained detail of the SOAP Encoding itself,
except to say the following: please seriously consider creating a
machine-usable test suite for this work. Defining a graph encoding
syntax in XML is a slippery task, and it is easy to make mistakes, both
in the specification and in implementations. Apparent interop amongst
deployed SOAP toolkits may reflect shared understanding in that part of
the Web community as much as it reflects precision in the formal
specification of the encoding rules. As this work moves into the wider
Web community, we'll likely see more unexpected corner cases. Historical
note: this happened with RDF and the RDF/XML graph encoding syntax. We
had to clean up RDF's graph encoding rules post-REC. As a result of that
experience, we have reworked the RDF specs to use a more mathematically
precise account of the abstract graph model, accompanied by a
machine-processable set of test cases. See
http://www.w3.org/RDF/ for details. I urge the XMLP WG to gather similar
test cases before proposing that SOAP 1.2 go to REC. It's very easy to get
bugs into XML graph encoding rules. Having a test suite offers very useful
protection against this.
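To illustrate what "machine-usable" could mean, here is a minimal sketch in the style of the RDF Core test cases: each case pairs an input document with the expected graph, and a harness compares them mechanically. The `decode` function is a deliberately tiny, hypothetical stand-in and NOT the SOAP Encoding rules (it ignores ids, refs, types and arrays), and it names nodes deterministically rather than comparing graphs up to node renaming, as a real suite would need to.

```python
import xml.etree.ElementTree as ET

def decode(xml_text):
    """Toy decoder, NOT the SOAP Encoding rules: every child element becomes
    an edge labelled with its tag, pointing either to a new node (if the
    child has children) or to the child's text as a terminal value."""
    edges = set()  # a set, so exact duplicate edges collapse
    counter = 0

    def walk(elem):
        nonlocal counter
        node = f"_:n{counter}"  # deterministic node names, in document order
        counter += 1
        for child in elem:
            if len(child):
                edges.add((node, child.tag, walk(child)))
            else:
                edges.add((node, child.tag, (child.text or "").strip()))
        return node

    walk(ET.fromstring(xml_text))
    return edges

# Each case: (name, input document, expected set of edges).
CASES = [
    ("terminal-values",
     "<pkg><name>foo</name><version>1.2</version></pkg>",
     {("_:n0", "name", "foo"), ("_:n0", "version", "1.2")}),
    ("nested-node",
     "<pkg><owner><mbox>mailto:a@example.org</mbox></owner></pkg>",
     {("_:n0", "owner", "_:n1"), ("_:n1", "mbox", "mailto:a@example.org")}),
]

def run(cases):
    """Return the names of failing cases."""
    return [name for name, doc, expected in cases if decode(doc) != expected]

print(run(CASES))  # []
```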

Comments follow below, interspersed with excerpts from the spec.

These specific requests, queries and comments aside, this is really
promising, interesting and worthwhile work, and the spec is much clearer
than the version I reviewed last year. The effort that XMLP have put
into this is really showing and is much appreciated by those of us at
the receiving end of the specs.

cheers,

Dan

(RDF Core WG co-chair; RDF IG chair)




http://www.w3.org/TR/2002/WD-soap12-part2-20020626
-> http://www.w3.org/TR/soap12-part2/#datamodel
[[
2. SOAP Data Model

The SOAP Data Model represents application-defined data structures and
values as a directed edge-labeled graph of nodes. Components of this
graph are described in the following sections.
]]

The concept of 'application-defined' is somewhat unclear: are these data
structures defined by the producers of the data, or by consumers? What form
do these definitions take? Might we expect to read a schema definition
(W3C XML Schema? RELAX-NG?) that made such definitions explicit, or are
the definitions expected to be implicit? For example, if I deploy a
Java-based SOAP service, my application-defined data structures might be
described in terms of Java, yet exposed to the world through the SOAP
Encoding Data Model.

Readers of 1.2p2 could reasonably ask: 'What technology can I use to
expose my application-defined data structuring conventions?' The SOAP
1.2p2 Encoding explains how to expose instance data, but gives little
account of the underlying principles that tell us whether or not a
particular SOAP graph meets the 'application-defined data structures'
for a given service. Is there an expectation that technology will evolve
to fill this gap? (A SOAP Encoding Schema Language has been mentioned in
some discussions on xml-dist-app and www-ws-desc.) If so, please make
this expectation clearer in the specification (there is an aside later
in the spec, but it isn't very detailed). If not, please note that SOAP
1.2 does _not_ specify any mechanisms by which applications which use
SOAP Encoding can describe the SOAP Encoded data structures they
understand.


[[
The purpose of the SOAP Data Model is to provide a mapping of non-XML
based data to some wire representation.
It is important to note that use of the SOAP Data Model, the
accompanying SOAP Encoding (see 3. SOAP Encoding), and/or the SOAP RPC
Representation (see 4. SOAP RPC Representation) is OPTIONAL.
Applications which already model data in XML, for example using W3C XML
Schema [4],[5], may not need to use the SOAP Data Model.
]]

As an introduction to the role of the SOAP Data Model, this could be
clearer. It explains that one might take the Schema-based approach, or
that one might take the SOAP Encoding approach, but offers little to
motivate either decision. For a fresh application with no existing
commitment to a Schema-based approach, the specification currently
offers little advice to help SOAP adopters choose which path to take.
Are there identifiable benefits for using SOAP Encoding over a Schema
approach? Perhaps (for example) that Web services can be deployed faster
using object-to-XML encodings than through hand-crafting an XML Schema?
The current text doesn't really sell us on the utility of SOAP Encoding;
on the contrary, it has a somewhat wary, cautious tone, yet doesn't
provide technical details on the tradeoffs. If you could add 2-3 bullet
points to help SOAP adopters make an informed decision here, that would be useful.


[[
2.1 Graph Edges
An edge MAY originate and terminate at the same graph node.
]]

additional clarification / test case:

May a graph contain more than one edge with the same originating and
terminating node? (and can such a thing be serialised? in the current
Encoding rules? in other hypothetical encodings?)
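The answer matters for implementations. A minimal sketch, with hypothetical node and label names: storing edges as a list of (source, label, target) tuples keeps parallel edges, while storing them as a set of triples, as an RDF graph does, collapses any two edges that also share a label.

```python
# Hypothetical edges: (source node, edge label, target node).
# All three share both endpoints; the last two are exact duplicates.
edges = [
    ("pkg1", "author", "person1"),
    ("pkg1", "maintainer", "person1"),
    ("pkg1", "maintainer", "person1"),
]

as_list = list(edges)  # multigraph: every edge kept
as_set = set(edges)    # set of triples: exact duplicates collapse

def parallel_edges(store, source, target):
    """Count edges in a store that run from source to target."""
    return sum(1 for s, _, t in store if s == source and t == target)

print(parallel_edges(as_list, "pkg1", "person1"))  # 3
print(parallel_edges(as_set, "pkg1", "person1"))   # 2
```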


[[
The outbound edges of a given graph node MAY be distinguished by label
or by position, or both. Position is a total order on such edges; thus
any outbound edge MAY be identified by position.
]]

This is a bit confusing. Whose freedom does the 'MAY' refer to?
Consumers of the data? Or definers of SOAP Data Model-based
application data formats? (see above re Schema languages). The notion of
'position' is introduced with reference to 'such edges'. But which ones?
All of them, since 'any outbound edge MAY be identified by position'?
Are there edge types for which position is irrelevant? (Does 'position'
relate to 'document order' in the concrete XML Encoding of the data
model?) I'm not sure I understand this paragraph well enough to comment
sensibly.
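One reading of the paragraph, sketched under an assumption the spec does not state: that 'position' is the total order of a node's outbound edges (plausibly document order in the XML encoding). On that reading every edge is reachable by position, but reaching it by label only works when the label is present and unique. The labels, node names and the unlabelled edge are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Edge:
    label: Optional[str]  # None: edge distinguished by position only
    target: str

# Outbound edges of one node, in their (assumed) positional order.
outbound = [Edge("name", "n1"), Edge("version", "n2"), Edge(None, "n3")]

def by_position(edges, index):
    """Every outbound edge is reachable by its position."""
    return edges[index]

def by_label(edges, label):
    """Label lookup succeeds only if exactly one edge carries the label."""
    matches = [edge for edge in edges if edge.label == label]
    if len(matches) != 1:
        raise LookupError(f"label {label!r} matches {len(matches)} edges")
    return matches[0]

print(by_position(outbound, 2).target)       # n3
print(by_label(outbound, "version").target)  # n2
```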


[[
2.1.1 Edge labels

An edge label is an XML Schema Qualified Name (see XML Schema Part 2:
Datatypes)
]]

Spec change request:
Please specify an algorithm (for example, simple concatenation) by which
SOAP graph edge types (labels) can be named using URI/URIref syntax.
This will make it much easier for out-of-band metadata, including but
not limited to RDF/XML metadata, to provide further information about the
kinds of edges deployed in SOAP Encoding applications.

For example, I have a SOAP encoding application (using SOAP 1.1, being
upgraded...) in which the serialised objects represent software
packages. It uses edge labels such as 'ownerMailbox', 'homepage' etc. If
these had URIs, we could write external RDF/XML descriptions about those
edge labels, for example mapping to other SOAP Data Model constructs
from similar applications created elsewhere, or specifying mathematical
characteristics of the graph (eg. that certain edge labels have an 'at
most one' semantic, a characteristic that can support graph merging
algorithms and hence Web Service aggregation).

(more details of this on request... I want to get these comments in
before last call closes; otherwise I would provide examples from the implementation)
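A minimal sketch of the kind of aggregation I mean, assuming edge labels are named by URI and that out-of-band metadata (hypothetical here) declares that each value of `homepage` identifies at most one package. That is one reading of 'at most one'; OWL calls such labels inverse-functional properties. Nodes from two hypothetical services that share a homepage value are merged with a small union-find.

```python
# Edge labels named by URI (hypothetical namespace).
PKG = "http://example.org/pkg#"
HOMEPAGE, OWNER, VERSION = PKG + "homepage", PKG + "ownerMailbox", PKG + "version"

# Deserialized edges from two hypothetical services: (node, label URI, value).
service_a = [("a:1", HOMEPAGE, "http://example.org/foo/"),
             ("a:1", OWNER, "mailto:alice@example.org")]
service_b = [("b:7", HOMEPAGE, "http://example.org/foo/"),
             ("b:7", VERSION, "1.2")]

# Assumed metadata: each value of these labels belongs to at most one node.
IDENTIFYING = {HOMEPAGE}

def merge(*graphs):
    """Union the graphs, merging nodes that share a value of an identifying label."""
    edges = [edge for graph in graphs for edge in graph]
    parent = {}

    def find(node):
        # Follow merge links to the representative node.
        while parent.get(node, node) != node:
            node = parent[node]
        return node

    seen = {}  # (label, value) -> first node seen with that value
    for node, label, value in edges:
        if label in IDENTIFYING:
            first = seen.setdefault((label, value), node)
            root, first_root = find(node), find(first)
            if root != first_root:
                parent[root] = first_root

    return {(find(node), label, value) for node, label, value in edges}

for edge in sorted(merge(service_a, service_b)):
    print(edge)
```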

question:
What is the relationship between node types and edge label types in SOAP
Encoding? Can they be mixed freely? Can I use node types defined
(somehow...) by one application, with instances of that node using edge
labels drawn from multiple other schemas? Are there any rules constraining
the sensible combinations of node and edge types?

Specifically, does the type of a node determine the edges that can be
attached to it? Does each kind of edge label have node types that it
can point to and from?

Implementor feedback: I am storing and merging de-serialised SOAP
Encoding messages in a database system. To implement, I had to assume an
answer to these questions. I assumed that the SOAP Data Model allowed
namespace mixing amongst types and edges, and that node types do not
dictate the edge types for a node.

[[
2.2 Graph Nodes
[...]
Both types of graph node have an optional unique identifier of type ID
in the namespace named "http://www.w3.org/2001/XMLSchema".
]]

Since the SOAP Data Model is defined in the abstract, separate from any
specific XML (or non-XML) syntactic encoding, it isn't clear why XML's
notion of ID is being used here. The spec says the node has a "unique
identifier", but does not define the scope of this uniqueness. XML IDs
are unique within some document. Is this an implicit constraint on all
SOAP Data Model XML encodings, ie. that we have the rule of one Data
Model graph per XML document? (to avoid unique ID clashes). Is the ID
unique within the scope of one graph, or one encoding as an entire XML
document of such a graph?

request: please allow nodes to be identified by URI/URIref

(The same goes for node types, btw; I won't recycle this comment for that
part of the spec.) So: please allow nodes, and their types, to be
identified by URI/URIref.

The use of a global unique identifier here (ie. URI/URIref) would remove
the question of identifier scope, since in a Web (Service) context, URI
identifiers won't accidentally clash. This might help decouple the
abstract Data Model from the specifics of its XML encoding. It would
also support data merging between SOAP graphs that shared node
identifiers, but that's an added bonus.
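A minimal sketch of the clash, assuming (as SOAP 1.2 does not require) that each encoded message has a URI of its own; the message URIs, IDs and names are hypothetical. Merging by bare ID silently loses a node, while qualifying each ID with its document's URI keeps the two apart.

```python
def global_id(document_uri: str, local_id: str) -> str:
    """Turn a document-scoped XML ID into a URI reference."""
    return f"{document_uri}#{local_id}"

# Two hypothetical messages that happen to reuse the same local ID.
messages = [
    {"document": "http://example.org/msg/1", "id": "obj-1", "name": "libfoo"},
    {"document": "http://example.org/msg/2", "id": "obj-1", "name": "libbar"},
]

by_local_id, by_uri = {}, {}
for msg in messages:
    by_local_id[msg["id"]] = msg["name"]  # second node silently replaces the first
    by_uri[global_id(msg["document"], msg["id"])] = msg["name"]

print(by_local_id)  # {'obj-1': 'libbar'}
print(by_uri)
```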

[[
2.3 Values
If the labels of a non-terminal graph node's outbound edges are not
unique (i.e. they can be duplicated), the non-terminal graph node is
known as a "generic"
]]

Seems odd. How do we know such things about edge labels? No mechanism
has been described whereby we could acquire such metadata.
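Absent such metadata, the only test an implementation can run is on the instance in hand, as in this minimal sketch (the labels are hypothetical). Two instances of what the application regards as the same kind of node can get different answers, which is exactly the problem.

```python
from collections import Counter

def looks_generic(labels):
    """True if this one serialized node repeats an outbound edge label.

    This only inspects an instance; it cannot say whether labels of this
    kind of node are *allowed* to repeat, which is what 2.3 turns on.
    """
    return any(count > 1 for count in Counter(labels).values())

# Two hypothetical instances of the same kind of node.
first = ["name", "author"]
second = ["name", "author", "author"]

print(looks_generic(first))   # False
print(looks_generic(second))  # True
```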

[[
Outbound edges of a generic MAY be distinguished by label and/or
position, according to the needs of the application.
]]

Which application? This is even more confusing, unless I'm missing
something. The impression I'm left with is that the meaning of a SOAP
Data Model Graph is rather fluid, and open to competing, rival
interpretations (eg. multiple consumer apps, or creators of namespaces
used in the encoding, vs creators of services that use those
namespaces). If there was a SOAP Data Model schema language, it would
presumably address constraints such as those described in 2.3. In its
absence, there appears to be no authoritative account of the rules
governing each kind of SOAP Data Model edge label. Section 2.3 should
either be removed or augmented with a description of how (possibly out
of band) metadata might provide such information in a machine-readable
format. Without an account of this, word of mouth seems to be the only
way to acquire such information.



[[
3.1 Rules for Encoding Graphs in XML
]]

This bit of the spec is much improved from the previous WD; thanks!


[[
3.1.4 Computing the Type Name property

Note:

These rules define how the type name property of a graph node in a graph
is computed from a serialized encoding. This specification does not
mandate validation using any particular schema language or type system.
[...]
However, nothing prohibits development of additional specifications to
describe the use of SOAP with particular schema languages or type systems.
]]

This aside partly addresses some of my questions above. Perhaps it
should have more prominence in the spec, since it (?) relates to edge
types as well as node types, and to the general issue of extensibility and
further development of the Web service model.

One clarification request: where it says 'the use of SOAP with
particular schema languages', does this mean 'the use of the SOAP
Encoding Data Model with particular schema languages'? ie. are you
leaving open the possibility that Web Services may be able to provide
additional metadata about their use of the Encoding and associated Data
Model (relating to edge labels and their characteristics, as well
as node types)?

[[
... Such additional specifications MAY mandate validation using
particular schema language, and MAY specify faults to be generated if
validation fails. Such additional specifications MAY specify
augmentations to the deserialized graph based on information determined
from such a validation.
]]

This seems rather challenging from an extensibility and future-proofing
point of view. If I implement a SOAP 1.2 toolkit now, including SOAP
Encoding support, how would such running code know when it had
encountered use of such an 'additional specification'? Is this a
scenario where the SOAP 'mustUnderstand' mechanism should be used? If
deployed 1.2 clients will be ignorant of 1.2++ services that use such
mechanisms, this could cause problems.
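A minimal sketch of how the mustUnderstand route might look, assuming an 'additional specification' announced itself with a header block in its own namespace (`EXT` and the `validateWith` block are hypothetical). Under SOAP 1.2, a node targeted by a mustUnderstand block it does not understand must fault rather than process the message, so deployed toolkits would at least fail loudly. The sketch uses the envelope namespace of the published SOAP 1.2 Recommendation and ignores role targeting.

```python
import xml.etree.ElementTree as ET

ENV = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 (as published) envelope namespace
EXT = "http://example.org/encoding-validation"   # hypothetical 'additional specification'

message = ET.fromstring(f"""
<env:Envelope xmlns:env="{ENV}" xmlns:x="{EXT}">
  <env:Header>
    <x:validateWith env:mustUnderstand="true">http://example.org/schema</x:validateWith>
  </env:Header>
  <env:Body/>
</env:Envelope>
""".strip())

def unrecognised_mandatory_blocks(envelope, understood):
    """Header blocks marked mustUnderstand that this node does not recognise.

    A non-empty result means the node should fault instead of processing
    the message (role/targeting rules are ignored in this sketch).
    """
    header = envelope.find(f"{{{ENV}}}Header")
    if header is None:
        return []
    return [block.tag for block in header
            if block.get(f"{{{ENV}}}mustUnderstand") in ("true", "1")
            and block.tag not in understood]

# A toolkit deployed today knows no extension specifications.
print(unrecognised_mandatory_blocks(message, understood=set()))
```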




Misc other comments:

I understand SOAP services can now be deployed with a GET binding.

This means we can expect to see things like HTML documents hyperlinking
into SOAP services which return SOAP Encoding data graphs.

   - can these be styled with XSLT? eg. a stock ticker might
     return XML for SOAP clients, but be XSLT'd into XHTML for humans
     (ie. is it legal to include stylesheet PIs?)

   - can protocol-oriented header information be omitted? For simple
     lookups, we often might want nothing more than the graph data itself.
     Would this be legal? Could we use the SOAP MIME type?

   - SOAP Encoding is a useful syntax for dumping programmatic objects
     into XML. Please consider making it easier for non-protocol uses to be
     made of it. I could easily drop object serialisations onto an FTP site,
     for example, or deploy them on a normal HTTP server using normal HTTP
     content negotiation, so that humans got an HTML version of a document
     and SOAP clients got the graph encoding. The current spec doesn't seem
     to anticipate such re-use.
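For the content-negotiation case, a minimal sketch of the server-side choice, assuming the SOAP graph encoding is served as `application/soap+xml` (the SOAP 1.2 media type) and an HTML rendering as `text/html`. The parsing is deliberately simplified, and a real server would also send `Vary: Accept`.

```python
from typing import Optional

OFFERED = ["application/soap+xml", "text/html"]  # SOAP clients win ties

def choose_representation(accept: str) -> Optional[str]:
    """Pick which representation of a resource to serve for an Accept header.

    Simplified: honours q-values and '*/*', but not 'type/*' ranges or
    media-type parameters. Returns None when nothing offered is acceptable.
    """
    prefs = {}
    for item in accept.split(","):
        parts = [p.strip() for p in item.split(";")]
        quality = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                quality = float(param[2:])
        prefs[parts[0].lower()] = quality

    def score(media_type):
        return prefs.get(media_type, prefs.get("*/*", 0.0))

    best = max(OFFERED, key=score)  # max keeps the first of equal scores
    return best if score(best) > 0 else None

print(choose_representation("text/html,application/xhtml+xml,*/*;q=0.8"))  # text/html
print(choose_representation("application/soap+xml"))  # application/soap+xml
```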

Received on Friday, 19 July 2002 19:44:46 UTC