Possible requirement to update SOAP 1.2 for XML 1.0 5th Edition

The XML Core Working Group has published a Proposed Edited Recommendation
(PER) of Extensible Markup Language (XML) 1.0 (Fifth Edition) [1].  The major
change in that edition is the proposal to expand the set of legal XML 
element and attribute names.  Without commenting either for myself or for 
IBM on the merits of this proposal, I note that there appears to be an 
interdependency with the SOAP 1.2 Recommendation.  Specifically, the way 
that SOAP 1.2 guarantees that all nodes agree on what's legal and what's 
not in a SOAP envelope is by reference to XML 1.0 serialization rules.  From
SOAP 1.2 Part 1 Chapter 5 "Message Construct" [2]:

"A SOAP message is specified as an XML infoset whose comment, element, 
attribute, namespace and character information items are able to be 
serialized as XML 1.0. Note, requiring that the specified information 
items in SOAP message infosets be serializable as XML 1.0 does NOT require 
that they be serialized using XML 1.0.  [...] The Infoset Recommendation 
[XML InfoSet] allows for content not directly serializable using XML; for 
example, the character #x0 is not prohibited in the Infoset, but is 
disallowed in XML. The XML Infoset of a SOAP Message MUST correspond to an 
XML 1.0 serialization [XML 1.0]."
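The distinction the quoted text draws between what the Infoset permits and what XML 1.0 can carry can be illustrated with a minimal sketch. It assumes Python, and the helper names (is_xml10_char, serializable_as_xml10) are hypothetical. It tests character content against the XML 1.0 Char production, which is the same in the 4th and 5th editions. It deliberately ignores escaping, since characters such as "<" can always be written as entity or character references.

```python
# Sketch: can character content from an infoset be written as XML 1.0?
# Checks each code point against the XML 1.0 Char production:
#   Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# Helper names are hypothetical; escaping of "<", "&" etc. is not considered,
# because such characters remain serializable via references.

def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def serializable_as_xml10(text: str) -> bool:
    """True if every character in text may appear in an XML 1.0 document."""
    return all(is_xml10_char(ord(ch)) for ch in text)


print(serializable_as_xml10("hello"))        # True
print(serializable_as_xml10("bad\x00data"))  # False: #x0 is in the Infoset, not in XML 1.0
```

Because this production did not change between editions, character content is not where the 4th/5th edition question arises; the open question concerns names.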

In other words, all SOAP nodes must follow the same rules for what's a 
legal envelope, and those rules depend heavily on the well-formedness 
rules for XML 1.0.  Hop by hop, some bindings will actually use the 
obvious XML 1.0 serialization while others may use compressed, encrypted, 
or other alternative encodings; either way, there must be nothing in the envelope
infoset that could not be sent using XML 1.0.  But which edition of XML 
1.0? The last reference in that paragraph is a hyperlink to the 
bibliography.  I think most readers would take that as applying to the
first sentence, but it's a bit unclear.  Anyway, it gets a bit worse. When 
you follow the hyperlink to the bibliography you get [3]:

"[XML 1.0]
 
Extensible Markup Language (XML) 1.0 (Fourth Edition), Jean Paoli, Eve 
Maler, Tim Bray, et. al., Editors. World Wide Web Consortium, 16 August 
2006. This version is http://www.w3.org/TR/2006/REC-xml-20060816. The 
latest version is available at http://www.w3.org/TR/REC-xml."

So, SOAP 1.2 explicitly references XML 1.0 4th edition, but then it also 
tells you to go looking for a new one too!  If you believe it's 4th 
edition only, then the new XML 1.0 PER has no impact, except insofar as 
you might sometime decide to update the Recommendation to explicitly point 
to 5th, should that be your wish (that will, of course, raise some 
interoperability concerns, since for the first time SOAP nodes won't all 
agree on what's legal).  Conversely, if one believes the bit about the
"latest version", then one can read the SOAP Recommendation as requiring
support for the new characters as soon as http://www.w3.org/TR/REC-xml is 
updated to point to 5th edition.
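To make the interoperability concern concrete, here is a minimal Python sketch (helper names hypothetical) of the Name production built from the NameStartChar and NameChar productions of the Fifth Edition, assumed here to match the PER at [1]. The Fourth Edition instead defines legal name characters through the Letter, Digit, CombiningChar and Extender tables in its Appendix B, which are not reproduced here, so this sketch cannot by itself say whether a name was legal under 4th edition.

```python
# Sketch: the XML 1.0 Fifth Edition Name production.
#   Name ::= NameStartChar (NameChar)*
# The 4th edition's Appendix B character tables are NOT implemented here.

NAME_START_RANGES = [
    (0x3A, 0x3A),                      # ":"
    (0x41, 0x5A),                      # A-Z
    (0x5F, 0x5F),                      # "_"
    (0x61, 0x7A),                      # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

# NameChar = NameStartChar plus these ranges.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E),                      # "-" and "."
    (0x30, 0x39),                      # 0-9
    (0xB7, 0xB7),                      # middle dot
    (0x300, 0x36F),                    # combining diacritical marks
    (0x203F, 0x2040),                  # undertie, character tie
]


def _in_ranges(cp: int, ranges) -> bool:
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_name_start_char(ch: str) -> bool:
    return _in_ranges(ord(ch), NAME_START_RANGES)


def is_name_char(ch: str) -> bool:
    return is_name_start_char(ch) or _in_ranges(ord(ch), NAME_EXTRA_RANGES)


def is_name_5th_edition(name: str) -> bool:
    """True if name matches the Fifth Edition Name production."""
    return (
        len(name) > 0
        and is_name_start_char(name[0])
        and all(is_name_char(ch) for ch in name[1:])
    )


print(is_name_5th_edition("env:Envelope"))  # True
print(is_name_5th_edition("1abc"))          # False: a digit cannot start a name
```

For example, a name made of Ethiopic letters such as "\u1200\u1208" passes this check. On the assumption that Ethiopic (added to Unicode in version 3.0) is absent from the 4th edition's Appendix B tables, a node enforcing only the 4th edition rules would reject the same element name as not well formed, which is exactly the disagreement between SOAP nodes described above.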

For those reasons, I request that the XML Protocols WG:

1) Figure out what SOAP behavior is desired should XML 1.0 5th edition be
published as planned.  In particular, is it the case that conforming nodes
MAY, MUST, SHOULD, SHOULD NOT, or MUST NOT accept the new characters in
tag names in SOAP envelopes?  I believe it's clear
that as long as 4th edition is current, the answer is MUST NOT.  Does that 
change if XML 1.0 5th edition reaches Recommendation?

2) Coordinate with the Core WG to ensure that publications are properly 
synchronized (or instead, if appropriate, provide feedback that XML 1.0 
5th edition is a problem for SOAP and should not be published, if that is
what you believe).

3) Consider the impact on bindings, faults, and errors, should you
decide to allow for the new content.  Presumably, some nodes will be
trying to send new content, perhaps to old nodes that aren't expecting it.
The outbound end of the binding implementation may or may not notice.  Is
that a binding-level error or something else?  Should a standard SOAP
fault be defined to indicate that the wrong edition of XML has been used?
Alternatively, the outbound binding implementation may be happy with the
new characters while the receiving node is old.  If an XML 1.0
serialization is being used, then by far the most likely failure mode is
simply that the receiving binding (if it checks well-formedness rather
than trusting the sender) will reject the message as not well formed.  I'm
not sure whether there are more subtle issues with bindings that use
non-XML 1.0 forms on the wire.

4) In any case, I suggest you clarify the ambiguity as to whether the text 
at [2] and [3] is to be read as referring to the latest 
Recommendation-level edition of XML 1.0, or else as referring specifically
to the 4th edition.

Thank you.

Noah

P.S. In case some of those on the cc: list are not aware, I have not been 
a member of the Protocols WG for some time.  I am just commenting as an 
interested member of the W3C community.

[1] http://www.w3.org/TR/2008/PER-xml-20080205/
[2] http://www.w3.org/TR/soap12-part1/#soapenv
[3] http://www.w3.org/TR/soap12-part1/#XML

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

Received on Tuesday, 12 February 2008 19:16:31 UTC