- From: <noah_mendelsohn@us.ibm.com>
- Date: Mon, 15 Mar 2004 16:09:12 -0500
- To: xml-dist-app@w3.org
- Cc: muraw3c@attglobal.net, sanjiva@us.ibm.com, ryman@ca.ibm.com
- Message-ID: <OF3ED72A2E.06989C11-ON85256E58.006ADA8D@lotus.com>
Background for distApp readers: the XMLP workgroup has formed a small task force to study issues and options relating to the emergence of XML 1.1 and its implications for SOAP [1]. The task force is open to all members of the XMLP workgroup, but so far Herve Ruellan, Yves Lafon and I are the only "volunteers". We had a phone chat this morning, at which we agreed to discuss our work on the distApp list using messages with the subject prefix [XML11TF]. You've already seen one message from Yves on the particular issue of restricting the infoset and its relation to the HTTP binding.

Here, I'm going to try to take a broader view and just lay out what I see to be some of the questions before us and the options available. If we can agree on this analysis or one like it, then we can set about making the choices:

Restricting Envelope Infoset content to >some< version(s) of XML
----------------------------------------------------------------

SOAP 1.2 specifies that a SOAP envelope is an XML Infoset, and it carefully constrains some aspects of the infoset. For example, the root element must be <soap:Envelope>. In other places, it inherits no restrictions other than those implicit in the infoset rec itself. For example, it does not in general restrict the character children of elements within the <soap:Body>.

To my great surprise, Richard Tobin points out that the Infoset Rec does not in fact restrict such characters even to the expanded set allowed in XML 1.1. On the contrary, it allows code points such as "0" (U+0000) which are not allowed by any version of XML.

I strongly believe that it was our intention that SOAP infosets be serializable using at least some version of XML. I believe our recommendation is contradictory on this important point and, as proposed at [2], I think we should open an issue on this and close it with an erratum to SOAP 1.2.
This erratum should at least rule out characters such as "0"; whether it should restrict the Infoset specifically to content serializable in SOAP 1.2 is discussed in the next section below.

Should SOAP Envelope Infosets allow XML 1.0, XML 1.1 or a choice of content?
----------------------------------------------------------------------------

I think my approach to analyzing this issue is a bit different than Yves'. I specifically think we have to choose between two options that are quite starkly different:

I. Make clear that for the foreseeable future, legal SOAP Envelope Infosets must be serializable using XML 1.0 (i.e., no control characters, no new name characters). This proposed statement has nothing directly to do with a particular binding or wire format: it is a statement about what may in principle be in the envelope. Having made this rule we get to keep one of SOAP's original guiding principles from the binding framework: any binding must be capable of transmitting any envelope Infoset. From [3]:

"Therefore, the minimum responsibility of a binding in transmitting a message is to specify the means by which the SOAP message infoset is transferred to and reconstituted by the binding at the receiving SOAP node and to specify the manner in which the transmission of the envelope is effected using the facilities of the underlying protocol."

vs.

II. I think the likeliest alternative is to say:

"SOAP Envelope Infosets must be directly serializable using some recommendation-level version of XML. All bindings MUST be capable of transmitting Infosets which have content representable using XML 1.0 Second Edition. Bindings MAY be written to additionally transmit Infoset information allowed by subsequent versions of XML (e.g. to transmit the name and control characters added in XML Version 1.1). Bindings MUST signal a binding-dependent error in any situation in which the Infoset cannot be transmitted and reconstructed with full fidelity.
NOTE: A consequence of these rules is that envelopes that use only XML 1.0-compatible content can be transmitted through any SOAP network, regardless of choices of binding or introduction of intermediaries; envelopes that use features of newer versions of XML may not be transmissible using certain bindings, or may fail to transit certain intermediaries."

So, one way we get universal interop, but are restricted to 1.0-style content only. The other way we allow optional use of XML 1.1, but with the risk that sticking a non-XML 1.0 intermediary in the path may prevent transmission of otherwise legal infosets. I think the XMLP wg should choose between these, and I think it's a tough choice. See also issues below relating to bindings and description languages.

What about bindings in general and the HTTP binding in particular?
------------------------------------------------------------------

I think the above sets out the options for bindings in general. The choices for our HTTP binding will depend on which of the paths above we take. Note that our HTTP binding actually defers content decisions to the application/soap+xml media type spec, which in turn defers to RFC 3023 and application/xml.

In private communication, Murata Makoto has made clear that his intention was always that 3023 and application/xml be usable with any version of XML. He and I have at least informally discussed the possibility that 3023 would be clarified as follows:

"application/xml is to be used with any W3C Recommendation-level version of XML as identified in the version specification of the XML declaration. When no such declaration is present, XML 1.0 is assumed. In all examples herein where a specific version such as version="1.0" is shown, it is understood that other versions may also be used, providing the content does indeed conform to the specified version of the XML Recommendation.
Specifications and recommendations based on or referring to this RFC SHOULD indicate any limitations on the particular versions of XML to be used. For example, a particular specification might indicate: 'content MUST be represented using media-type application/xml, and the document must either (a) carry an xml declaration specifying version="1.0" or (b) omit the xml declaration, in which case per the XML recommendation the version defaults to 1.0'"

I have some reason to believe that this text is being proposed at IETF, but haven't heard anything on it lately. If things go this way, then we will have a choice in our SOAP HTTP binding:

* Issue an erratum clarifying that XML version 1.0 MUST be the serialized form used with application/xml

-or-

* Issue an erratum clarifying that all implementations MUST be capable of reading at least XML version 1.0, but that implementations MUST use a choice of XML 1.0 or XML 1.1 when transmitting (and maybe allow for future versions too). In this case I think we should also say:

  "Implementations SHOULD, where practical, use the earliest version of XML suitable for the content. For example, if an envelope uses none of the new name or control characters introduced with XML version 1.1, it should if possible be serialized using XML version 1.0.

  NOTE: it is recognized, however, that performance or other considerations may preclude such careful choice of XML versions. Particularly in streaming scenarios, it may be impractical to determine sufficiently early whether new forms of content are being used."

Note that, with respect to the 2nd option, the usual means of HTTP content negotiation seem not to apply, since both the XML 1.0 and XML 1.1 forms would be sent using the same media type.

Line ends
=========

XML 1.1 allows new line end characters. I think we agreed on the call that this is visible only in the serializations, not the infoset, and is thus a purely hop-by-hop concern for individual bindings.
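
As a rough illustration of why line ends are a serialization-level matter, here is a small Python sketch of the end-of-line normalization that XML 1.0 and XML 1.1 each prescribe for parsers (section 2.11 of both Recommendations). The function name normalize_line_ends and its xml_version parameter are my own illustrative assumptions, not part of any SOAP or XML API.

```python
import re

# XML 1.0: CR LF and any lone CR are translated to a single LF.
_XML10_EOL = re.compile("\r\n|\r")

# XML 1.1 adds: CR NEL, lone NEL (U+0085), and LINE SEPARATOR (U+2028).
# Two-character sequences come first so they collapse to one LF.
_XML11_EOL = re.compile("\r\n|\r\x85|\x85|\u2028|\r")


def normalize_line_ends(text: str, xml_version: str = "1.0") -> str:
    """Hypothetical helper: apply the parser-side line-end normalization
    for the given XML version to the raw text of an external entity."""
    pattern = _XML11_EOL if xml_version == "1.1" else _XML10_EOL
    return pattern.sub("\n", text)
```

Either way, what reaches the Infoset is a plain #xA; only the bytes on the wire differ between the two versions.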
Presumably, whatever we decide regarding XML versions for our HTTP binding will settle the line end question for that binding. Other bindings are, of course, free to use any XML or non-XML serializations, and to use line ends as needed by the binding.

Data Model and Encoding
=======================

The SOAP data model says that "An edge label is an XML qualified name"[4], which we can now see to be ambiguous because no reference is made to a particular version of XML namespaces or to a particular rigorous definition of "qualified name". It seems we need to decide whether there are any circumstances in which the new name characters of XML 1.1 are allowed in such edge names.

The encoding section states [5]: "For a graph edge which is distinguished by label, the [local name] and [namespace name] properties of the child element information item together determine the value of the edge label." This suggests, not surprisingly, that our decision on data model edge names should be made consistent with our decision on Infoset local names for element information items.

The names of node types are also an issue. From [6]: "All graph nodes have an optional type name of type xs:QName in the namespace named "http://www.w3.org/2001/XMLSchema" (see XML Schema [XML Schema Part 2])." The definition of xs:QName [7] refers to the 1999 version of Namespaces in XML [8]. So, data model type names are definitely limited to the old form of QName. It seems we should not change this unless and until XML Schema decides on how to deal with an xs:QName11 or some such (and what a mess that will be!)

WSDL and Description Languages
==============================

WSDL is not and IMO should not be a requirement for SOAP. Nonetheless, being able to use SOAP with languages such as WSDL seems to be important. WSDL bases its 'literal' specifications on XML schema, and for a variety of reasons current versions of XML Schema do not validate XML 1.1 content.
Some of the reasons were summarized in my tech plenary lightning talk. I expect the slides will eventually be posted at [9], but in the meantime a .zip file with various formats is attached. I think the slides are self-explanatory. In summary, you can't declare elements or attributes with the new names, xsd:strings don't take the new control characters, xs:QName is the old-style QName, the xsd:Name type is the old style, etc.

In short, as we make the main decisions above about enabling or optionally enabling XML 1.1 content in SOAP, we may want to consider coordination issues with WSDL and XML Schema (and indirectly with all the other groups such as Query and XSL that will depend on schema and typing.)

XOP/MTOM/Primer/TestCases.Schemas
=================================

All our other SOAP specs and schemas need a thorough check to ensure they are unambiguous and match whatever we decide about XML 1.1.

Errata vs. new releases
=======================

My personal opinion is that SOAP 1.2 as it stands is self-contradictory or at best unclear. It thus needs at least a clarification as an erratum. We may or may not wish to do a two stage approach, in which (for example) SOAP 1.2 is clarified as being XML 1.0 2nd Ed. only, and some SOAP 1.2.1 or some such enables optional XML 1.1. In that case, we'll have to decide how the new version of SOAP is signalled on the wire.

Summary
=======

That's roughly what I remember of the issues as we discussed them on the call this morning. For me, the big one is the first one: what do we do about the Infosets? If we stick to 1.0 we have interop, but we make life difficult for all the users who needed XML 1.1 for their content. If we enable XML 1.1, then we run the risk that an intermediary or binding can't handle it, and interop breaks. We also have to coordinate with RFC 3023 work on getting the media type straight. The WSDL coordination worries me in all of this. The rest of it (e.g. encoding) looks manageable.
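
To make the Infoset question a little more concrete, here is a minimal Python sketch of the "earliest suitable version" test discussed above for the HTTP binding. It classifies character content against the Char productions of XML 1.0 and XML 1.1 and reports which version, if any, could carry it. The function names are illustrative assumptions of mine; the sketch covers character content only and does not check element or attribute names, nor how XML 1.1 requires certain control characters to be written as character references.

```python
from typing import Optional


def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches the Char production of XML 1.0."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_xml11_char(cp: int) -> bool:
    """True if code point cp matches the Char production of XML 1.1,
    which admits U+0001 through U+001F but still excludes U+0000."""
    return (0x1 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def earliest_xml_version(text: str) -> Optional[str]:
    """Hypothetical helper: return "1.0" if every character is legal in
    XML 1.0, "1.1" if XML 1.1 is needed, or None if some character
    (such as U+0000) cannot be serialized in either version."""
    version = "1.0"
    for ch in text:
        cp = ord(ch)
        if is_xml10_char(cp):
            continue
        if is_xml11_char(cp):
            version = "1.1"   # legal only in XML 1.1; keep scanning
        else:
            return None       # not serializable in any XML version
    return version
```

A binding that follows option II might call such a check before serializing and signal a binding-dependent error on None. Note that the scan must see all of the content before it can answer "1.0", which is exactly the streaming difficulty mentioned in the NOTE above.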
Noah

[1] http://lists.w3.org/Archives/Public/xml-dist-app/2004Feb/0006.html
[2] http://lists.w3.org/Archives/Public/xmlp-comments/2004Mar/0012.html
[3] http://www.w3.org/TR/soap12-part1/#bindfw
[4] http://www.w3.org/TR/soap12-part2/#graphedges
[5] http://www.w3.org/TR/soap12-part2/#complexenc
[6] http://www.w3.org/TR/soap12-part2/#graphnodes
[7] http://www.w3.org/TR/xmlschema-2/#QName
[8] http://www.w3.org/TR/1999/REC-xml-names-19990114/
[9] http://www.w3.org/2004/03/TechPlenAgenda.html

(See attached file: Making the XML Stack Work With XML 1.1.zip)

--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
Attachments
- application/zip attachment: Making_the_XML_Stack_Work_With_XML_1.1.zip
Received on Monday, 15 March 2004 16:10:34 UTC