[XML11TF] Summary of Issues

Background for distApp readers:  the XMLP workgroup has formed a small task
force to study issues and options relating to the emergence of XML 1.1 and
its implications for SOAP [1].  The task force is open to all members of
the XMLP workgroup, but so far Herve Ruellan, Yves Lafon and I are the only
"volunteers".  We had a phone chat this morning, at which we agreed to
discuss our work on the distApp list using messages with the subject prefix
[XML11TF].  You've already seen one message from Yves on the particular
issue of restricting the infoset and its relation to the HTTP binding.

Here, I'm going to try and take a broader view and just lay out what I see
to be some of the questions before us and the options available. If we can
agree on this analysis or one like it, then we can set about making the
choices:

Restricting Envelope Infoset content to >some< version(s) of XML
----------------------------------------------------------------

SOAP 1.2 specifies that a SOAP envelope is an XML Infoset, and it carefully
constrains some aspects of the infoset.  For example, the root element must
be <soap:Envelope>.  In other places, it inherits no restrictions other
than those implicit in the infoset rec itself.  For example, it does not in
general restrict the character children of elements within the <soap:body>.

To my great surprise, Richard Tobin points out that the Infoset Rec does
not in fact restrict such characters to be even the new ones allowed in XML
1.1.  On the contrary, it allows code points such as "0" (that is,
U+0000), which are not allowed by any version of XML.

I strongly believe that it was our intention that SOAP infosets be
serializable using at least some version of XML.   I believe our
recommendation is contradictory on this important point and as proposed at
[2] I think we should open an issue on this and close that issue with an
erratum to SOAP 1.2.  This erratum should at least rule out characters such
as "0";  whether it should restrict the Infoset specifically to content
serializable in XML 1.0 is discussed in the next section below.
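
To make the character question concrete, here is a minimal sketch of the
check such an erratum would imply.  The ranges are the Char productions of
XML 1.0 and XML 1.1;  the function name serializable_versions is my own
invention for illustration, and the sketch looks only at character content,
not at names.

```python
# Character ranges from the Char productions of XML 1.0 and XML 1.1.
XML10_CHAR_RANGES = [(0x9, 0xA), (0xD, 0xD), (0x20, 0xD7FF),
                     (0xE000, 0xFFFD), (0x10000, 0x10FFFF)]
XML11_CHAR_RANGES = [(0x1, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF)]


def _in_ranges(code_point, ranges):
    """True if code_point falls inside any (low, high) pair in ranges."""
    return any(low <= code_point <= high for low, high in ranges)


def serializable_versions(text):
    """Return the XML versions whose Char production admits all of text.

    Hypothetical helper: an empty result means that no version of XML can
    carry the content, which is the case for U+0000.
    """
    versions = set()
    if all(_in_ranges(ord(ch), XML10_CHAR_RANGES) for ch in text):
        versions.add("1.0")
    if all(_in_ranges(ord(ch), XML11_CHAR_RANGES) for ch in text):
        versions.add("1.1")
    return versions
```

Note that in XML 1.1 the newly allowed control characters may appear only as
character references, so "admitted by Char" is a necessary condition here,
not a complete serialization rule.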

Should SOAP Envelope Infosets allow XML 1.0, XML 1.1 or a choice of
content?
----------------------------------------------------------------------------

I think my approach to analyzing this issue is a bit different than Yves'.
I specifically think we have to choose between two options that are quite
starkly different:

I.  Make clear that for the foreseeable future, legal SOAP Envelope
Infosets must be serializable using XML 1.0 (i.e., no control characters,
no new name characters).  This proposed statement has nothing directly to
do with a particular binding or wire format:  it is a statement about what
may in principle be in the envelope.  Having made this rule we get to keep
one of SOAP's original guiding principles from the binding framework:  any
binding must be capable of transmitting any envelope Infoset.  From [3]:
"Therefore, the minimum responsibility of a binding in transmitting a
message is to specify the means by which the SOAP message infoset is
transferred to and reconstituted by the binding at the receiving SOAP node
and to specify the manner in which the transmission of the envelope is
effected using the facilities of the underlying protocol."

vs.

II.  I think the likeliest alternative is to say:  "SOAP Envelope Infosets
must be directly serializable using some recommendation-level version of
XML.  All bindings MUST be capable of transmitting Infosets which have
content representable using XML 1.0 Second Edition.  Bindings MAY be
written to additionally transmit Infoset information allowed by subsequent
versions of XML (e.g. to transmit the name and control characters added in
XML Version 1.1).  Bindings MUST signal a binding-dependent error in any
situation in which the Infoset cannot be transmitted and reconstructed with
full fidelity.  NOTE:  A consequence of these rules is that envelopes that
use only XML 1.0-compatible content can be transmitted through any SOAP
network, regardless of choices of binding or introduction of
intermediaries;  envelopes that use features of newer versions of XML may
not be transmissible using certain bindings, or may fail to transit certain
intermediaries."

So, one way we get universal interop, but are restricted to 1.0-style
content only.  The other way we allow optional use of XML 1.1, but with the
risk that sticking a non-XML 1.0 intermediary in the path may prevent
transmission of otherwise legal infosets.
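
The interop risk under option II can be sketched roughly as follows.
Everything here (Binding, BindingError, needs_xml11, send_along_path) is
hypothetical and stands in for real binding machinery;  the envelope is
treated as a plain character string, and only the new control characters
are checked, not the new name characters.

```python
class BindingError(Exception):
    """Hypothetical binding-dependent error for an untransmittable envelope."""


def needs_xml11(text):
    """True if text holds a C0 control character that only XML 1.1 permits."""
    return any(0x1 <= ord(ch) <= 0x1F and ch not in "\t\n\r" for ch in text)


class Binding:
    """Hypothetical binding that declares whether it can carry XML 1.1."""

    def __init__(self, name, supports_xml11=False):
        self.name = name
        self.supports_xml11 = supports_xml11

    def transmit(self, envelope_text):
        # Option II: signal an error rather than silently lose fidelity.
        if needs_xml11(envelope_text) and not self.supports_xml11:
            raise BindingError(f"{self.name}: content requires XML 1.1")
        return envelope_text  # stand-in for real transmission


def send_along_path(envelope_text, path):
    """Pass the envelope through each binding hop in turn."""
    for binding in path:
        envelope_text = binding.transmit(envelope_text)
    return envelope_text
```

With one 1.0-only hop anywhere in the path, an envelope containing U+0001
fails even though every other hop could carry it;  1.0-compatible envelopes
pass through any mix of hops.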

I think the XMLP wg should choose between these, and I think it's a tough
choice.  See also issues below relating to bindings and description
languages.

What about bindings in general and the HTTP binding in particular?
------------------------------------------------------------------

I think the above sets out the options for bindings in general.  The
choices for our HTTP binding will depend on which of the paths above we
take.

Note that our HTTP binding actually defers content decisions to the
application/soap+xml media type spec, which in turn defers to RFC 3023 and
application/xml.  In private communication, Murata Makoto has made clear
that his intention was always that 3023 and application/xml be usable with
any version of XML.  He and I have at least informally discussed the
possibility that 3023 would be clarified as follows:

"application/xml is to be used with any W3C Recommendation-level version of
XML as identified in the version specification of the XML declaration.
When no such declaration is present, XML 1.0 is assumed.  In all examples
herein where a specific version such as version="1.0" is shown, it is
understood that other versions may also be used, providing the content does
indeed conform to the specified version of the XML Recommendation.

Specifications and recommendations based on or referring to this RFC SHOULD
indicate any limitations on the particular versions of XML to be used.  For
example, a particular specification might indicate:  'content MUST be
represented using media-type application/xml, and the document must either
(a) carry an xml declaration specifying version="1.0" or (b) omit the xml
declaration, in which case per the XML recommendation the version defaults
to 1.0'."

I have some reason to believe that this text is being proposed at IETF, but
haven't heard anything on it lately.
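
As a rough illustration of what a receiver would do under that proposed
text, the sketch below reads the version from the XML declaration and
assumes 1.0 when there is none.  The function names and the
allowed_versions parameter are my own, and the sketch assumes the document
has already been decoded to text with no leading byte order mark.

```python
import re

# Matches an XML declaration at the very start of the document and captures
# the value of its version pseudo-attribute (either quote style).
_XML_DECL = re.compile(r"""^<\?xml\s+version\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def declared_xml_version(document):
    """Return the declared XML version, or "1.0" if no declaration is present."""
    match = _XML_DECL.match(document)
    if match is None:
        return "1.0"
    double_quoted, single_quoted = match.group(1), match.group(2)
    return double_quoted if double_quoted is not None else single_quoted


def acceptable(document, allowed_versions=("1.0",)):
    """Apply a spec-level restriction on which XML versions may be used."""
    return declared_xml_version(document) in allowed_versions
```

A specification that limits itself to XML 1.0 would keep the default
allowed_versions;  one that admits both would pass ("1.0", "1.1").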

If things go this way, then we will have a choice in our SOAP HTTP binding:

* Issue an erratum clarifying that XML version 1.0 MUST be the serialized
form used with application/xml

-or-

* Issue an erratum clarifying that all implementations MUST be capable of
reading at least XML version 1.0, but that implementations MUST use
either XML 1.0 or XML 1.1 when transmitting (and maybe allow for future
versions too).  In this case I think we should also say:  "Implementations
SHOULD, where practical, use the earliest version of XML suitable for the
content.  For example, if an envelope uses none of the new name or control
characters introduced with XML version 1.1, it should if possible be
serialized using XML version 1.0.  NOTE:  it is recognized, however, that
performance or other considerations may preclude such careful choice of XML
versions.  Particularly in streaming scenarios, it may be impractical to
determine sufficiently early whether new forms of content are being used."

Note that, with respect to the 2nd option, the usual means of HTTP content
negotiation seem not to apply, since both the XML 1.0 and XML 1.1 forms
would be sent using the same media type.
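
The "earliest version suitable for the content" guidance in the second
option might look something like the sketch below.  The names
earliest_xml_version and serialize, and the wrapper element <data>, are
made up for illustration;  only character content is considered, so new
name characters are out of scope.

```python
def earliest_xml_version(text):
    """Pick "1.0" unless text holds a control character new in XML 1.1."""
    for ch in text:
        code_point = ord(ch)
        if code_point < 0x20 and ch not in "\t\n\r":
            if code_point == 0:
                raise ValueError("U+0000 is not allowed in any XML version")
            return "1.1"
    return "1.0"


def serialize(text):
    """Serialize text as the content of a hypothetical <data> element."""
    version = earliest_xml_version(text)  # first pass over the content
    pieces = []
    for ch in text:  # second pass: escape markup and control characters
        if ch == "&":
            pieces.append("&amp;")
        elif ch == "<":
            pieces.append("&lt;")
        elif ch == ">":
            pieces.append("&gt;")
        elif ord(ch) < 0x20 and ch not in "\t\n\r":
            # XML 1.1 allows these only as character references.
            pieces.append(f"&#x{ord(ch):X};")
        else:
            pieces.append(ch)
    body = "".join(pieces)
    return f'<?xml version="{version}"?><data>{body}</data>'
```

The sketch scans the whole content before writing the declaration, which is
exactly the cost the NOTE above says may be impractical when streaming.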

Line ends
=========

XML 1.1 allows new line end characters.  I think we agreed on the call that
this is visible only in the serializations, not the infoset, and is thus a
purely hop-by-hop concern for individual bindings.  Presumably, whatever we
decide regarding XML versions for our HTTP binding will settle the line end
question for that binding.  Other bindings are, of course, free to use any
XML or non-XML serializations, and to use line ends as needed by the binding.
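
For reference, the difference is confined to what a parser normalizes to
#xA before the content reaches the infoset.  Here is a small sketch of the
two normalization rules as I read them in the XML 1.0 and XML 1.1
recommendations;  normalize_line_ends is a hypothetical name.

```python
import re

# XML 1.0: CR LF and a lone CR both become LF.
_XML10_LINE_ENDS = re.compile("\r\n|\r")
# XML 1.1 adds CR NEL, NEL (U+0085) and LINE SEPARATOR (U+2028).
# Two-character sequences come first so that they are consumed as a unit.
_XML11_LINE_ENDS = re.compile("\r\n|\r\x85|\x85|\u2028|\r")


def normalize_line_ends(text, version="1.0"):
    """Translate every line end recognized by the given XML version to LF."""
    pattern = _XML11_LINE_ENDS if version == "1.1" else _XML10_LINE_ENDS
    return pattern.sub("\n", text)
```

Either way the receiver sees only #xA in the reconstructed infoset, which is
why the choice stays local to each hop.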

Data Model and Encoding
=======================

The SOAP data model says that "An edge label is an XML qualified name"[4],
which we can now see to be ambiguous because no reference is made to a
particular version of XML namespaces or to a particular rigorous definition
of "qualified name".  It seems we need to decide whether there are any
circumstances in which the new name characters of XML 1.1 are allowed in
such edge names.

The encoding section states [5]:

"For a graph edge which is distinguished by label, the [local name] and
[namespace name] properties of the child element information item together
determine the value of the edge label."

This suggests, not surprisingly, that our decision on data model edge names
should be made consistent with our decision on Infoset local names for
element information items.

The names of node types are also an issue.  From [6]:

"All graph nodes have an optional type name of type xs:QName in the
namespace named "http://www.w3.org/2001/XMLSchema" (see XML Schema [XML
Schema Part 2])."  The definition of xs:QName [7] refers to the 1999
version of Namespaces in XML [8].  So, data model type names are definitely
limited to the old form of QName.  It seems we should not change this
unless and until XML Schema decides on how to deal with an xs:QName11 or
some such (and what a mess that will be!).
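
To show what "the new form" of a qualified name would mean in practice,
here is a sketch of a lexical check built on the XML 1.1 NameStartChar and
NameChar productions.  The function names are hypothetical, and I have
deliberately not reproduced the XML 1.0 Second Edition name tables, which
are a much longer enumeration of narrower ranges.

```python
# NameStartChar ranges from XML 1.1, with ":" left out because an NCName
# (the prefix or local part of a qualified name) may not contain a colon.
_NAME_START = [(0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A), (0xC0, 0xD6),
               (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D), (0x37F, 0x1FFF),
               (0x200C, 0x200D), (0x2070, 0x218F), (0x2C00, 0x2FEF),
               (0x3001, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
               (0x10000, 0xEFFFF)]
# Additional ranges allowed after the first character (NameChar):
# "-", ".", digits, U+00B7, combining marks, and U+203F to U+2040.
_NAME_EXTRA = [(0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7),
               (0x300, 0x36F), (0x203F, 0x2040)]


def _in_ranges(code_point, ranges):
    return any(low <= code_point <= high for low, high in ranges)


def is_ncname_11(name):
    """True if name is a colon-free name under the XML 1.1 productions."""
    if not name or not _in_ranges(ord(name[0]), _NAME_START):
        return False
    return all(_in_ranges(ord(ch), _NAME_START + _NAME_EXTRA)
               for ch in name[1:])


def is_qname_11(qname):
    """True if qname is either an NCName or prefix:local, both NCNames."""
    parts = qname.split(":")
    return len(parts) <= 2 and all(is_ncname_11(part) for part in parts)
```

Whatever we decide for edge labels and element local names, a check of this
kind and the xs:QName check used for type names would give different
answers for the same string until Schema catches up.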

WSDL and Description Languages
==============================

WSDL is not and IMO should not be a requirement for SOAP.  Nonetheless,
being able to use SOAP with languages such as WSDL seems to be important.
WSDL bases its 'literal' specifications on XML schema, and for a variety of
reasons current versions of XML Schema do not validate XML 1.1 content.
Some of the reasons were summarized in my tech plenary lightning talk.  I
expect the slides will eventually be posted at [9], but in the meantime a
.zip file with various formats is attached.  I think the slides are
self-explanatory.  In summary, you can't declare elements or attributes
with the new names, xsd:strings don't take the new control characters,
xs:QName is the old style of QName, the xsd:Name type is the old style, etc.

In short, as we make the main decisions above about enabling or optionally
enabling XML 1.1 content in SOAP, we may want to consider coordination
issues with WSDL and XML Schema (and indirectly with all the other groups
such as Query and XSL that will depend on schema and typing.)

XOP/MTOM/Primer/TestCases/Schemas
=================================

All our other SOAP specs and schemas need a thorough check to ensure they
are unambiguous and match whatever we decide about XML 1.1.


Errata vs. new releases
=======================

My personal opinion is that SOAP 1.2 as it stands is self-contradictory or
at best unclear.  It thus needs at least a clarification as an erratum.  We
may or may not wish to do a two stage approach, in which (for example) SOAP
1.2 is clarified as being XML 1.0 2nd Ed. only, and some SOAP 1.2.1 or some
such enables optional XML 1.1.  In that case, we'll have to decide how the
new version of SOAP is signalled on the wire.

Summary
=======

That's roughly what I remember of the issues as we discussed them on the
call this morning.  For me, the big one is the first one:  what do we do
about the Infosets?  If we stick to 1.0 we have interop, but we make life
difficult for all the users who need XML 1.1 for their content. If we
enable XML 1.1, then we run the risk that an intermediary or binding can't
handle it, and interop breaks.  We also have to coordinate with RFC 3023
work on getting the media type straight.  The WSDL coordination worries me
in all of this.  The rest of it (e.g. encoding) looks manageable.

Noah

[1] http://lists.w3.org/Archives/Public/xml-dist-app/2004Feb/0006.html
[2] http://lists.w3.org/Archives/Public/xmlp-comments/2004Mar/0012.html
[3] http://www.w3.org/TR/soap12-part1/#bindfw
[4] http://www.w3.org/TR/soap12-part2/#graphedges
[5] http://www.w3.org/TR/soap12-part2/#complexenc
[6] http://www.w3.org/TR/soap12-part2/#graphnodes
[7] http://www.w3.org/TR/xmlschema-2/#QName
[8] http://www.w3.org/TR/1999/REC-xml-names-19990114/
[9] http://www.w3.org/2004/03/TechPlenAgenda.html

(See attached file: Making the XML Stack Work With XML 1.1.zip)


--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

Received on Monday, 15 March 2004 16:10:34 UTC