Proposed resolution to issues 460 & 461 from noah_mendelsohn@us.ibm.com on 2004-02-23 (xml-dist-app@w3.org from February 2004)

From: <noah_mendelsohn@us.ibm.com>
Date: Mon, 23 Feb 2004 16:31:08 -0500
To: xml-dist-app@w3.org
Message-ID: <OF12F2F829.D4393CC5-ON85256E43.00762CE8@lotus.com>
On our last telcon I was assigned the issue:

"Genereate [sic] text for issue 461 & 462 (properties) by next
teleconference."

First of all, I believe the correct issues are 460 [1] and 461 [2].  This
note is in fulfillment of that action.

Status quo
-------------
These issues relate to the manner in which data model nodes are identified
as candidates for XOP optimization.  Our latest XOP working draft says in
the introduction [3]:

" Optimization in XOP is limited to the content of those elements which
contain characters that can be interpreted as the canonical lexical
representation of the XML Schema base64Binary datatype (see [XML Schema
Part 2] 3.2.16 base64Binary and Errata in XML Schema, E2-54). Attributes,
non-base64-compatible character data, and data not in the canonical
representation of the base64Binary datatype cannot be successfully
optimized by XOP."

and in the processing model [4]:

" Identify within the Original Data Model the Element Nodes to be
optimized. Such Nodes MUST have type equal to xs:base64Binary , and the
return value of the dm:string-value accessor of such Nodes must be in the
canonical lexical representation of that type as described in Errata in XML
Schema, E2-54."

First of all, issue 460 raises the question of whether a XOP implementation
has discretion to skip optimization of a node of type base64Binary.  I
believe that a careful reading of the text quoted above suggests that
optimization is indeed optional, but in any case the purpose of this note
is to offer a different formulation of the processing model.

Issue 461 asks about the use of types derived from base64Binary.  XOP is in
general oblivious to type hierarchies and has no means of independently
identifying such derivations.  Accordingly, the current design only
supports elements labeled explicitly with a dm:type of xsd:base64Binary.

I think that implicit in the current design is the observation that a data
model is an abstraction anyway.  We do not require you to build any
particular data structures that your program may not already have, as long
as you can provide information we expect in the XOP data model.  In
particular, nothing prevents you from asserting to a XOP implementation
that a node that you consider to be of a datatype derived from base64Binary is
in fact to be interpreted for XOP purposes as being of the base type.  While I
think our current approach is for the most part coherent, I think it is
confusing and unduly subtle in this respect.

Proposed approach
-----------------

In essence, the present design is overloading the dm:type accessor for two
related purposes: (1) to signify the type of an element in the usual manner
and (2) as an "optimization candidate" switch for XOP.  I am coming to
believe that this dual use is confusing and unnecessary, and indeed I
believe that this confusion underlies issues such as 460 and 461.

I therefore propose that we go back to a design similar to the one
described at [5], in which optimization candidates were specified
separately from the data.  That proposal dates from a time when MTOM was
SOAP-specific.  Its use of SOAP properties in particular seems not to be
appropriate for XOP, but I believe the concept is about right.

I specifically propose that we modify the specification to require as a
prerequisite for XOP processing:

1) An instance of a data model.  Unlike the present design, this can be a
relatively traditional data model in which, for example, even nodes with
non-canonical lexical forms can be labeled with dm:type of
xsd:base64Binary.
2) A list identifying element nodes in that model as candidates for
optimization.

We will say that input provided to XOP processing MUST obey the following
constraints:

a) Each element node identified in the list MUST have a dm:type of
xsd:base64Binary or must be of a type known by the implementation to be a
refinement of xsd:base64Binary.
b) The dm:string-value of each such node must be a canonical lexical
representation of the xsd:base64Binary type as described in [XML Schema
Part 2 Errata].

Input which does not meet these criteria is not suitable for XOP
processing.
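
As a rough illustration only (not proposed spec text), constraints (a) and
(b) could be checked along the following lines.  The ElementNode class, the
derived_types set and the function names are all hypothetical, and the
canonical-form test is approximated by a decode/re-encode round trip rather
than by the grammar in the Schema errata.

```python
import base64
import binascii
from dataclasses import dataclass

XSD_BASE64 = "xsd:base64Binary"


@dataclass
class ElementNode:
    """Hypothetical stand-in for an element node in the data model."""
    name: str
    dm_type: str
    string_value: str


def is_canonical_base64(text: str) -> bool:
    """Approximate the canonical-form check: the text must decode strictly
    (no whitespace or foreign characters) and re-encode to itself."""
    try:
        decoded = base64.b64decode(text, validate=True)
    except (binascii.Error, ValueError):
        return False
    return base64.b64encode(decoded).decode("ascii") == text


def check_xop_input(candidates, derived_types=frozenset()):
    """Return a list of problems; an empty list means the candidate list
    satisfies constraints (a) and (b)."""
    problems = []
    for node in candidates:
        if node.dm_type != XSD_BASE64 and node.dm_type not in derived_types:
            # Constraint (a): base64Binary or a known refinement of it.
            problems.append(f"{node.name}: type {node.dm_type} not acceptable")
        elif not is_canonical_base64(node.string_value):
            # Constraint (b): canonical lexical representation required.
            problems.append(f"{node.name}: string-value not canonical")
    return problems
```

Here derived_types stands for whatever knowledge the implementation has of
refinements of xsd:base64Binary; XOP itself would not supply it.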

The rule quoted from [4] above will be rewritten to say:

"Choose from the nodes identified in the input list zero or more nodes to
be optimized.  The policy used for making such choice is at the discretion
of the implementation.  For example, some implementations may decline to
attempt optimization of nodes for which the dm:string-value consists of just
a few characters, as in this case the optimized form may be larger than the
original.

NOTE:  XOP does not in general convey type information in the encoded form
of a document.  Reconstituted data models (see section 4.2) will not in
general reconstruct dm:type values supplied in the input."
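
To make the discretionary choice in the rewritten rule concrete, here is a
minimal sketch of one possible policy: skip candidates whose string value is
too short to be worth optimizing.  The Candidate class, the function name
and the 64-character threshold are assumptions made up for this example;
nothing in the proposal fixes a particular policy or cutoff.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical stand-in for an element node named in the input list."""
    name: str
    string_value: str


# Hypothetical cutoff; an implementation is free to pick any policy.
MIN_CHARS = 64


def choose_nodes_to_optimize(candidates, min_chars=MIN_CHARS):
    """Return the subset (possibly empty) of candidates to optimize,
    declining those whose string value is shorter than min_chars."""
    return [node for node in candidates if len(node.string_value) >= min_chars]
```

An implementation that returned an empty list here, or the whole input
list, would equally be within the rule as worded.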

This design allows us to do what many of our reviewers seem to want to do:
start with a data model that has a full range of type attributions,
including possibly non-canonical lexical forms labeled as xsd:base64Binary,
as well as nodes labeled with types known to be derived from
xsd:base64Binary.

If necessary, we can clarify that the form of the list of optimization
candidates is implementation dependent.  In particular, it can either be
some explicit list or collection data structure that identifies candidate
nodes, or it can be implicit.  For example, it is legal but not required
for an implementation to have a policy of attempting to optimize every element
node of type xsd:base64Binary.
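
A sketch of what an implicit candidate list might look like, assuming a
simple in-memory tree: the candidates are computed by walking the model and
selecting every element typed xsd:base64Binary.  The Node class and the
implicit_candidates function are hypothetical names used only for
illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """Hypothetical element node with a type label and child elements."""
    name: str
    dm_type: str = "xsd:anyType"
    children: List["Node"] = field(default_factory=list)


def implicit_candidates(root: Node,
                        base_type: str = "xsd:base64Binary") -> Iterator[Node]:
    """Yield, in document order, every element whose dm_type is base_type.
    The candidate list is never stored; it is implied by the policy."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.dm_type == base_type:
            yield node
        # Push children reversed so the first child is visited first.
        stack.extend(reversed(node.children))
```

An explicit list would simply be a collection of such nodes handed to the
XOP processor alongside the data model.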

Noah


[1] http://www.w3.org/2000/xp/Group/xmlp-issues#x460
[2] http://www.w3.org/2000/xp/Group/xmlp-issues#x461
[3] http://www.w3.org/TR/2004/WD-xop10-20040209/#introduction
[4] http://www.w3.org/TR/2004/WD-xop10-20040209/#xop_processing_model
[5] http://www.w3.org/TR/2003/WD-soap12-mtom-20030721/#aof-properties

--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
Received on Monday, 23 February 2004 16:33:00 UTC