RE: [soapbuilders] FEEDBACK REQUESTED - Issues regarding Array encoding for SOAP 1.2 from Andrew Layman on 2001-12-13 (xml-dist-app@w3.org from December 2001)

From: Andrew Layman <andrewl@microsoft.com>
Date: Thu, 13 Dec 2001 10:00:41 -0800
To: <xml-dist-app@w3.org>
Message-ID: <C3729BBB6099B344834634EC67DE4AE102623C40@red-msg-01.redmond.corp.microsoft.com>
Regarding partially transmitted arrays, nulls, and related matters, you
might be interested in three mails I wrote a few weeks ago to the SOAP
Builders mailing list, as well as subsequent discussion by others.

http://groups.yahoo.com/group/soapbuilders/message/6194?threaded=1
http://groups.yahoo.com/group/soapbuilders/message/6213
http://groups.yahoo.com/group/soapbuilders/message/6233

The full thread, including mails from others commenting on my analysis,
begins at
http://groups.yahoo.com/group/soapbuilders/message/6194?threaded=1

For the convenience of the XML Dist App archives, I copy the text of the
three messages I wrote here:  

6194

I support Alan Kent's suggestion that untransmitted members be equivalent
to nulls. Both an untransmitted member and a null token represent
omissions of information. Both indicate that no definite value is
asserted. Neither has a meaning different from the other.

I believe that any apparent distinction between nulls and omitted elements
is due to confusion brought on by the following two factors:

1. Reification of null due to incorrect reasoning from programming grammars.
Programming languages of the Algol ilk and similar define structure types
in terms of the structure members, meaning creation of a small namespace
representing a set of potential properties and their names associated with
the structure type. These programming languages somewhat conflate the
ideas of namespace, memory allocation and value. In allocating memory, a
contiguous space is reserved, with space for each potential member value.
Members that do not have a known value nonetheless have space reserved.
This is filled with something. When the distinction between a known value
and an unknown value is retained, as in Java Integer as contrasted with
Java int, the lack of a value is represented by a certain bit pattern,
just as an actual value would be. This pattern is called "null". This
leads to the idea that a null is a value, rather than the lack of one. But
this idea is an error: a null is the way that these languages represent
the lack of a stated value given that they must allocate some memory.
Other representations do not require the allocation of memory for
unactualized member values. For example, XML and Lisp allow omitted
members.
The token "null" also functions in many programming notations much the
same way that the word "nothing" functions in English. That is, rather
than saying "delete the value of the member x" we instead write that
instruction as "x = null".
However, the fact that null appears in a programming syntax in a similar
position to where an actual value would appear reflects syntactic
convenience only; it does not make null into a value. Null represents
absence of information.

2. Nothing is not anything.
There is no definite semantics to the absence of information. Nothing is,
well, nothing. Any way of representing lack of information is going to
have context-dependent interpretation. It may mean that the information is
simply not known, it may indicate that a default value is appropriate, it
may indicate that no new information is provided and any prior value, if
any, should be used, etc. Consequently, the semantics of a null in one
context will differ from the semantics of an absent member in a different
context. This obscures the more general point that both null and absence
of a member are different notations' ways of distinguishing the presence
from the absence of a stated value.

Sometimes, also, it is appealing to distinguish, within one context,
complete ignorance from use of a default value from indication of the
reason why no value is known from a statement that no value exists, etc.
In those cases where exactly two aspects of ignorance are needed in a
model, it is appealing to think of null as distinct from absence of a
member. For example, we could then distinguish whether we are stating that
we don't know the patient's age from the statement that we are not asking
to have the patient's age changed. Without disputing that two
distinguished tokens are useful, I observe that three, four and five are,
also, but the preponderance of programming and database languages, as they
are used in practice, at best only make one distinction between presence
and absence of a stated value. That is, the Algol-style programming
languages use null-notation and do not support omission of a member while
XML supports omission-notation and has no null token.

Alan suggests, if I understand him correctly, that we should treat null
and an omission of an element as representing the same thing. I suggest
that this is correct, and that the "same thing" represented is omission of
a stated value, and that the significance of the omission of a stated
value may be dependent on the meaning of the structure from which the
information is omitted and the process employing that structure.

One practical consequence of this is that nulls in a programming structure
(absent special, contextual knowledge of special semantics) should appear
in XML as omission of the corresponding element, and vice versa.
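
As a rough illustration of that mapping (a sketch only, not anything
defined by SOAP), the following Python uses the standard library's
xml.etree.ElementTree. The helper names to_xml and from_xml are
hypothetical, and member values are kept as plain strings to avoid any
assumption about type conversion.

```python
import xml.etree.ElementTree as ET

def to_xml(tag, members):
    """Serialize a dict of members, omitting every member whose value is None."""
    root = ET.Element(tag)
    for name, value in members.items():
        if value is not None:              # null in the structure -> element omitted
            ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")

def from_xml(xml_text, member_names):
    """Parse XML into a dict; a member with no element comes back as None."""
    root = ET.fromstring(xml_text)
    # findtext returns None when no child element with that name exists
    return {name: root.findtext(name) for name in member_names}

patient = {"name": "Joe Green", "male": None, "age": "46"}
xml_text = to_xml("patient", patient)
restored = from_xml(xml_text, ["name", "male", "age"])
```

Here xml_text contains only the name and age elements, and restored
compares equal to patient, so the null survives the trip as an omission.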

Andrew Layman
http://strongbrains.com -- Resources for Self-Education

6213

I'd like to add to my earlier mail [1] regarding the equivalence of nulls
and omitted elements by making another distinction: representation of
data versus update of data.

This distinction matters when thinking about sparse arrays versus
partially-transmitted arrays.

But first, I'd like to look at the contrast between representation and
update as it applies to structures whose members are distinguished by
member name. Consider a simple data structure for medical patient data:

Class Patient {
String name;
Boolean male;
Int age;
}

An instance of this might have values of its members respectively

{ "Joe Green", True, 46 }.

We might change some of those values by writing

patient.name = "Jo Green";
patient.age = 42;

In each of these cases, what is being described is fairly clear. First we
have a structure with three member values (representation). Then we have
instructions to change two of those values (update).

But suppose I write the XML

<patient>
<name>Jo Green</name>
<age>42</age>
</patient>

Is that equivalent to

{ "Jo Green", NULL, 42 }

or to

patient.name = "Jo Green";
patient.age = 42;

?

If transmitted over a wire, does it result in a data structure having
value { "Jo Green", NULL, 42 } or { "Jo Green", True, 42 }?

That is, is the XML a *representation* of the current state of data or a
command to *update* the state of the data?

You cannot fully tell without knowing the context. Whereas the programming
language used different syntax to distinguish representing a value from
changing a value, XML only provides the former, representation of a value.
That is, you cannot determine merely by looking at some XML outside of its
context what the purpose of the XML is.
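
To make the two readings concrete, here is a small Python sketch (standard
library only). The function names as_representation and as_update are
hypothetical, and values are kept as strings, so "true" stands in for the
Boolean True used above.

```python
import xml.etree.ElementTree as ET

MEMBERS = ["name", "male", "age"]
XML_TEXT = "<patient><name>Jo Green</name><age>42</age></patient>"

def as_representation(xml_text):
    """Read the XML as the complete state: unstated members become None."""
    root = ET.fromstring(xml_text)
    return {m: root.findtext(m) for m in MEMBERS}

def as_update(xml_text, existing):
    """Read the XML as an update: unstated members keep their prior values."""
    root = ET.fromstring(xml_text)
    updated = dict(existing)               # copy, so the prior state is untouched
    for child in root:
        updated[child.tag] = child.text
    return updated

existing = {"name": "Joe Green", "male": "true", "age": "46"}
represented = as_representation(XML_TEXT)
updated = as_update(XML_TEXT, existing)
```

The same XML text yields a None for male under the first reading and
keeps the prior "true" under the second; nothing in the XML itself says
which function the receiver ought to call.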

Regarding ordinary structures, SOAP Section 5 only describes how to
translate between a non-XML data structure and an XML data structure. It
has no special rules for annotating or altering the data structure if its
purpose is to be used in a command that updates data.

In particular, SOAP section 5 never advances the concept of a
"partially-represented" structured value. Consequently, we can say with
confidence that the XML described above represents the same structured
value as the programming language would with { "Jo Green", NULL, 42 }.

Unfortunately, the specification is not quite so clear when it comes to
arrays. Section 5.4.2.1 introduces the idea of a "partially transmitted
array". This is not the representation of all of an array, but only of
part of one, and this fact plus earlier discussions on this list [2] make
it clear that one possible purpose of the partially-transmitted array is
to carry out a data-update command. Although the phrasing of section
5.4.2.2 on sparse arrays is not so ambiguous, I believe that a good case
can be made for either side of the argument. That is, the spec and the
discussion of DCE cited above support reasonable arguments that sparse
arrays are, or are not, for data update; are or are not for data
representation.

This leads to a difficulty in interpreting a structure such as

<colors soap:arrayType="xsd:string[7]" >
<item soap:pos="[1]">red</item>
<item soap:pos="[7]">violet</item>
</colors>

This might be a sparse array, that is, a complete data representation of
an array with members [ "red", NULL, NULL, NULL, NULL, NULL, "violet"].

But it might also be a partially-transmitted array, that is, a
transmission for the purpose of update, of only items 1 and 7, and leaving
all other elements untouched.

Both of these interpretations are possible given the SOAP 1.1
specification. This is an unfortunate ambiguity that deserves to be
cleaned up by the SOAP 1.2 effort at the W3C. I will give my opinion here
on what I believe to be the best course.

If an array has any elements NULL, that is, without recorded values, then
it is necessary to somehow represent this in XML. Using absent elements to
represent NULL is consistent with the interpretation given to absent
elements in structures and suggested earlier in this thread [1]. Under
this interpretation, the XML structure

<patient>
<name>Jo Green</name>
<age>42</age>
</patient>

definitely represents the same facts as the programming language structure

{ "Jo Green", NULL, 42 }

and the XML structure

<colors soap:arrayType="xsd:string[7]" >
<item soap:pos="[1]">red</item>
<item soap:pos="[7]">violet</item>
</colors>

represents exactly the same facts as does the programming language
structure

[ "red", NULL, NULL, NULL, NULL, NULL, "violet"].
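
A minimal Python sketch of this sparse-array reading follows. It assumes
that the soap prefix is bound to the SOAP 1.1 encoding namespace (the
snippet above does not declare it), that soap:pos is the illustrative
attribute used in this mail, and that positions are 1-based as in the
example. The function name parse_sparse_array is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the "soap" prefix refers to the SOAP 1.1 encoding namespace.
SOAP_ENC = "http://schemas.xmlsoap.org/soap/encoding/"

XML_TEXT = f"""
<colors xmlns:soap="{SOAP_ENC}" soap:arrayType="xsd:string[7]">
  <item soap:pos="[1]">red</item>
  <item soap:pos="[7]">violet</item>
</colors>
""".strip()

def parse_sparse_array(xml_text):
    """Return the full array, with None wherever no item was transmitted."""
    root = ET.fromstring(xml_text)
    array_type = root.get(f"{{{SOAP_ENC}}}arrayType")
    size = int(re.search(r"\[(\d+)\]$", array_type).group(1))
    values = [None] * size                 # every position starts out as NULL
    for item in root:
        pos = int(item.get(f"{{{SOAP_ENC}}}pos").strip("[]"))
        values[pos - 1] = item.text        # positions are 1-based in the example
    return values

colors = parse_sparse_array(XML_TEXT)
```

Under these assumptions colors is a seven-element list with "red" first,
"violet" last, and None in the five positions between them.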

I do not deny that there is utility to being able to represent commands to
update preexisting data structures. However, this is equally true of both
arrays and structures whose members are distinguished by member name.
Rather than having a special form of array representation that might be
interpreted as an update command, and having this remarkable facility
available only for arrays, we would be better served to have (a) a uniform
interpretation and representation for nulls/omitted-values that applies to
both structures and arrays, and (b) a decision that update commands and
other semantic contexts for structures are beyond the immediate scope of
SOAP section 5.

In short, keep sparse arrays but make it very clear that omitted elements
in XML correspond exactly to NULLs in programming languages, not to
partial update commands.


[1] http://groups.yahoo.com/group/soapbuilders/message/6194?threaded=1
[2] http://groups.yahoo.com/group/soapbuilders/message/5139?threaded=1

Andrew Layman
http://strongbrains.com -- Resources for Self-Education

6233

Yet more on nulls and omitted elements.

Consider again a simple XML structure of medical patient data:

<patient>
<name>Joe Green</name>
<age>46</age>
</patient>

If stored into, or interpreted relative to, a structure laid out as
follows,

Class Patient {
String name;
Boolean male;
Int age;
}

It would be interpreted as

Patient
name       male       age
---------  ---------  ---------
Joe Green  NULL       46

But, if interpreted relative to a structure like

Class Patient {
String name;
Boolean male;
Int age;
String diagnosis;
}

Then it would be interpreted as

Patient
name       male       age        diagnosis
---------  ---------  ---------  ----------
Joe Green  NULL       46         NULL

And so on. There are two equivalent ways to look at this:

1. The XML structure only states values for certain members. When
interpreted in the context of a programming or database structure that
defines the potentiality for more members, the programming or database
structure will only have values where the XML structure has values, and
will have nulls (indication of no stated value) associated with all other
members.

2. The XML structure only states values for certain members. If the XML
structure is interpreted in the context of an XML schema with an <any>
particle, then there are an unlimited number of other subelements that
were potential but not actual. This is equivalent to a programming
structure padded with an infinite number of members, of every possible
name, all of which are filled with NULL markers, or a database row padded
with an infinite number of columns, of every possible name, all of which
are filled with NULL markers.
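
Both views can be sketched in a few lines of Python (standard library
only). The names interpret and OpenRecord are hypothetical; OpenRecord is
just one way to model a structure "padded" with NULL for every possible
member name, and values are kept as strings.

```python
import xml.etree.ElementTree as ET

XML_TEXT = "<patient><name>Joe Green</name><age>46</age></patient>"

def interpret(xml_text, member_names):
    """View 1: read the XML relative to a fixed list of members."""
    root = ET.fromstring(xml_text)
    return {name: root.findtext(name) for name in member_names}

class OpenRecord(dict):
    """View 2: an open-ended structure where any unstated member reads as None."""
    def __missing__(self, key):
        return None

three = interpret(XML_TEXT, ["name", "male", "age"])
four = interpret(XML_TEXT, ["name", "male", "age", "diagnosis"])
open_patient = OpenRecord((child.tag, child.text) for child in ET.fromstring(XML_TEXT))
```

Here three and four carry None for male (and for diagnosis in four), while
open_patient answers None for any member name at all without ever storing
those members.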

For this reason, it is important, when mapping an XML structure to a
programming structure, to map every omitted element to a NULL marker.
Unless the programming structure is defined by a finite number of members,
known to both the writer and the reader of the XML, it is not possible to
rely on any explicit marking of an element to indicate that it is a null
marker. That is, while xsi:nil or a similar attribute can work when data
structures are closed and agreed on by both reader and writer, it does not
work in broader circumstances. XML Schemas and many scripting languages
allow potentially unlimited members to be part of any structure. Consider
a schema like

<type name="patient">
<element ref="name"/>
<element ref="male"/>
<element ref="age"/>
<any namespace="##other"/>
</type>

It is not possible to write an XML instance that enumerates the infinite
number of subelements without values. E.g. one cannot complete the
ellipsis in

<patient>
<name>Joe Green</name>
<age>46</age>
<a xsi:nil='1'/>
<aa xsi:nil='1'/>
<aaa xsi:nil='1'/>
<aaaa xsi:nil='1'/>
...
</patient>

However, it is easily possible to omit the infinite number of subelements
without values, to wit:

<patient>
<name>Joe Green</name>
<age>46</age>
</patient>

Omission must map to NULL. To support simple round-tripping, NULL must map
to omission.


Andrew Layman
http://strongbrains.com -- Resources for Self-Education
Received on Thursday, 13 December 2001 13:02:01 UTC