- From: Andrew Layman <andrewl@microsoft.com>
- Date: Thu, 13 Dec 2001 10:00:41 -0800
- To: <xml-dist-app@w3.org>
Regarding partially transmitted arrays, nulls, and related matters, you might be interested in three mails I wrote a few weeks ago to the SOAP Builders mailing list, as well as subsequent discussion by others.

http://groups.yahoo.com/group/soapbuilders/message/6194?threaded=1
http://groups.yahoo.com/group/soapbuilders/message/6213
http://groups.yahoo.com/group/soapbuilders/message/6233

The full thread, including mails from others commenting on my analysis, begins at http://groups.yahoo.com/group/soapbuilders/message/6194?threaded=1

For the convenience of the XML Dist App archives, I copy the text of the three messages I wrote here:

Message 6194

I support Alan Kent's suggestion that untransmitted members be equivalent to nulls. Both an untransmitted member and a null token represent omissions of information. Both indicate that no definite value is asserted. Neither has a meaning different from the other.

I believe that any apparent distinction between nulls and omitted elements is due to confusion brought on by the following two factors:

1. Reification of null due to incorrect reasoning from programming grammars.

Programming languages of the Algol ilk and similar define structure types in terms of the structure members, meaning creation of a small namespace representing a set of potential properties and their names associated with the structure type. These programming languages somewhat conflate the ideas of namespace, memory allocation and value. In allocating memory, a contiguous space is reserved, with space for each potential member value. Members that do not have a known value nonetheless have space reserved, and that space is filled with something. When the distinction between a known value and an unknown value is retained, as in Java Integer as contrasted with Java int, the lack of a value is represented by a certain bit pattern, just as an actual value would be. This pattern is called "null". This leads to the idea that a null is a value, rather than the lack of one.
But this idea is an error: a null is the way these languages represent the lack of a stated value, given that they must allocate some memory. Other representations do not require the allocation of memory for unactualized member values; XML and Lisp, for example, allow omitted members.

The token "null" also functions in many programming notations much the way the word "nothing" functions in English. That is, rather than saying "delete the value of the member x", we write that instruction as "x = null". However, the fact that null appears in programming syntax in the position where an actual value would appear is a syntactic convenience only; it does not make null into a value. Null represents absence of information.

2. Nothing is not anything.

There is no definite semantics to the absence of information. Nothing is, well, nothing. Any way of representing lack of information is going to have a context-dependent interpretation. It may mean that the information is simply not known, it may indicate that a default value is appropriate, it may indicate that no new information is provided and any prior value should be used, etc. Consequently, the semantics of a null in one context will differ from the semantics of an absent member in a different context. This obscures the more general point that null and absence of a member are simply two notations' ways of distinguishing the presence from the absence of a stated value.

Sometimes, also, it is appealing to distinguish, within one context, complete ignorance from use of a default value, from an indication of the reason why no value is known, from a statement that no value exists, etc. In those cases where exactly two aspects of ignorance are needed in a model, it is appealing to think of null as distinct from absence of a member.
For example, we could then distinguish the statement that we don't know the patient's age from the statement that we are not asking to have the patient's age changed. Without disputing that two distinguished tokens are useful, I observe that three, four, or five would be useful too; yet the preponderance of programming and database languages, as they are used in practice, make at most one distinction between presence and absence of a stated value. That is, the Algol-style programming languages use null notation and do not support omission of a member, while XML supports omission notation and has no null token.

Alan suggests, if I understand him correctly, that we should treat null and an omission of an element as representing the same thing. I suggest that this is correct, that the "same thing" represented is omission of a stated value, and that the significance of that omission may depend on the meaning of the structure from which the information is omitted and on the process employing that structure.

One practical consequence of this is that nulls in a programming structure (absent contextual knowledge of special semantics) should appear in XML as omission of the corresponding element, and vice versa.

Andrew Layman
http://strongbrains.com -- Resources for Self-Education

Message 6213

I'd like to add to my earlier mail [1] regarding the equivalence of nulls and omitted elements by making another distinction: representation of data versus update of data. This is important for thinking about sparse arrays versus partially transmitted arrays. But first, I'd like to look at the contrast between representation and update as it applies to structures whose members are distinguished by member name.

Consider a simple data structure for medical patient data:

    Class Patient {
        String name;
        Boolean male;
        Int age;
    }

An instance of this might have values of its members respectively { "Joe Green", True, 46 }.
We might change some of those values by writing

    patient.name = "Jo Green";
    patient.age = 42;

In each of these cases, what is being described is fairly clear. First we have a structure with three member values (representation). Then we have instructions to change two of those values (update).

But suppose I write the XML

    <patient>
      <name>Jo Green</name>
      <age>42</age>
    </patient>

Is that equivalent to { "Jo Green", NULL, 42 }, or to the two assignments above? If transmitted over a wire, does it result in a data structure having value { "Jo Green", NULL, 42 } or { "Jo Green", True, 42 }? That is, is the XML a *representation* of the current state of the data or a command to *update* the state of the data? You cannot fully tell without knowing the context. Whereas the programming language used different syntax to distinguish representing a value from changing a value, XML only provides the former, representation of a value. That is, you cannot determine merely by looking at some XML outside of its context what the purpose of the XML is.

Regarding ordinary structures, SOAP Section 5 only describes how to translate between a non-XML data structure and an XML data structure. It has no special rules for annotating or altering the data structure if its purpose is to be used in a command that updates data. In particular, SOAP Section 5 never advances the concept of a "partially represented" structured value. Consequently, we can say with confidence that the XML described above represents the same structured value as the programming language would with { "Jo Green", NULL, 42 }.

Unfortunately, the specification is not quite so clear when it comes to arrays. Section 5.4.2.1 introduces the idea of a "partially transmitted array".
This is not the representation of an entire array, but only of part of one, and this fact, plus earlier discussions on this list [2], makes it clear that one possible purpose of the partially transmitted array is to carry out a data-update command. Although the phrasing of Section 5.4.2.2 on sparse arrays is not so ambiguous, I believe that a good case can be made for either side of the argument. That is, the spec and the discussion of DCE cited above support reasonable arguments that sparse arrays are, or are not, for data update; are, or are not, for data representation.

This leads to a difficulty in interpreting a structure such as

    <colors soap:arrayType="xsd:string[7]">
      <item soap:pos="[1]">red</item>
      <item soap:pos="[7]">violet</item>
    </colors>

This might be a sparse array, that is, a complete data representation of an array with members [ "red", NULL, NULL, NULL, NULL, NULL, "violet" ]. But it might also be a partially transmitted array, that is, a transmission, for the purpose of update, of only items 1 and 7, leaving all other elements untouched. Both of these interpretations are possible given the SOAP 1.1 specification. This is an unfortunate ambiguity that deserves to be cleaned up by the SOAP 1.2 effort at the W3C.

I will give my opinion here on what I believe to be the best course. If any elements of an array are NULL, that is, without recorded values, then it is necessary to represent this somehow in XML. Using absent elements to represent NULL is consistent with the interpretation given to absent elements in structures and suggested earlier in this thread [1].
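The two readings of the colors array can be contrasted in a short Python sketch. It is illustrative only and not anything SOAP defines: the helper names are hypothetical, None stands in for NULL, the soap prefix is bound to a placeholder namespace (urn:example:soap) so the fragment parses, and soap:pos values are read as 1-based because that is how the example is interpreted in the text (the SOAP encoding's own position attribute may use a different origin).

```python
import xml.etree.ElementTree as ET
from typing import Optional

# The colors example; the soap prefix is bound to a placeholder namespace
# (an assumption of this sketch) so that ElementTree can parse the fragment.
COLORS = """
<colors xmlns:soap="urn:example:soap" soap:arrayType="xsd:string[7]">
  <item soap:pos="[1]">red</item>
  <item soap:pos="[7]">violet</item>
</colors>
"""
SOAP = "{urn:example:soap}"


def transmitted_items(xml_text: str) -> tuple[int, dict[int, str]]:
    """Return the declared array size and the {position: value} pairs present."""
    root = ET.fromstring(xml_text.strip())
    # "xsd:string[7]" -> 7
    size = int(root.get(SOAP + "arrayType").split("[")[1].rstrip("]"))
    # "[1]" -> 1; positions are read as 1-based, following the example's reading
    items = {int(item.get(SOAP + "pos").strip("[]")): item.text
             for item in root.findall("item")}
    return size, items


def as_sparse_array(xml_text: str) -> list[Optional[str]]:
    """Representation reading: the complete array; untransmitted members are NULL (None)."""
    size, items = transmitted_items(xml_text)
    return [items.get(pos) for pos in range(1, size + 1)]


def as_partial_update(existing: list[Optional[str]], xml_text: str) -> list[Optional[str]]:
    """Update reading: overwrite only the transmitted members, leave the rest untouched."""
    _, items = transmitted_items(xml_text)
    updated = list(existing)
    for pos, value in items.items():
        updated[pos - 1] = value
    return updated


if __name__ == "__main__":
    print(as_sparse_array(COLORS))
    # ['red', None, None, None, None, None, 'violet']
    previous = ["crimson", "orange", "yellow", "green", "blue", "indigo", "purple"]
    print(as_partial_update(previous, COLORS))
    # ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']
```

Both functions consume exactly the same transmitted items; they differ only in what happens to the five positions that were not sent, which is the ambiguity at issue.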
Under the interpretation that absent elements represent NULL, the XML structure

    <patient>
      <name>Jo Green</name>
      <age>42</age>
    </patient>

definitely represents the same facts as the programming language structure { "Jo Green", NULL, 42 }, and the XML structure

    <colors soap:arrayType="xsd:string[7]">
      <item soap:pos="[1]">red</item>
      <item soap:pos="[7]">violet</item>
    </colors>

represents exactly the same facts as does the programming language structure [ "red", NULL, NULL, NULL, NULL, NULL, "violet" ].

I do not deny that there is utility to being able to represent commands to update preexisting data structures. However, this is equally true of both arrays and structures whose members are distinguished by member name. Rather than having a special form of array representation that might be interpreted as an update command, and having this remarkable facility available only for arrays, we would be better served by (a) a uniform interpretation and representation for nulls/omitted values that applies to both structures and arrays, and (b) a decision that update commands and other semantic contexts for structures are beyond the immediate scope of SOAP Section 5.

In short, keep sparse arrays, but make it very clear that omitted elements in XML correspond exactly to NULLs in programming languages, not to partial update commands.

[1] http://groups.yahoo.com/group/soapbuilders/message/6194?threaded=1
[2] http://groups.yahoo.com/group/soapbuilders/message/5139?threaded=1

Andrew Layman
http://strongbrains.com -- Resources for Self-Education

Message 6233

Yet more on nulls and omitted elements.
Consider again a simple XML structure of medical patient data:

    <patient>
      <name>Joe Green</name>
      <age>46</age>
    </patient>

If stored into, or interpreted relative to, a structure laid out as follows,

    Class Patient {
        String name;
        Boolean male;
        Int age;
    }

it would be interpreted as

    Patient
    name        male    age
    ---------   -----   ---
    Joe Green   NULL    46

But if interpreted relative to a structure like

    Class Patient {
        String name;
        Boolean male;
        Int age;
        String diagnosis;
    }

then it would be interpreted as

    Patient
    name        male    age    diagnosis
    ---------   -----   ---    ---------
    Joe Green   NULL    46     NULL

And so on. There are two equivalent ways to look at this:

1. The XML structure only states values for certain members. When interpreted in the context of a programming or database structure that defines the potentiality for more members, the programming or database structure will only have values where the XML structure has values, and will have nulls (indications of no stated value) associated with all other members.

2. The XML structure only states values for certain members. If the XML structure is interpreted in the context of an XML schema with an <any> particle, then there is an unlimited number of other subelements that were potential but not actual. This is equivalent to a programming structure padded with an infinite number of members, of every possible name, all of which are filled with NULL markers, or to a database row padded with an infinite number of columns, of every possible name, all of which are filled with NULL markers.

For this reason, it is important, when mapping an XML structure to a programming structure, to map every omitted element to a NULL marker. Unless the programming structure is defined by a finite number of members, known to both the writer and the reader of the XML, it is not possible to rely on any explicit marking of an element to indicate that it is a null marker.
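Both views of the patient example can be sketched in Python: a closed member list where every unstated member comes back as NULL, and an open record in which any name at all reads as NULL without a single marker being stored. This is a hedged illustration, not a prescribed mapping; to_row and OpenRecord are hypothetical names, None stands for NULL, and the member lists stand in for the two Patient layouts shown in the text.

```python
import xml.etree.ElementTree as ET
from typing import Optional

PATIENT = """
<patient>
  <name>Joe Green</name>
  <age>46</age>
</patient>
"""


def to_row(xml_text: str, members: list[str]) -> dict[str, Optional[str]]:
    """Interpret XML relative to a closed, agreed list of members.

    Every member without a corresponding child element becomes None (NULL).
    Values are left as strings; converting them to Boolean/Int is omitted here.
    """
    root = ET.fromstring(xml_text.strip())
    stated = {child.tag: child.text for child in root}
    return {member: stated.get(member) for member in members}


class OpenRecord:
    """An open structure, as with a schema containing an <any> particle.

    Only stated values are stored, yet every possible member name reads as
    None: the "infinitely padded" view, with no NULL markers written anywhere.
    """

    def __init__(self, xml_text: str) -> None:
        root = ET.fromstring(xml_text.strip())
        self._stated = {child.tag: child.text for child in root}

    def __getitem__(self, name: str) -> Optional[str]:
        return self._stated.get(name)


if __name__ == "__main__":
    print(to_row(PATIENT, ["name", "male", "age"]))
    # {'name': 'Joe Green', 'male': None, 'age': '46'}
    print(to_row(PATIENT, ["name", "male", "age", "diagnosis"]))
    # {'name': 'Joe Green', 'male': None, 'age': '46', 'diagnosis': None}
    record = OpenRecord(PATIENT)
    print(record["name"], record["aaaa"])
    # Joe Green None
```

The same XML yields a three-column or four-column row depending only on the layout it is read against, which is why the omission itself, rather than any explicit marker, has to carry the NULL.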
While xsi:nil or a similar attribute can work when data structures are closed and agreed on by both reader and writer, such markers do not work in the broader circumstances. XML Schemas and many scripting languages allow potentially unlimited members to be part of any structure. Consider a schema like

    <type name="patient">
      <element ref="name"/>
      <element ref="male"/>
      <element ref="age"/>
      <any namespace="##other"/>
    </type>

It is not possible to write an XML instance that enumerates the infinite number of subelements without values. For example, one cannot complete the ellipsis in

    <patient>
      <name>Joe Green</name>
      <age>46</age>
      <a xsi:nil='1'/>
      <aa xsi:nil='1'/>
      <aaa xsi:nil='1'/>
      <aaaa xsi:nil='1'/>
      ...
    </patient>

However, it is easily possible to omit the infinite number of subelements without values, to wit:

    <patient>
      <name>Joe Green</name>
      <age>46</age>
    </patient>

Omission must map to NULL. To support simple round-tripping, NULL must map to omission.

Andrew Layman
http://strongbrains.com -- Resources for Self-Education
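The closing rule (omission maps to NULL, NULL maps to omission) amounts to a round trip that never writes a null marker. A minimal Python sketch, assuming a closed Patient dataclass that mirrors the class in the mail; to_xml and from_xml are hypothetical names, None plays the role of NULL, and silently ignoring undeclared elements is only one possible policy for open content.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Patient:
    # Mirrors the Patient class in the mail; None plays the role of NULL.
    name: Optional[str] = None
    male: Optional[bool] = None
    age: Optional[int] = None


def _lexical(value: object) -> str:
    # xsd:boolean is written "true"/"false"; other values use str().
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def to_xml(patient: Patient) -> str:
    """NULL maps to omission: a None member produces no element (no xsi:nil)."""
    root = ET.Element("patient")
    for f in fields(patient):
        value = getattr(patient, f.name)
        if value is not None:
            ET.SubElement(root, f.name).text = _lexical(value)
    return ET.tostring(root, encoding="unicode")


def from_xml(xml_text: str) -> Patient:
    """Omission maps to NULL: any member without an element stays None.

    Elements Patient does not declare (e.g. content allowed by an <any>
    wildcard) are ignored; that policy is an assumption of this sketch.
    """
    root = ET.fromstring(xml_text.strip())
    known = {f.name for f in fields(Patient)}
    values: dict[str, object] = {}
    for child in root:
        if child.tag not in known:
            continue
        text = child.text or ""
        if child.tag == "male":
            values["male"] = text in ("true", "1")
        elif child.tag == "age":
            values["age"] = int(text)
        else:
            values[child.tag] = text
    return Patient(**values)


if __name__ == "__main__":
    joe = Patient(name="Joe Green", age=46)
    print(to_xml(joe))
    # <patient><name>Joe Green</name><age>46</age></patient>
    assert from_xml(to_xml(joe)) == joe  # the round trip preserves the NULL
    print(from_xml("<patient><name>Joe Green</name><age>46</age><a/></patient>"))
    # Patient(name='Joe Green', male=None, age=46)
```

Because the encoder writes nothing for a NULL and the decoder treats every missing element as NULL, the two functions are inverses on this closed type without ever needing xsi:nil.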
Received on Thursday, 13 December 2001 13:02:01 UTC