Schema changes for absent, empty, and nil special values

Summary:

General changes needed to the proposed XML Schema recommendation:

The XML Schema specification must describe explicitly the XML data
model and state whether this supports three-valued logic (true, false,
and unknown) or two-valued logic.  It should remove some of the 
current limitations of the mechanism used to assign default values.  
It must clarify and extend the behavior of special values, e.g. absent,
empty, and xsi:nil.  For an element c inside an element t, these are:

Absent c element: <t></t>
Empty  c element: <t><c/></t>
Nil    c element: <t><c xsi:nil="true"/></t>

Specific proposed additions to the functionality of XML Schema:

1. XML Schema should provide a way to assign xsi:nil as the default
   for an element.

2. XML Schema should provide a way to assign a value to an absent
   element.

3. XML Schema should provide a way to assign a value to an empty
   attribute.

Specific proposed changes to the functionality of XML Schema:

4. XML Schema should make the nillable attribute true by default.

Specific proposed changes to the syntax of XML Schema:

5a. If xsi:nil behaves essentially like the NULL value in SQL
    then it should be renamed back to null, and nillable should be
    renamed to nullable.
5b. Otherwise nillable should be renamed to nilable.

Details:

The following are my personal comments on the proposed XML Schema
recommendation.  I am a member of the W3C XML Query Working Group, and
these issues are discussed in great detail in my document titled
"XML Special Values".  However, because of time constraints and other
high priority items being addressed by the working group, no decisions
have been made on the proposals in this design document.  This document
is available, to W3C members only, in the archives of the Query
working group.  The most recent version is 1.1 attached to
http://lists.w3.org/Archives/Member/w3c-xml-query-wg/2001Mar/0434.html
(as xmlspecial.doc, xmlspecial.pdf, and xmlspecial.html).

The general problem with the XML Schema specification is that it does
not adequately define the semantics of the three special values
possible in an XML instance document.  The next two paragraphs are
quoted from the beginning of my design document.  After that I discuss
how these issues affect XML Schema, and propose ways to resolve them.

<quote>
This document describes the behavior (semantics) of the special values
in XQuery.  This includes assigning them names, and arranging them
into a taxonomy.  There are many types of "special values" possible in
a query.  Some special values denote errors, e.g. NaN.  Others refer
to a "value" that is absent, empty, or nil.  Here is how those special
values might look in an XML instance document.  In this example there
is an element named t that can contain an element named c, and the
special value applies to c.
Absent: <t></t>
Empty:  <t><c/></t>
Nil:    <t><c xsi:nil="true"/></t>

We cannot easily constrain the ways users will want to use XML.  Some
may want it to work like SQL, while others may prefer that it work like
Excel or some other system they are familiar with.  Therefore the goal
must be to allow users to fairly easily express queries that behave as
they intend.  The underlying XML data model and formal semantics
should be expressive enough to in principle capture all reasonably
likely semantics.  The end user language should be simple enough so
that in practice users can express their specific desired semantics.
</quote>

It is likely that many users will wish to use XML to model aspects of
the real world.  In the real world there are various states of affairs
that can be loosely described as "unknown", and others that are "not
applicable", or "known to be absent".  It is very important to 
give XML sufficient power to permit users to model these states.

The attribute xsi:nil (until recently named xsi:null) permits a user
to distinguish between the empty string, and a string whose value is
unknown.  This might be useful in a variety of situations,
e.g. recording the middle initial of a person.
For a data type not derived from string the special value
xsi:nil="true" may be used as an explicit marker that the value is
unknown.  This may be useful in other situations, e.g. recording the
amount of a bonus granted to an employee.  It is not always possible to
use the absent "value" to represent unknown, because whenever a new
element is added to an XML schema all existing instance documents will
in essence have an absent element for that new element.  In many cases
the natural interpretation of absent is "known to be not applicable or
known to be not present".
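
As a minimal sketch of how an application might tell the three special
values apart, the following Python uses the standard library's
xml.etree.ElementTree.  The function name classify_child and the
returned labels are assumptions made for illustration; they are not
part of XML Schema or XQuery.

```python
import xml.etree.ElementTree as ET

# Namespace of the xsi: prefix (XML Schema instance namespace).
XSI = "http://www.w3.org/2001/XMLSchema-instance"


def classify_child(parent_xml: str, name: str = "c") -> str:
    """Return 'absent', 'nil', 'empty', or 'value' for the named child.

    Hypothetical helper: the labels follow the terminology of this note.
    """
    parent = ET.fromstring(parent_xml)
    child = parent.find(name)
    if child is None:
        return "absent"                      # <t></t>
    if child.get(f"{{{XSI}}}nil") in ("true", "1"):
        return "nil"                         # <t><c xsi:nil="true"/></t>
    if len(child) == 0 and not (child.text or ""):
        return "empty"                       # <t><c/></t>
    return "value"


NS = f'xmlns:xsi="{XSI}"'
print(classify_child(f"<t {NS}></t>"))                       # absent
print(classify_child(f"<t {NS}><c/></t>"))                   # empty
print(classify_child(f'<t {NS}><c xsi:nil="true"/></t>'))    # nil
```

The check for nil comes before the check for empty because a nil
element normally has no content either, so the order of the tests
decides which state is reported.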

Specific proposed additions to the functionality of XML Schema:

1. XML Schema should provide a way to assign xsi:nil as the default
   for an element.

Currently the mechanism for default values does not permit assigning
attribute values to elements.  This is a limitation that should be
lifted.
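
The effect being asked for might look roughly like the following
Python sketch, which does by hand what a schema-level default of
xsi:nil could do.  It assumes the default would apply to empty
elements, as existing element defaults do; apply_nil_default is a
hypothetical helper, not an XML Schema feature.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
NIL = f"{{{XSI}}}nil"   # ElementTree's qualified name for xsi:nil


def apply_nil_default(elem: ET.Element) -> bool:
    """Mark an empty element as nil; return True if it was changed.

    Hypothetical helper illustrating the proposal only.
    """
    is_empty = len(elem) == 0 and not (elem.text or "")
    if is_empty and elem.get(NIL) is None:
        elem.set(NIL, "true")
        return True
    return False


t = ET.fromstring("<t><c/><d>5</d></t>")
changed_c = apply_nil_default(t.find("c"))   # c is empty, so it becomes nil
changed_d = apply_nil_default(t.find("d"))   # d has content, so it is left alone
```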

2. XML Schema should provide a way to assign a value to an absent
   element.

Currently the default value mechanism only permits changing the value
of an empty element or an absent attribute.
As noted above absent elements will be quite common, appearing
whenever a new element is added to a Schema.  It is very important to
be able to assign a default value to these.  The most frequently
assigned values will be the identity element for the "addition"
operation of the data type, e.g. 0 for numbers and the zero length
string for string, but any legal value should be permitted.
This proposal states that exactly one instance of the
element be "brought into existence" as a result of the user specifying
a default.  A slight variant of this proposal would be to require
maxOccurs="1" be in the Schema before a user could specify the
default for an absent element.
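
A minimal sketch of the intended behavior, written in Python with the
standard xml.etree.ElementTree module.  The function
apply_absent_default is hypothetical; it only illustrates that exactly
one instance of the element is created, and only when none exists.

```python
import xml.etree.ElementTree as ET


def apply_absent_default(parent: ET.Element, name: str, default: str) -> ET.Element:
    """If parent has no child called name, append exactly one holding default.

    Hypothetical helper illustrating the proposal only.
    """
    child = parent.find(name)
    if child is None:
        child = ET.SubElement(parent, name)
        child.text = default
    return child


t = ET.fromstring("<t></t>")
apply_absent_default(t, "c", "0")      # c is absent, so one c is created
apply_absent_default(t, "c", "0")      # c now exists, so nothing changes
print(ET.tostring(t, encoding="unicode"))   # <t><c>0</c></t>
```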

3. XML Schema should provide a way to assign a value to an empty
   attribute.

This is the "opposite" of the situation above.

Specific proposed changes to the functionality of XML Schema:

4. XML Schema should make the nillable attribute true by default.

Currently elements cannot be assigned the xsi:nil attribute unless
the nillable attribute is true.  Why change this behavior?  The answer
is that certain operations in XML Query are likely to produce elements
that have xsi:nil="true".  For these documents to be schema valid a
user might need to edit the associated Schema, adding nillable="true".
This is an unnecessary burden that can be prevented.  

Specific proposed changes to the syntax of XML Schema:

5a. If xsi:nil behaves essentially like the NULL value in SQL
    then it should be renamed back to null, and nillable should be
    renamed to nullable.

The fundamental behavior of NULL in SQL is that
* It is ignored in aggregate operations
* It is propagated in "arithmetic" operations
* It produces the "unknown" truth value when used in any predicate
  except IS [NOT] NULL.

If xsi:nil has essentially this same behavior it would be less
confusing for users to rename this back to null, and rename nillable
back to nullable.
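
The three SQL behaviors listed above can be observed with a short
Python sketch using the standard sqlite3 module.  The table emp and
its columns are made up for illustration, reusing the employee bonus
example; SQLite stands in here for SQL in general.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, bonus INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("a", 100), ("b", None), ("c", 300)],   # b's bonus is unknown (NULL)
)

# 1. NULL is ignored in aggregate operations.
total, counted = conn.execute(
    "SELECT SUM(bonus), COUNT(bonus) FROM emp"
).fetchone()                                  # total is 400, counted is 2

# 2. NULL is propagated in arithmetic operations.
propagated = conn.execute(
    "SELECT bonus + 1 FROM emp WHERE name = 'b'"
).fetchone()[0]                               # None, i.e. NULL

# 3. A predicate on NULL is "unknown", so neither it nor its negation
#    selects any row; only IS NULL finds the row.
eq_rows = conn.execute("SELECT name FROM emp WHERE bonus = NULL").fetchall()
ne_rows = conn.execute("SELECT name FROM emp WHERE NOT (bonus = NULL)").fetchall()
null_rows = conn.execute("SELECT name FROM emp WHERE bonus IS NULL").fetchall()
```

Both eq_rows and ne_rows come back empty, which is the visible effect
of three-valued logic: a WHERE clause keeps only rows whose predicate
is true, and "unknown" is not true.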

5b. Otherwise nillable should be spelled nilable.

People have been coining neologisms in English by adding suffixes to
existing roots for a long time.  Generally they do not violate the
rules of morphology.  This misspelling does.  There is no case that I
am aware of where adding the suffix "able" causes a single final
consonant to be doubled.  The most relevant example is that "gel" 
becomes "gelable".  This word should be spelled "nilable".

Hopefully helpfully yours,
Steve
-- 
Steven Tolkin          steve.tolkin@fmr.com      617-563-0516 
Fidelity Investments   82 Devonshire St. V10D    Boston MA 02109
There is nothing so practical as a good theory.  Comments are by me, 
not Fidelity Investments, its subsidiaries or affiliates.

Received on Friday, 13 April 2001 14:25:32 UTC