Arrays in XML Schema - Last Call Issue LC-84 - Schema WG response

                                        October 4, 2000

Dear Mr. Miller, Ms. Hunter,

        First, an apology.  I have been remiss in responding to 
both of your comments on the XML Schema Draft concerning arrays.  
This does not reflect any lack of concern on the part of the Schema 
WG about the issue, which was discussed at some length at several 
meetings and teleconferences.

        Thus, I am belatedly writing on behalf of the XML Schema WG 
regarding your last call comments on the specification of arrays 
in XML Schema.  This issue is known to the XML Schema WG as 
LC-84: Arrays?  Related issues include LC-124 (part 1), 
LC-144 (see following messages) and LC-102: Microparsing.

        Please see the following discussion of the issue and
the Schema WG response. At the close of this email you will find
instructions on how to respond, indicating whether the Schema WG
response is satisfactory to you.

        Note that in my response I have taken the liberty of 
explaining the Schema WG decisions more fully than the formal 
record in the Schema WG minutes does, to help the MPEG-7 group 
understand their rationale.  I believe that my remarks 
accurately reflect the views of the majority of the Schema WG.

LC-84. arrays: Arrays?
----------------------

Issue Class: A Locus: both Cluster: 15 arrays Status: unassigned
Assigned to: Frank Olken Originator: Robert Miller, MPEG-7

Description
------------


Should XML Schema be modified to provide support for arrays?

Interactions and Input
----------------------

Input from Robert Miller:

"Miller, Robert (GXS)" <Robert.Miller@gxs.ge.com> to XML Schema Comments 
list, Tue, 2 May 2000 15:43:11 -0400 

Just today, I received an Email from someone who had seen 
my earlier Email to the Schema Work Group pointing out the need 
to support arrays of information. Spreadsheets, a simple array
construct, are not provided a common representation in the XML
Schema work.
...
If a service approach were to be considered (cf. issue XML Schema 
considered inadequately extensible), some thought should be given 
to other services that might be desired (such as an array processing 
service), such that service syntactic support needs are adequately 
addressed in the underlying Schema syntax, even if the considered 
services are not fully defined and implemented.

Input from :

1. Datatypes issue: MPEG-7 requires both arrays and matrices. 
We would prefer to have built-in array (1D) and matrix (2D, 3D)
datatypes, instead of simply the 'derivedBy = list' mechanism.
If these cannot be provided, then the alternative is to use lists. 
In the current WD, you can only create lists from atomic datatypes, 
and since a list is not an atomic datatype, you cannot 
create matrices using 'lists of lists', e.g.:

<simpleType name="ArrayOfInteger" base="integer" derivedBy="list">
   <length value="2"/>
</simpleType>

<simpleType name="MatrixOfInteger" base="ArrayOfInteger" derivedBy="list">
   <length value="4"/>
</simpleType>


Alternatively, we can simply convert matrices to flattened lists, which 
can be 1D, 2D or 3D, and add a dim facet to lists to specify 
multi-dimensionality:

<simpleType name="MatrixOfInteger" base="ArrayOfInteger">
   <dim value="2 4"/>
</simpleType>

XML Schema WG Response
-----------------------

XML Schema Language V 1.0 will provide no array or vector data types.
Several versions of the array proposal were considered and rejected
(for quite different reasons).  Specifically, we separately considered
extending "lists" (a simple datatype), and "array type constructors" 
(a complex datatype).

Concerning "lists of lists" or "lists of non-atomic datatypes",
the Schema WG firmly decided that it did not want to go in that
direction at all.  "Lists" were included in XML Schema as a minimal 
generalization of legacy constructions such as NMTOKENS and IDREFS.
The general view of the WG was that simple datatypes (suitable for
describing attributes) should ideally be restricted to atomic values.
More complex constructions (lists of lists, lists of tuples or vectors)
should be built as "complex datatypes", i.e., using nested
element markup in XML.  This topic came before the Schema WG
as Last Call Issue 102 (LC-102), Microparsing Support, which was
debated (and rejected) at the Edinburgh Face-to-Face meeting.  This
aspect of the issue has been raised repeatedly in various guises,
and the Schema WG appears to be quite firm on this decision.
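
For contrast, the following sketch shows the kind of list the WG does
intend to support: a list whose items are atomic integers, restricted
to exactly two items.  It is written in the syntax later adopted for
the XML Schema 1.0 Recommendation (namespace
http://www.w3.org/2001/XMLSchema) rather than the derivedBy syntax of
the draft quoted above; the type names IntegerList and ArrayOfInteger
are illustrative only.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- A list type whose items are atomic xs:integer values -->
  <xs:simpleType name="IntegerList">
    <xs:list itemType="xs:integer"/>
  </xs:simpleType>

  <!-- The same list restricted to exactly two items -->
  <xs:simpleType name="ArrayOfInteger">
    <xs:restriction base="IntegerList">
      <xs:length value="2"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
```

A value of type ArrayOfInteger looks like "3 7".  The "list of lists"
from the MPEG-7 comment cannot be declared this way, because the item
type of a list may not itself be a list type.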

The Schema WG position on array type constructors as complex 
datatypes was more moderate.  The WG was not persuaded that such
a constructor should be added to Version 1.0 of XML Schema, because
it was not convinced that the additional complexity was necessary:
conventional XML markup facilities could specify the dimensions, and
the array content could be a sequence of <arrayElement>s (possibly
containing nested <array>s).  This decision should be seen in 
light of the numerous comments made to the Schema WG that the
XML Schema Language is already too baroque.
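
A minimal sketch of the markup-based alternative described above,
assuming the element names used in this response (<array>,
<arrayElement>, <dimensions>, <dim>) and hypothetical type names
(ArrayType, DimensionsType, ArrayElementType), written in XML Schema
1.0 Recommendation syntax.  As the comments note, ordinary complex
types describe the shape of the markup but not its array semantics,
which is the objection raised in the next paragraph.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- <array>: optional <dimensions>, then zero or more <arrayElement>s -->
  <xs:element name="array" type="ArrayType"/>

  <xs:complexType name="ArrayType">
    <xs:sequence>
      <xs:element name="dimensions" type="DimensionsType" minOccurs="0"/>
      <xs:element name="arrayElement" type="ArrayElementType"
                  minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <!-- One <dim> child per dimension; surrounding whitespace is collapsed -->
  <xs:complexType name="DimensionsType">
    <xs:sequence>
      <xs:element name="dim" type="xs:positiveInteger" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Mixed content admits a text value such as " 1.0 ", a nested <array>,
       or both.  The text is not typed as a number, and nothing here checks
       that the number of elements matches the product of the <dim> values. -->
  <xs:complexType name="ArrayElementType" mixed="true">
    <xs:sequence>
      <xs:element ref="array" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

</xs:schema>
```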

Some Schema WG members argued, unsuccessfully, that such an 
approach failed to convey array semantics in a standardized 
fashion.  They also noted that a standardized array syntax would 
facilitate query language operators specific to arrays, e.g., 
operators to extract rows, columns or other subarrays.  
However, the XML Query WG has not expressed such concerns.
It is conceivable that the Schema WG might be persuaded
to revisit this aspect of the issue in later versions of 
XML Schema (see the discussion below concerning the XML Protocol WG).


Minor points:

        As noted, the WG generally frowns on compound simple datatypes.
Hence,

        <dimensions> 
                <dim> 2 </dim> 
                <dim> 4 </dim> 
        </dimensions>

would be preferred to the syntax you suggested in your comment:

        <dim> 2 4 </dim>

        Similarly, proposals to flatten arrays would be discouraged,
because they encode structure implicitly rather than in the markup.
Thus, the detailed markup syntax:

        <array>
                <arrayElement> 1.0 </arrayElement>
                <arrayElement> 2.0 </arrayElement>
                <arrayElement> 3.0 </arrayElement>
                <arrayElement> 4.0 </arrayElement>
        </array>

would be preferred to the flattened syntax you suggested 
in your comment:

        <array> 1.0 2.0 3.0 4.0 </array>

The flattened syntax is similar to the array syntax of XSIL.
Observe that the detailed markup syntax is easier to extend
to nested arrays.  It is also easier to process in XSLT.
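
To illustrate the nesting point, here is one possible encoding of the
four values above as a 2 x 2 matrix, assuming (hypothetically) that
each row is a nested <array> and that the flattened list was in
row-major order.

```xml
<!-- A 2 x 2 matrix as an array of row arrays (hypothetical layout) -->
<array>
  <dimensions>
    <dim> 2 </dim>
    <dim> 2 </dim>
  </dimensions>
  <arrayElement>
    <array>
      <arrayElement> 1.0 </arrayElement>
      <arrayElement> 2.0 </arrayElement>
    </array>
  </arrayElement>
  <arrayElement>
    <array>
      <arrayElement> 3.0 </arrayElement>
      <arrayElement> 4.0 </arrayElement>
    </array>
  </arrayElement>
</array>
```

With this markup, the element in row 2, column 1 can be selected with
the ordinary XPath expression /array/arrayElement[2]/array/arrayElement[1],
which yields 3.0 after normalize-space().  The flattened form would
instead require splitting the string "1.0 2.0 3.0 4.0", and XPath 1.0,
as used by XSLT 1.0, has no built-in function for splitting a string
into tokens.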

[Note that these points represent F. Olken's 
interpretation of the sentiment of the Schema WG. ]


To summarize, the position of the Schema WG is:

        1) arrays as simple datatypes - not now, not ever.
        2) arrays as complex datatype constructors - not now

Subsequent to the decisions of the Schema WG, a new XML Protocol
WG (URL: http://www.w3.org/2000/xp/) has been chartered by the W3C.  
It will meet later this month.  David Fallside (IBM) (email:
fallside@us.ibm.com) is the chair of the new WG. He has stated
that this WG will likely take up the issue of specifying arrays, because
this is needed by RPC protocols (e.g., SOAP) that permit the 
transmission of arrays.  See the W3C note on SOAP
(URL: http://www.w3.org/TR/SOAP/), Section 5.4.2, on Arrays.
Arrays would thus initially emerge in the 
XML Protocols Requirements Document.  Hopefully, the Protocol WG array
efforts will be coordinated with the Schema WG, e.g., perhaps as
part of Schema Version 1.1.


Is this response adequate?
--------------------------

The XML Schema Working Group wants to know your opinion
of our response to your last call comments.  This information
will be included with the package submitted to the W3C
Executive Director as part of the recommendation to take
the XML Schema Language to Candidate Recommendation.
We would appreciate your response as soon as possible.

Please choose one of the following responses, adding 
whatever details or explanation you wish:

1)  "GOOD ENOUGH"  - You are satisfied with the Schema WG response
to your comments on XML Schema Language.  The response meets 
your requirements.  The matter may be considered resolved.

2) "STOP THE PRESSES"  - You are not happy with the response
to your comments on XML Schema Language.  The response is either
unclear or inadequate.  The issue is of sufficient importance
and urgency that you want it called to the attention of the 
W3C Executive Director, and you ask that the XML Schema Language 
be delayed in advancing to Candidate Recommendation until the 
issue is resolved.

3)  "LATER - VERSION 1.1"  - You are not happy with the response,
but are prepared to defer reconsideration until XML Schema Lang.
Version 1.1 is drafted.  It is anticipated (hoped) that Version 1.1
will be completed by mid-2001.  Version 1.1 is intended primarily
to fix small issues needed by other W3C Working Groups to proceed 
with their work (especially XML Query Language).  You request that
your comments be reconsidered when drafting the Version 1.1 
requirements document.

4) "LATER - VERSION 2.0"  - You are not happy with the response,
but are prepared to defer consideration until XML Schema Language
Version 2.0 is drafted.  It is anticipated that Version 2.0 would
not be completed until late 2001 or early 2002.  Version 2.0 may
include major revisions, e.g., multiple inheritance, etc.
You request that your comments be reconsidered when drafting the 
Version 2.0 requirements document.

5) "NO LONGER CARE"  - You are not happy with the response, but
no longer care to pursue the matter, because ....


                  Belatedly,

                  Frank Olken
                  XML Schema Language Working Group

  Lawrence Berkeley National Laboratory   (510) 486-5891 (voice)
  Mailstop 50B-3238                       (510) 486-4004 (fax)
  1 Cyclotron Road                        (510) 843-5145 (home)
  Berkeley, CA 94720, USA                 (510) 442-7361 (pager)

  E-mail:  olken@lbl.gov
  WWW:     http://www.lbl.gov/~olken/

Received on Wednesday, 4 October 2000 17:04:08 UTC