Lists or sets of values

Section 1.3 acknowledges that supersetting XML 1.0 DTDs requires defining
certain aggregate datatypes such as IDREFS, ENTITIES and NMTOKENS, but does
not provide a general mechanism for defining additional aggregate
datatypes, deferring that to a later version of the standard.  I think it
would be much better to establish at least minimal support for lists of
other datatypes now, to head off widespread use of the "string" datatype as
the basis for user-defined lists.

If I needed a "uris" datatype in a list-free XML schema, I would probably do
something like:

<datatype name="uris">
	<basetype name="string"/>
</datatype>

I think it would be reasonable to introduce a built-in type "list" that
strictly indicates that the content is a space-delimited list of items
that contain no spaces.  Facets constraining the number of items,
alternative delimiters, etc. could be added in later revisions.
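To make the proposed "list" semantics concrete, here is a minimal Python sketch of how a processor might split such a value into items. It assumes "space" means XML whitespace (#x20, #x9, #xD, #xA) and that leading and trailing whitespace is ignored; the function name parse_list is hypothetical and not part of any draft.

```python
import re
from typing import List

# XML 1.0 whitespace characters: #x20, #x9, #xD, #xA.
_XML_WS = re.compile(r"[\x20\x09\x0D\x0A]+")


def parse_list(value: str) -> List[str]:
    """Split a value of the hypothetical built-in "list" type into items.

    Assumption: items are separated by one or more XML whitespace
    characters, and surrounding whitespace is not significant.
    """
    stripped = value.strip("\x20\x09\x0D\x0A")
    if not stripped:
        return []  # an all-whitespace value is an empty list
    return _XML_WS.split(stripped)


print(parse_list(" http://a/ \n http://b/ "))
```

Because each item is by construction free of whitespace, no escaping rules are needed, which is part of what keeps the generic type simple.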

In addition, it would be useful to add, as built-in datatypes, list
versions of the built-in datatypes whose values contain no spaces.  From my
quick survey, list versions of the following would be reasonable:

booleans
reals
timeInstants
timeDurations
recurringInstants
uris
languages
decimals
integers
dates
times

List versions of non-negative-integers, etc. seem to be overkill, but could
be included for completeness.  If these new types plus the generic list
datatype were added, most schema developers would be content to stay within
those boundaries until we figure out how we ultimately want to support
lists.  I would not expect this to add substantial complexity to parser
implementations, since the essential elements of list support are already
needed for the XML 1.0 datatypes ENTITIES, IDREFS and NMTOKENS.
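The claim that list versions add little complexity can be illustrated with a short Python sketch: once a processor can split a list, every list type is just the item validator applied to each item. The item checks below (is_integer, is_boolean) are simplified, hypothetical stand-ins, not the exact lexical rules of the datatypes draft.

```python
import re
from typing import Callable, List

_XML_WS = re.compile(r"[\x20\x09\x0D\x0A]+")


def split_items(value: str) -> List[str]:
    """Split on XML whitespace, ignoring leading/trailing whitespace."""
    stripped = value.strip("\x20\x09\x0D\x0A")
    return _XML_WS.split(stripped) if stripped else []


def list_of(item_ok: Callable[[str], bool]) -> Callable[[str], bool]:
    """Derive a list-type validator from an item-type validator."""
    def check(value: str) -> bool:
        return all(item_ok(item) for item in split_items(value))
    return check


# Simplified item checks (assumed lexical forms, for illustration only).
def is_integer(s: str) -> bool:
    return re.fullmatch(r"[+-]?[0-9]+", s) is not None


def is_boolean(s: str) -> bool:
    return s in ("true", "false")


integers = list_of(is_integer)
booleans = list_of(is_boolean)

print(integers("1 -2 +3"), booleans("true maybe"))
```

Note that an empty value passes every derived list type here; ruling that out is exactly the kind of item-count facet that could be added later.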

My take on minimal support for extensible lists would be something like:

<datatype name="mytypes">
	<basetype name="list"/>
	<listitem minOccur="2" maxOccur="7">
		<datatypeRef name="mytype"/>
	</listitem>
</datatype>

Here, listitem is a facet that is only appropriate when the basetype is list.
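One possible reading of the listitem facet is sketched below in Python: check the item count against minOccur/maxOccur, then check each item against the referenced datatype. ListItemFacet, its defaults (min 0, unbounded max), and the use of a plain integer check in place of "mytype" are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional

_XML_WS = re.compile(r"[\x20\x09\x0D\x0A]+")


def split_items(value: str) -> List[str]:
    stripped = value.strip("\x20\x09\x0D\x0A")
    return _XML_WS.split(stripped) if stripped else []


@dataclass
class ListItemFacet:
    """Hypothetical model of the proposed <listitem> facet."""
    item_ok: Callable[[str], bool]   # stands in for <datatypeRef name="mytype"/>
    min_occur: int = 0               # assumed default
    max_occur: Optional[int] = None  # None means unbounded (assumed default)

    def validate(self, value: str) -> bool:
        items = split_items(value)
        if len(items) < self.min_occur:
            return False
        if self.max_occur is not None and len(items) > self.max_occur:
            return False
        return all(self.item_ok(item) for item in items)


# "mytypes": 2 to 7 items of "mytype", assumed here to be unsigned integers.
mytypes = ListItemFacet(
    item_ok=lambda s: re.fullmatch(r"[0-9]+", s) is not None,
    min_occur=2,
    max_occur=7,
)

print(mytypes.validate("1 2"), mytypes.validate("1"))
```

Keeping the count and the item type in one facet means the constraint is only ever evaluated on list-based types, matching the restriction stated above.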


The descriptions of ENTITIES and IDREFS call them null-separated, whereas
section 1.3 says space-separated.  The XML 1.0 spec's definition of the
Nmtokens production would not allow my interpretation of null-separated
(separated by #x00 characters), and the datatypes document does not appear
to define a different interpretation.
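The conflict can be checked mechanically with a rough Python approximation of the Nmtokens production: Nmtoken values separated by whitespace. This sketch treats the separator as XML whitespace and uses an ASCII-only approximation of NameChar; the real production also admits many non-ASCII letters, combining characters and extenders.

```python
import re

# ASCII-only approximation of an XML 1.0 Nmtoken (one or more NameChars).
_NMTOKEN = r"[A-Za-z0-9._:-]+"

# Nmtokens: Nmtoken values separated by XML whitespace.
_NMTOKENS = re.compile(_NMTOKEN + r"(?:[\x20\x09\x0D\x0A]+" + _NMTOKEN + r")*")


def matches_nmtokens(value: str) -> bool:
    """Return True if value matches the approximated Nmtokens production."""
    return _NMTOKENS.fullmatch(value) is not None


print(matches_nmtokens("id1 id2"), matches_nmtokens("id1\x00id2"))
```

A space-separated value matches, while a #x00-separated one does not, which is why "null-separated" cannot describe values that XML 1.0 accepts.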

Received on Thursday, 4 November 1999 14:36:10 UTC