XForms data model: comments

XForms: the relationship with XSchema
--------------------------------------
I've been reading the XSchema Primer at http://www.w3.org/TR/xmlschema-0/

In my view, having a library of built-in real-world datatypes that extend
XSchema's "Simple types" (the example given in the XForms Data Model is
a "money" datatype) is a Good Thing.

Deliberately removing many of the less common XSchema types is not,
and nor is removing some of the facets, unless there are easy-to-use
workarounds.

Dropped types
-------------
Many of the XSchema simple types dropped are storage instructions for
handling arbitrary numbers (XSchema has a float, double, long, int,
short, byte, unsignedLong, unsignedInt, unsignedShort, unsignedByte).
As well as implementation hints, what these give us in terms of forms
is built-in validation (patterns/masks in XForms, and minInclusives and
maxInclusives).  There are also some human-intended numeric built-ins
(integer, decimal, positiveInteger, nonPositiveInteger, negativeInteger,
nonNegativeInteger).

In my view, XForms should have integers, decimals and floats, with a
(wide-ranging) "storage" facet for those that want or need it.
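
A rough sketch of what that might look like, in the element style used
for the examples later in this note.  The "storage" attribute and its
values are hypothetical (borrowed from the XSchema type names listed
above), as are the field names; none of this is taken from the XForms
draft:

    <!-- hypothetical: "storage" is only a hint; min/max do the validating -->
    <integer name="age" min="0" max="150" storage="unsignedByte" />
    <!-- no storage hint: the implementation chooses -->
    <decimal name="weight" />
    <float name="reading" storage="double" />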

A second group of dropped types relates to dates (timeInstant,
recurringDuration, month, year, century, recurringDate, recurringDay).
I think the first is crucial for timestamps (i.e. calculated values:
people can type stuff in as a date and a time).  In my view, XForms
should have date, time, day, month, year, century, day-of-week and
timestamp types.  I have no view on date periods or recurring durations,
excepting that someone must have wanted them badly enough to have got
them into XSchema.

A third group of dropped simple types relates to XML datatypes (ID,
IDREF, ENTITY, NOTATION, IDREFS, ENTITIES, NMTOKEN, NMTOKENS, Name,
QName, NCName).  I think the main value of the naming types would be
in forms-as-XML-editors, and in a simple query language - i.e. checking
that a string entered is a name that already exists.  So I think there
should be an XMLname datatype, with a scheme facet.

The value of keeping types ID and IDREF(S) rests on whether database
design should be described in the XForm, and if so, how.  The XSchema
Primer authors are scathing about the limitations of using ID and IDREF
as a method of indicating primary and foreign keys (section 5.3), and
propose an (IMHO) pretty wild alternative.  But I think being able to
generate receiving table structures from a (simple) XForm form is a
worthwhile goal.  To which end, I would add ID, and IDREF facets (not
datatypes), the former as a boolean, the latter as a string pointing to
the table.field (element) it is meant to refer to.  Defining multiple
IDs in the same table would generate a composite key.

There are several other implications of allowing database design to be
embedded in a form: 
* we'd need a tablename facet (<group> is used both to group elements, 
  and to show hierarchy through the min/maxOccurs facets);
* we'd need a "hidden" facet (boolean) for derived variables 
  (defaults to false);
* we'd need a "varlabel" facet - for those short names databases need
  (defaults to name);
* we'd need a "label" facet - which could also be used for error messages
  (defaults to name).
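
Putting the ID and IDREF facets together with the four facets listed
above, a form describing two linked tables might look something like
this.  All six attribute names (id, idref, tablename, hidden, varlabel,
label) are the hypothetical facets proposed in this note, and the table
and field names are invented for illustration:

    <group name="customer" tablename="customer">
        <!-- hypothetical: id="true" marks the primary key -->
        <integer name="customerNumber" id="true" varlabel="custno"
                 label="Customer number" />
        <string name="surname" label="Surname" />
    </group>

    <group name="order" tablename="orders">
        <integer name="orderNumber" id="true" />
        <!-- hypothetical: idref names the table.field it refers to -->
        <integer name="customerNumber" idref="customer.customerNumber" />
        <!-- derived variable: stored, but not shown on the form -->
        <decimal name="total" hidden="true" />
    </group>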

Nulls and missing values 
------------------------ 
XSchema's "nullable" facet seems worthwhile: to say whether we care that
an item has been missed or set to "" by the form user.  It might also
be worthwhile to be able to have an explicit missing value, if NULL
(however expressed) is going to be hard to process - as it is in some
statistics packages, for example.

Multiple selection lists
------------------------
XSchema's multiple selection lists are not explicitly supported in XForms,
but are achieved through adding minOccurs and maxOccurs facets to elements
(as part of a bigger scheme).  I would propose adding a "process" facet
to hint at whether the results should be processed as a list (probably
being stored in a single string) or item-by-item (as a nested table) -
defaults depending on min- and maxOccurs.
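
For illustration only, the two cases might be written as below.  The
"process" attribute and its values "list" and "item" are hypothetical
names for the facet proposed here, and the field names are invented:

    <!-- hypothetical: all choices stored together in a single string -->
    <string name="interests" minOccurs="0" maxOccurs="5" process="list" />
    <!-- hypothetical: one row per choice in a nested table -->
    <string name="symptoms" minOccurs="0" maxOccurs="5" process="item" />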

Making your own datatypes
-------------------------
XForms will not allow the derivation of new, reusable datatypes (though
section 6.6 hints at a model).  So if "money" wasn't a datatype, you
wouldn't be able to say "add a currency attribute to a decimal number" -
and instead would do it as a decimal + an enumeration (list) of currency
types, defaulting to your locale.

I think you *should* be able to reuse previous groups and elements,
simply by naming the group or element.  XSchema uses equivalence classes
for this (Primer, section 4.5), and has the concept of an "abstract"
element (which only exists for reuse).

The <purchaseOrder> example could then go like this:

    <group name="address" abstract="true">
        <string name="name" />
        <string name="street" />
            etc
    </group>

    <group name="purchaseOrder">
        <group name="shipTo" equivClass="address" />
        <group name="billTo" equivClass="address" />
    </group>

I would suggest XForms ducks the issue of namespaces at this point,
by making these abstract definitions global in scope.

Dropped facets
--------------
I don't see the point of dropping facets if there is no implementation
hit.  XSchema facets - there are only 14 of them - are based on the
entered string (even for a number), and the ordered value (even for a
string), with a few extras for units and precision.  Two examples that
would have to be faked in XForms as it is currently proposed: the use
of pattern with a decimal number makes sense if you're recording
readings from a digital machine; the minimum and maximum values of a
time duration make sense if recording time-to-event.


Other thoughts on XForms Data Model
-----------------------------------

Content of enumerations
-----------------------
The goal of separating data model from user interface is a good one.
However, not all localisation can be done at the UI level, as the text of
a form includes the string values in an enumeration.  This lends extra
support to the wish expressed above that "day-of-week" and "month" should
be built-ins, but some method of picking up suitable values is still
required.
I think the <variant> method could be a little clumsy, and it might be
better done with a "macro" language - I'm not sure if this is where XML's
ENTITY and NOTATION fit in - and by the ability to reference local as
well as remote libraries of datatypes.  (I can also see this working
for forms created on the fly from databases, if the macro can be a
(reference to a) script).

A second issue with enumerations is that I don't see the option/value
pair we have in <select> lists in HTML.  For example, I'd like to see
weekday defined as:

    <union name="weekday">
        <string range="closed">
            <option value="1">Monday</option>
                etc
        </string>
        <integer min="1" max="7" />
    </union>

i.e. I don't want strings returned, when the strings represent codes ...


Calculated values
-----------------
I think the "calc" facet is redundant: what is required is a default
value, and (if a field is not meant to be edited) a fixed value - both
as expressions suitable to the type, for all types.  A few more built-in
mathematical (as opposed to financial) functions would be nice too!
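
As a rough sketch of that idea: "default" and "fixed" below are the two
facets suggested above, each taking either a literal or an expression.
The field names, the rate and the expression syntax are all invented
for illustration and are not taken from the XForms draft:

    <decimal name="net" />
    <!-- editable: starts at a literal default -->
    <decimal name="vatRate" default="0.175" />
    <!-- not editable: always the value of the expression -->
    <decimal name="gross" fixed="net * (1 + vatRate)" />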


Yours
Neil Walker
--------------------------------------------------------------------
Neil Walker                     tel:   +44 (0) 1223 330379
MRC Biostatistics Unit          fax:   +44 (0) 1223 330388
Cambridge, UK                   email: neil.walker@mrc-bsu.cam.ac.uk
                                web:   http://www.mrc-bsu.cam.ac.uk 
--------------------------------------------------------------------

Received on Thursday, 15 June 2000 09:54:01 UTC