RE: XForms Basic and Schema Validation from Mark Birbeck on 2006-05-08 (www-forms@w3.org from May 2006)

From: Mark Birbeck <mark.birbeck@x-port.net>
Date: Mon, 8 May 2006 13:33:46 +0100
To: <www-forms@w3.org>
Message-ID: <035801c6729b$a69ea530$7e01a8c0@Jan>

Hi John,

Here's an 'executive summary' of the points that I explain in more detail
inline below:

  One problem with XForms Basic as defined is that it doesn't
  explain how the 'downgrading' of a complex type should take
  place. The second bullet (in XForms Basic) provides for the
  *possibility* of this downgrading by saying that a processor
  "may" choose to only support simple types, but nowhere is it
  explained what it would mean in practice. (And in reply to
  your and Raman's view that the third bullet deals with this,
  I'm afraid it doesn't--it deals with *datatypes*, which are
  simple types.)


>	The sentence says that all Schema datatypes other than the
> ones listed are to be treated as string, not all built-in schema
> datatypes.

Well...firstly it actually says "XML Schema datatypes" which to me means
'the datatypes from XML Schema Part 2'. In other words, it doesn't deal with
other types defined by an author.

But even if you ignore the "XML Schema" bit, the term used is 'datatype'
which has a very specific meaning; to infer that this sentence suggests that
any *complex* type that the author has defined should also be converted to
xs:string, would require you to include 'complex types' within the
definition of 'datatypes' which--as you rightly say in a discussion with
Allan on that very subject--is incorrect. :)

I suppose you could say that using the word 'datatype' was a mistake, and
what was actually intended was the more general term 'type definition'; but
that makes things worse, since this term includes both simple and complex
types, so the third bullet would actually be saying that any type definition
other than those listed would be xs:strings--obviously not what is intended.

So my suggestion is for the WG to stop trying to rush this out, and properly
resolve the issue of how complex types behave. (The spec hasn't moved for
about 2 1/2 years, so I think another week isn't going to hurt.)

As it happens, I don't really see anything wrong with the third bullet in
relation to its stated subject matter which is datatypes. All it says is
that for some datatypes you don't need to provide any special regular
expressions if you are implementing a 'subset processor'.


However, the big thing that *is* glaringly missing is the bridge from the
goal that has been described (of not requiring an XForms Basic processor to
have a full XML Schema implementation) and the reality of the prose; we need
something very clear that explains how a Basic processor should proceed if
it is going to 'downgrade' complex types.

In working through some kind of proposal for this, it seems to me that
mapping to xs:string may not actually be the best solution. I'll try to
explain, and people can say what they think.


Looking at the entirety of XML Schema, I would say that what we're after is
the following behaviour for a 'subset' schema processor:

  * a reference to any undefined type is an error;

  * any *datatype* that is not in the list in
    bullet 3 has a regular expression that is
    equivalent to xs:string;

  * any *simple* type is processed as normal (i.e.,
    as it would be in Full);

  * any *complex* type is processed as if it were
    a simple type, with all element and attribute
    definitions ignored.
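
As a rough sketch only (it is not taken from the spec or from any real
processor), the four rules above could be expressed as a small dispatch
function. The names Kind, SUPPORTED_DATATYPES and resolve are hypothetical,
and SUPPORTED_DATATYPES is a placeholder rather than the actual list in
bullet 3:

```python
from enum import Enum


class Kind(Enum):
    DATATYPE = "datatype"  # built-in datatype from XML Schema Part 2
    SIMPLE = "simple"      # author-defined simple type
    COMPLEX = "complex"    # author-defined complex type


# Placeholder only: stands in for the list in bullet 3 of XForms Basic.
SUPPORTED_DATATYPES = {"xs:string", "xs:boolean", "xs:decimal"}


def resolve(name, declared):
    """Say how a 'subset' processor would treat the type called `name`.

    `declared` maps every known type name to its Kind.
    Returns a (treatment, type_name) pair.
    """
    if name not in declared:
        # Rule 1: a reference to an undefined type is an error.
        raise LookupError(f"undefined type: {name}")
    kind = declared[name]
    if kind is Kind.DATATYPE:
        # Rule 2: datatypes outside the list behave like xs:string.
        return ("datatype", name if name in SUPPORTED_DATATYPES else "xs:string")
    if kind is Kind.SIMPLE:
        # Rule 3: simple types are processed exactly as in Full.
        return ("simple", name)
    # Rule 4: complex types keep their simple content; element and
    # attribute definitions are ignored.
    return ("simple-content-only", name)
```

The point of the sketch is that only the last branch is new; the first
three correspond to behaviour that is already described or implied.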

The first point may or may not be implicit in our schema processing anyway,
but I think it needs some clarification. However, we can ignore it for this
discussion since it should really be defined in XForms Full anyway.

The second point, on the behaviour of datatypes, is already given by bullet
3 in the spec, so we need do nothing here either.

Similarly, on the third point, the behaviour of simple types is already
given by bullet 2 in the spec, and although it might benefit from
clarification, it's at least there in part.

So all we need is an extra bullet that clarifies how complex types are
converted, and here I'm proposing *not* that they are automatically
converted to strings--which is the current proposal--but that the
*structural* features are ignored.


The following example is given in the XML Schema specification of how an
element 'width' can have a value which is a non-negative integer, as well as
an attribute which indicates the unit of that non-negative integer:

  <xs:complexType name="length1">
    <xs:simpleContent>
      <xs:extension base="xs:nonNegativeInteger">
        <xs:attribute name="unit" type="xs:NMTOKEN"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

  <xs:element name="width" type="length1" />

  <width unit="cm">25</width>

As far as I can see, a processor that can handle simple types (which all
Basic processors will do) can process the example I just gave as easily as
it can process the following:

  <xs:simpleType name="length1">
    <xs:restriction base="xs:nonNegativeInteger" />
  </xs:simpleType>

By doing this, at very little cost we reduce the gap between XForms Basic
and XForms Full. (From the XML Schema terminology point of view, what I'm
saying is that since:

  simple content = simple type + attributes

we can 'remove' the attributes and still make use of the simple type
definition, rather than just saying 'string'.)
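
To make the 'remove the attributes' step concrete, here is a minimal
sketch in Python using the standard xml.etree.ElementTree module. The
function name downgrade is hypothetical, and the sketch assumes the
complex type uses an xs:extension inside xs:simpleContent, as in the
'length1' example; it makes no attempt to handle anything else:

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"


def downgrade(complex_type):
    """Build a simpleType that keeps only the base of the simple content."""
    ext = complex_type.find(f"{{{XS}}}simpleContent/{{{XS}}}extension")
    if ext is None:
        raise ValueError("this sketch only handles simpleContent extensions")
    simple = ET.Element(
        f"{{{XS}}}simpleType", {"name": complex_type.get("name", "")}
    )
    # The attribute declarations under the extension are simply not copied.
    ET.SubElement(simple, f"{{{XS}}}restriction", {"base": ext.get("base")})
    return simple


SCHEMA = """<xs:complexType xmlns:xs="http://www.w3.org/2001/XMLSchema"
                name="length1">
  <xs:simpleContent>
    <xs:extension base="xs:nonNegativeInteger">
      <xs:attribute name="unit" type="xs:NMTOKEN"/>
    </xs:extension>
  </xs:simpleContent>
</xs:complexType>"""

result = downgrade(ET.fromstring(SCHEMA))
```

Here result is an xs:simpleType named 'length1' whose single child is an
xs:restriction with base 'xs:nonNegativeInteger'; the 'unit' attribute
has been dropped.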


>	"In my opinion" This is why any attempt to assign a datatype
> other than the ones listed should be regarded as string.

I agree if you were using the term datatype in its proper sense. But
datatypes are simple types, not complex ones, so I disagree, since it sounds
like you are using it to cover complex types.


>	At a higher level, the purpose of basic was exactly so that
> basic processors did not have to do a very smart schema engine.
>	This goal does not seem to be achieved if basic processors
> have to be smart enough to read the schema definitions to figure
> out that the datatype is undefined.

That's a slightly different issue. The processor has to process the XML
Schema mark-up anyway, in order to find the simple types. Spotting undefined
types should be easy.
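
A sketch of how cheap that check could be, again in Python with
xml.etree.ElementTree. The function name undefined_types is hypothetical,
and the sketch makes a simplifying assumption: unprefixed names in 'type'
and 'base' attributes refer to types defined in the same schema, while
prefixed names (such as xs:string) are built-ins and are skipped:

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def undefined_types(schema_root):
    """List unprefixed type names that are referenced but never defined."""
    defined = {
        node.get("name")
        for tag in ("simpleType", "complexType")
        for node in schema_root.iter(XS + tag)
        if node.get("name")
    }
    referenced = set()
    for node in schema_root.iter():
        for attr in ("type", "base"):
            ref = node.get(attr)
            # Assumption: a prefixed name is a built-in, so ignore it.
            if ref and ":" not in ref:
                referenced.add(ref)
    return sorted(referenced - defined)


SAMPLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="length1">
    <xs:restriction base="xs:nonNegativeInteger"/>
  </xs:simpleType>
  <xs:element name="width" type="length1"/>
  <xs:element name="height" type="length2"/>
</xs:schema>"""

missing = undefined_types(ET.fromstring(SAMPLE))
```

For this sample, missing contains only 'length2', since 'length1' is
defined and the xs: names are skipped.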


>	"In my opinion" An implementation should be able to write
> lexical analyzers for just those 26 given datatypes, and apply
> the right analyzer for the given datatype and otherwise it
> should be able to pretend that the type assignment refers to
> string.

I understand the goal, and explained that in my post...but I still don't
like the fact that you can't predict what the platform you are running on
will do. I really don't think it's a good idea to allow Basic to 'maybe' do
this or 'maybe' do that. I would prefer to see the behaviour defined clearly
and then for us to say that this is how a Basic processor will behave.
However, I'm happy to leave that issue to one side whilst we actually sort
out the lack of clarity on the behaviour.

Regards,

Mark


Mark Birbeck
CEO
x-port.net Ltd.

e: Mark.Birbeck@x-port.net
t: +44 (0) 20 7689 9232
b: http://internet-apps.blogspot.com/
w: http://www.formsPlayer.com/

Download our XForms processor from
http://www.formsPlayer.com/
Received on Monday, 8 May 2006 12:35:03 UTC