RE: XForms Basic and Schema Validation

Hi Mark,

The notion of datatype is orthogonal to simple vs. complex type.

Section 2.2.1.3 of Schema Part 1 clearly states that you can have a
complex type with simple content.
And you can datatype-validate the simple content of a complex type.

>I agree if you were using the term datatype in its proper sense. But
>datatypes are simple types, not complex ones, so I disagree, since it
>sounds like you are using it to cover complex types.

I am using datatype in its proper sense, which is also what the spec is
doing, I believe. Datatypes are not simple types. They are descriptions
of string validations, which can be used to validate content of both
simple and complex types.

>Well...firstly it actually says "XML Schema datatypes" which to me means
>'the datatypes from XML Schema Part 2'. In other words, it doesn't deal
>with other types defined by an author.

Sorry, but you are misreading "XML Schema datatypes" as "XML Schema
built-in datatypes". If XForms Basic had intended to refer to the built-in
datatypes, it should have used that term. But XML Schema Part 2 is about
providing the machinery for defining one's own datatypes. It then uses
that machinery to create a number of built-in datatypes. Note that the
built-in datatypes can be used in complex types that define simple
content.

So, we are left with the fact that XForms 1.0 was designed to address the
*main* use case for validation, which is user input validation. That's why
the spec contains language associating the type MIP with schema datatype.
If anything more than that works for an implementation, it seems to me to
be a bit of a bonus for that implementation.

John M. Boyer, Ph.D.
Senior Product Architect/Research Scientist
Co-Chair, W3C XForms Working Group
Workplace, Portal and Collaboration Software
IBM Victoria Software Lab
E-Mail: boyerj@ca.ibm.com  http://www.ibm.com/software/

Blog: http://www.ibm.com/developerworks/blogs/page/JohnBoyer





"Mark Birbeck" <mark.birbeck@x-port.net> 
Sent by: www-forms-request@w3.org
05/08/2006 05:33 AM

To: <www-forms@w3.org>
cc:
Subject: RE: XForms Basic and Schema Validation

Hi John,

Here's an 'executive summary' of the points that I'll provide explanation
for, inline below:

  One problem with XForms Basic as defined is that it doesn't
  explain how the 'downgrading' of a complex type should take
  place. The second bullet (in XForms Basic) provides for the
  *possibility* of this downgrading by saying that a processor
  "may" choose to only support simple types, but nowhere is it
  explained what it would mean in practice. (And in reply to
  your and Raman's view that the third bullet deals with this,
  I'm afraid it doesn't--it deals with *datatypes*, which are
  simple types.)


> The sentence says that all Schema datatypes other than the
> ones listed are to be treated as string, not all built-in schema
> datatypes.

Well...firstly it actually says "XML Schema datatypes" which to me means
'the datatypes from XML Schema Part 2'. In other words, it doesn't deal
with other types defined by an author.

But even if you ignore the "XML Schema" bit, the term used is 'datatype'
which has a very specific meaning; to infer that this sentence suggests
that any *complex* type that the author has defined should also be
converted to xs:string, would require you to include 'complex types'
within the definition of 'datatypes' which--as you rightly say in a
discussion with Allan on that very subject--is incorrect. :)

I suppose you could say that using the word 'datatype' was a mistake, and
what was actually intended was the more general term 'type definition';
but that makes things worse, since this term includes both simple and
complex types, so the third bullet would actually be saying that any type
definition other than those listed would be xs:strings--obviously not what
is intended.

So my suggestion is for the WG to stop trying to rush this out, and
properly resolve the issue of how complex types behave. (The spec hasn't
moved for about 2 1/2 years; I think another week isn't going to hurt.)

As it happens, I don't really see anything wrong with the third bullet in
relation to its stated subject matter which is datatypes. All it says is
that for some datatypes you don't need to provide any special regular
expressions if you are doing a 'subset processor'.


However, the big thing that *is* glaringly missing is the bridge between
the goal that has been described (of not requiring an XForms Basic
processor to have a full XML Schema implementation) and the reality of the
prose; we need something very clear that explains how a Basic processor
should proceed if it is going to 'downgrade' complex types.

In working through some kind of proposal for this, it seems to me that
mapping to xs:string may not actually be the best solution. I'll try to
explain, and people can say what they think.


Looking at the entirety of XML Schema, I would say that what we're after
is the following behaviour for a 'subset' schema processor:

  * a reference to any undefined type is an error;

  * any *datatype* that is not in the list in
    bullet 3 has a regular expression that is
    equivalent to xs:string;

  * any *simple* type is processed as normal (i.e.,
    as it would be in Full);

  * any *complex* type is processed as if it were
    a simple type, with all element and attribute
    definitions ignored.
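
As a rough illustration only, the four rules above could be sketched in
Python along the following lines. Everything named here is an assumption
made for the sketch: the function effective_pattern, the dictionary that
maps an author-defined type name to its base type, the tiny SUPPORTED table
standing in for the list in bullet 3 (it is not the real list), and the
simplified regular expressions. Facets on simple types are not modelled,
and a complex type with no simple base is not handled at all.

```python
import re

# Illustrative stand-in for the datatype list in bullet 3 (NOT the real list).
# The patterns are simplified lexical checks, not the full XML Schema rules.
SUPPORTED = {
    "xs:string": re.compile(r".*", re.DOTALL),
    "xs:boolean": re.compile(r"true|false|1|0"),
    "xs:nonNegativeInteger": re.compile(r"\+?[0-9]+"),
}


def effective_pattern(type_name, schema_types):
    """Return the regex a hypothetical subset processor would apply.

    schema_types maps an author-defined type name to the name of its base
    type, e.g. {"length1": "xs:nonNegativeInteger"}.
    """
    seen = set()
    while True:
        if type_name in SUPPORTED:
            return SUPPORTED[type_name]
        if type_name.startswith("xs:"):
            # Rule 2: an unlisted datatype behaves like xs:string.
            return SUPPORTED["xs:string"]
        if type_name not in schema_types:
            # Rule 1: a reference to an undefined type is an error.
            raise LookupError("undefined type: " + type_name)
        if type_name in seen:
            raise ValueError("circular type definition: " + type_name)
        seen.add(type_name)
        # Rules 3 and 4: follow the base type. For a complex type, the
        # element and attribute definitions are simply never looked at.
        type_name = schema_types[type_name]


# A complex type with simple content, reduced to its simple base.
schema_types = {"length1": "xs:nonNegativeInteger"}
pattern = effective_pattern("length1", schema_types)
print(bool(pattern.fullmatch("25")))
print(bool(pattern.fullmatch("abc")))
```

The point of the sketch is that the fourth rule costs almost nothing: once
the base type is known, a complex type with simple content goes through
exactly the same lookup as a simple type.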

The first point may or may not be implicit in our schema processing
anyway, but I think it needs some clarification. However, we can ignore it
for this discussion since it should really be defined in XForms Full
anyway.

The second point, on the behaviour of datatypes, is already given by
bullet 3 in the spec, so we need do nothing here either.

Similarly, on the third point, the behaviour of simple types is already
given by bullet 2 in the spec, and although it might benefit from
clarification, it's at least there in part.

So all we need is an extra bullet that clarifies how complex types are
converted, and here I'm proposing *not* that they are automatically
converted to strings--which is the current proposal--but that the
*structural* features are ignored.


The XML Schema specification gives the following example of how an element
'width' can have a value which is a non-negative integer, as well as an
attribute which indicates the unit of that non-negative integer:

  <xs:complexType name="length1">
    <xs:simpleContent>
      <xs:extension base="xs:nonNegativeInteger">
        <xs:attribute name="unit" type="xs:NMTOKEN"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

  <xs:element name="width" type="length1" />

  <width unit="cm">25</width>

As far as I can see, a processor that can handle simple types (which all
Basic processors will do) can process the example I just gave as easily as
it can process the following:

  <xs:simpleType name="length1">
    <xs:restriction base="xs:nonNegativeInteger" />
  </xs:simpleType>

By doing this, at very little cost we reduce the gap between XForms Basic
and XForms Full. (From the XML Schema terminology point of view, what I'm
saying is that since:

  simple content = simple type + attributes

we can 'remove' the attributes and still make use of the simple type
definition, rather than just saying 'string'.)


> "In my opinion" This is why any attempt to assign a datatype
> other than the ones listed should be regarded as string.

I agree if you were using the term datatype in its proper sense. But
datatypes are simple types, not complex ones, so I disagree, since it
sounds like you are using it to cover complex types.


> At a higher level, the purpose of basic was exactly so that
> basic processors did not have to do a very smart schema engine.
> This goal does not seem to be achieved if basic processors
> have to be smart enough to read the schema definitions to figure
> out that the datatype is undefined.

That's a slightly different issue. The processor has to process the XML
Schema mark-up anyway, in order to find the simple types. Spotting
undefined types should be easy.
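
A sketch of what 'spotting undefined types' might look like, again in
Python with xml.etree.ElementTree. The function name undefined_type_refs
and the 'height'/'length2' declarations are made up for the illustration.
The sketch assumes that the "xs:" prefix is bound to the XML Schema
namespace, that author-defined types are referenced without a prefix, and
that only the 'type' and 'base' attributes carry type references.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"


def undefined_type_refs(schema_xml):
    """List author-defined type names that are referenced but never defined."""
    root = ET.fromstring(schema_xml)

    # Collect the names of all named simple and complex type definitions.
    defined = set()
    for tag in ("simpleType", "complexType"):
        for el in root.iter("{%s}%s" % (XS, tag)):
            if el.get("name"):
                defined.add(el.get("name"))

    # Collect every type reference that is not a built-in datatype.
    referenced = set()
    for el in root.iter():
        for attr in ("type", "base"):
            ref = el.get(attr)
            # Assumption: anything prefixed "xs:" is a built-in datatype.
            if ref and not ref.startswith("xs:"):
                referenced.add(ref)

    return sorted(referenced - defined)


SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="length1">
    <xs:restriction base="xs:nonNegativeInteger"/>
  </xs:simpleType>
  <xs:element name="width" type="length1"/>
  <xs:element name="height" type="length2"/>
</xs:schema>
"""

print(undefined_type_refs(SCHEMA))
```

This is a single pass over mark-up that the processor is reading anyway,
with no validation machinery involved.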


> "In my opinion" An implementation should be able to write
> lexical analyzers for just those 26 given datatypes, and apply
> the right analyzer for the given datatype and otherwise it
> should be able to pretend that the type assignment refers to
> string.

I understand the goal, and explained that in my post...but I still don't
like the fact that you can't predict what the platform you are running on
will do. I really don't think it's a good idea to allow Basic to 'maybe'
do this or 'maybe' do that. I would prefer to see the behaviour defined
clearly and then for us to say that this is how a Basic processor will
behave. However, I'm happy to leave that issue to one side whilst we
actually sort out the lack of clarity on the behaviour.

Regards,

Mark


Mark Birbeck
CEO
x-port.net Ltd.

e: Mark.Birbeck@x-port.net
t: +44 (0) 20 7689 9232
b: http://internet-apps.blogspot.com/
w: http://www.formsPlayer.com/


Received on Monday, 8 May 2006 21:32:19 UTC