Re: Constraints in XML Schema - Formal Language Background?

Schematron indeed has many nice properties and can do this sort of thing. 
On the other hand, I'm tempted to say that what you really have here is 
not an XML data model at all, but a graph model that happens to be 
serialized in XML.  At some point, it becomes more appropriate to have a 
moderate amount of checking at the XML schema level (e.g. that each arc 
has a FromId and a ToId) and then to build a schema language to constrain 
your graphs.  After all, it's nearly hopeless to look for generalized 
graph structures such as doubly linked cycles, unless you just view 
something like XSL as a Turing complete programming language and program 
the checks.  To do it declaratively, you'd need a graph constraint 
language.

XML level schemas can't generally fully check abstractions at the next 
level up.  We can recognize integers, but not accurately validate prime 
numbers (you can declare a named subtype of Integer and call it Prime, but 
you can't express tight validation constraints...the Unique Particle 
Attribution constraint does ensure that you'll know which elements and 
attributes were asserted to be Prime, but you'll have to write the prime 
number check yourself.)  Similarly, we can validate that an attribute 
value resembles a credit card number, but we can't check whether the card 
was stolen (and thus invalid.)  I think your example is in the grey area 
at the border of what we should and should not try to do.  Thanks.
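
To illustrate the layering described above, here is a minimal sketch in Python. It assumes the schema-level check amounts to testing the lexical form of an integer, and that the prime test is left to the application. The names `is_lexical_integer`, `is_prime` and `validate_prime` are hypothetical and not part of any schema processor.

```python
import re

# Roughly the lexical form of xs:integer: optional sign, then digits.
# This stands in for what a schema processor can check (assumption).
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


def is_lexical_integer(text):
    """Schema-level check: does the value look like an integer?"""
    return INTEGER_LEXICAL.fullmatch(text.strip()) is not None


def is_prime(n):
    """Application-level check by trial division."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def validate_prime(text):
    """Run the schema-level check first, then the hand-written one."""
    return is_lexical_integer(text) and is_prime(int(text))
```

The first function is the part a schema can express; the second is the part you would still have to write yourself.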

------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------

Jan Mendling <mendling@web.de>
01/06/2003 08:24 PM

 
        To:     noah_mendelsohn@us.ibm.com, xmlschema-dev@w3.org
        cc: 
        Subject:        Re: Constraints in XML Schema - Formal Language Background?


Hi Noah and the others,
I do not think that W3C XML Schema has much need for something like a tree 
grammar, although relaxing the Unique Particle Attribution rule, which 
forbids nondeterministic content models, would be a plus.
Currently I have a problem, which I do not know how to express with any 
sort of tree grammar. Consider the following:
...
<Arc FromId="1" ToId="2"/>
<Arc FromId="2" ToId="1"/>
<Arc FromId="1" ToId="2"/>
...

I want to detect, for a given Arc element (Arc1), whether there is another 
Arc element (ArcX) whose @ToId equals the @FromId of Arc1 and whose @FromId 
equals the @ToId of Arc1. 
This can be expressed with Schematron's XPath Assertions. You could argue 
that I could model my content structure in a different way, so that 
grammars might capture these properties. But this is often 
counterproductive in terms of readability. Therefore, I think a flexible 
and user-friendly solution would be to have something like Schematron 
assertions in W3C XML Schema. And as XPath as a W3C standard is involved, 
I cannot imagine that there will be too much overhead in calculation. 
Or am I wrong? It would be nice to have some ideas here from a formal 
language point of view!
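
For comparison, the same check can be programmed outside the schema. The sketch below uses Python's standard `xml.etree.ElementTree` module. The `Graph` wrapper element and the function name `reverse_arc_pairs` are assumptions made only for illustration, since the original fragment does not show its parent element.

```python
import xml.etree.ElementTree as ET

# The Arc elements from the example, wrapped in a hypothetical
# <Graph> root so that the fragment is well-formed.
SAMPLE = """<Graph>
  <Arc FromId="1" ToId="2"/>
  <Arc FromId="2" ToId="1"/>
  <Arc FromId="1" ToId="2"/>
</Graph>"""


def reverse_arc_pairs(xml_text):
    """Return every (FromId, ToId) pair that also occurs reversed."""
    root = ET.fromstring(xml_text)
    # Collect distinct arcs; duplicate Arc elements collapse to one pair.
    arcs = {(a.get("FromId"), a.get("ToId")) for a in root.iter("Arc")}
    # Self-loops are skipped, since they have no *other* reversed arc.
    return {(f, t) for (f, t) in arcs if f != t and (t, f) in arcs}


if __name__ == "__main__":
    print(sorted(reverse_arc_pairs(SAMPLE)))
```

Because the arcs are held in a set, each lookup for the reversed pair takes constant time on average, so the whole check is roughly linear in the number of Arc elements.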
Greets, Jan



noah_mendelsohn@us.ibm.com wrote on 07.01.03 00:24:27:
> >> you are absolutely right that the expressiveness of XML
> >> schema constraints should be improved
> 
> I agree.
> 
> >>  and XPath seems to be a natural option.
> 
> Yes, though certainly other options (Relax-like tree
> automata, something else grammar-based, etc.) should at
> least be considered before a decision is made.  I agree
> that XPath is a likely good choice.
> 
> > About performance: I think performance matters should
> > not guide the decision about whether XPath-Constraints
> > should be added to the schema specification or not. If
> > performance is a matter then people can switch off
> > validation (or use only simple constraints).
> 
> Here I respectfully but strongly disagree.  It's
> essential that my customers and those with whom they
> do business get consistent results when they validate a
> given document with a given schema.  If they say "Well,
> it was valid with XYZ-Corp.'s high performance processor
> but not ABC's" we've got a mess.  The main reason to
> use XML is universal consistency and interop.  High
> performance schema processing is very, very important
> to IBM's customers, as is consistency of semantics.  I
> think we can get better co-occurrence constraints
> without sacrificing performance.
> 

-- 

~~~~~~~~~~~~~
~   Jan Mendling
~   Güterstr.53
~   54295 Trier
~   0175-1636958
~~~~~~~~~~~~~

Received on Wednesday, 8 January 2003 01:13:06 UTC