Re: keyrefs from Jeni Tennison on 2001-11-09 (xmlschema-dev@w3.org from November 2001)

From: Jeni Tennison <jeni@jenitennison.com>
Date: Fri, 9 Nov 2001 17:35:17 +0000
To: "Gary Robertson" <gazinyork@hotmail.com>
CC: xmlschema-dev@w3.org
Message-ID: <7822432285.20011109173517@jenitennison.com>

Hi Gary,

> The trouble, as I'm sure you know, is that real-world systems are
> arbitrarily complex and often do not readily fit solely into any
> particular pattern (such as hierarchies or relational databases).
> The system I am trying to model is a case in point: although the
> data is stored in a relational database many additional
> relationships and business rules are present/enforced in the C++
> code above the database layer. Also, from the users' point of view,
> parts of the data are hierarchical and parts are directed graphs (if
> that's the correct term for webs of symbols connected by arrows).
> The same data entities are often present in both and there are
> fairly complex (if you don't understand the application domain)
> rules about what can go where in both parts.

Yes. My feeling is that when you design a markup language, you should
pay more attention to the ease with which documents in that markup
language can be created and processed than to whether the constraints
that you want to express can be expressed in a particular schema
language. If a validatable structure is a big requirement for the
markup language, then great, try to make it fit in with the schema
language you've chosen, but if not, the main role of a schema is
documentation and often natural language is as good a definition
language as anything.

The reason Schematron is so useful is that while other languages are
limited in their grammar, XPath expressions are able to articulate
quite a lot of the rules that natural language can articulate about
XML structures. But there are some aspects of validation that just
aren't testable without a semantic understanding of the role of the
XML document.
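
As a rough illustration of the Schematron idea (a context, a test
and a message), here is a minimal Python sketch. It is not
Schematron itself: real Schematron tests are XPath expressions,
whereas this uses Python callables with ElementTree's limited path
syntax. The document, element names and rule are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the element and attribute names are invented.
DOC = """
<orders>
  <order id="o1" total="30"><item price="10"/><item price="20"/></order>
  <order id="o2" total="50"><item price="10"/></order>
</orders>
"""

# Each rule pairs a context path with a test and a message,
# loosely mirroring a Schematron rule/assert pair.
RULES = [
    (
        ".//order",
        lambda order: float(order.get("total"))
        == sum(float(item.get("price")) for item in order.findall("item")),
        "order total must equal the sum of its item prices",
    ),
]


def validate(root, rules):
    """Return one message per context node that fails its test."""
    errors = []
    for context, test, message in rules:
        for node in root.findall(context):
            if not test(node):
                errors.append(f"{node.tag} {node.get('id')}: {message}")
    return errors


errors = validate(ET.fromstring(DOC), RULES)
```

A cross-element rule like this one is the kind of thing a
grammar-based schema cannot state directly.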

For example, try expressing the constraint "the value of the <Orgn>
element must be one of the following acronyms if the organisation is
one of this list of organisations; otherwise it can be any string". Or
dynamic constraints like "the value of the data-type attribute must be
the qualified name of a data type supported by the XSLT processor
that you're using".
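
To make the first constraint concrete, here is a minimal sketch in
Python, assuming a hand-maintained table of organisations and their
acronyms (the table contents and the check_orgn name are
hypothetical). It can only catch an exact full name; deciding
whether some other string "is" one of the listed organisations is
the semantic step that no pattern can supply.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: listed organisations and the acronym each must use.
REQUIRED_ACRONYMS = {
    "world wide web consortium": "W3C",
    "internet engineering task force": "IETF",
}


def check_orgn(value):
    """Return an error message, or None if the value is acceptable."""
    text = (value or "").strip()
    if text in REQUIRED_ACRONYMS.values():
        return None  # already one of the permitted acronyms
    acronym = REQUIRED_ACRONYMS.get(text.lower())
    if acronym is not None:
        return f"use the acronym {acronym!r} instead of {text!r}"
    # Anything else is accepted as "any string". A misspelt or
    # abbreviated name of a listed organisation slips through here.
    return None


doc = ET.fromstring(
    "<list>"
    "<Orgn>W3C</Orgn>"
    "<Orgn>World Wide Web Consortium</Orgn>"
    "<Orgn>Acme Ltd</Orgn>"
    "</list>"
)
results = [check_orgn(e.text) for e in doc.iter("Orgn")]
```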

> I think that allowing full xpath syntax in refs/uniques/keyrefs (and
> adding user-definable error messages to them) would go a long way
> toward allowing me to do the things I need. Whether this would make
> XML schema a universal data description language (presumably the
> ultimate goal) or even satisfy all the requirements of this project,
> I don't know.

The identity constraints in XML Schema are clearly an 80/20 solution
to a hard problem. I don't think that adding full XPath support to
identity constraints will improve them *all* that much (although
there are definitely circumstances where it would help). Adding
Schematron-like rules to any complex type definition, on the other
hand... :)
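
For readers who want to see what a key/keyref pair amounts to, here
is a minimal Python sketch of the check, written outside any schema
language. The document, the element and attribute names, and the
check_keyref function are hypothetical, and ElementTree's path
support is far narrower than full XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: each book/@author should refer to an author/@id.
DOC = """
<library>
  <author id="a1"/><author id="a2"/>
  <book author="a1"/><book author="a3"/>
</library>
"""


def check_keyref(root, key_path, key_attr, ref_path, ref_attr):
    """Return (duplicate keys, references that match no key)."""
    keys = [
        e.get(key_attr)
        for e in root.findall(key_path)
        if e.get(key_attr) is not None
    ]
    refs = {
        e.get(ref_attr)
        for e in root.findall(ref_path)
        if e.get(ref_attr) is not None
    }
    duplicates = sorted({k for k in keys if keys.count(k) > 1})
    dangling = sorted(refs - set(keys))
    return duplicates, dangling


duplicates, dangling = check_keyref(
    ET.fromstring(DOC), "author", "id", "book", "author"
)
```

Here the selector and field are simple child and attribute lookups,
which is roughly the restricted territory that identity constraints
cover; anything conditional needs a rule-based check instead.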

But then in many ways I think it's right that there should be a gap
between validation of the structure of an XML document and validation
of the relationships between values held within an XML document.
Mixing the modular style of XML Schema with XPath is pretty nasty
(although perhaps XPath 2.0 will help, if it supports testing of
user-defined types).

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/
Received on Friday, 9 November 2001 12:35:20 UTC