Re: I used XML Schema in a recent project, and encountered a few things that, from C. M. Sperberg-McQueen on 2005-08-16 (www-xml-schema-comments@w3.org from July to September 2005)

From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
Date: 16 Aug 2005 14:53:34 -0600
To: Jim Showalter <jim@jimandlisa.com>
Cc: W3C XML Schema Comments list <www-xml-schema-comments@w3.org>
Message-Id: <1124225613.4298.151.camel@localhost>

On Tue, 2005-07-26 at 07:46, Jim Showalter wrote:

> [I used XML Schema in a recent project and encountered a few things
> that,] if improved, would make it a lot more powerful.

> My application has a complicated configuration file that has to be
> carefully checked by the application before proceeding. I wrote an
> XML Schema for the config file, in order to get XmlValidatingReader
> to do some of the checking so I wouldn't have to code it
> myself. Using all of the capabilities of XML Schema that I could, I
> was able to reduce the amount of code I had to write from 560 lines
> to 220 lines. But with a few more general capabilities in XML
> Schema, I could have eliminated all of my checking code.

Many thanks for your comments.  I hope that eventually the Working
Group as a whole will respond; in the meantime, here are some
reactions from one WG member speaking only for himself.

> Here is what I found missing:

> 1) I needed a "contiguous" restriction. Yes, I can create sequences,
> and yes, I can create unique and key restrictions, but there was no
> way to say that there must not be gaps. For example, I could have a
> number ranging from 7 to 20, and I could establish a key on that
> number, which made sure that, say, the number 8 wasn't used twice,
> but I couldn't enforce that the numbers had to be sequential (7, 8,
> 9, etc.). It sure would have been useful to be able to say no gaps
> are allowed.

Hmm.  Sounds like an interesting constraint.  But it's not clear to me
how best to enable it in a general and declarative way.  So I'll ask
you in response: can you give a bit more information about the
application that requires this constraint?  (Column mapping in tables,
if we're talking about the same application as in point 2 -- but is
it always a constraint that all columns in the input be used?)

Or to go at the problem a different way: is there a general class of
problems that the contiguity constraint you describe seems to you to
be a particular instance of?
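
For illustration only, the check that XML Schema 1.0 leaves to the
application could be sketched in Python as below. The function name
find_gaps and the sample numbers are assumptions made for this sketch,
not details of the application under discussion, and uniqueness is
assumed to be enforced separately (for example by an xsd:key).

```python
def find_gaps(numbers):
    """Return the values missing between the smallest and largest number.

    An empty result means the numbers are contiguous (no gaps).
    """
    present = set(numbers)
    if not present:
        return []
    return [n for n in range(min(present), max(present) + 1) if n not in present]


# Hypothetical column numbers in the 7..20 range mentioned above.
print(find_gaps([7, 8, 9, 10]))   # contiguous input
print(find_gaps([7, 8, 10, 12]))  # input with 9 and 11 missing
```

Returning the missing values rather than a boolean lets the application
name the gap in its own error message.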

> 2) I needed a way to say that, if a number was used, then there had
> to be a keyref to that number. Why? Because my program is mapping
> one set of numbers to another (actually, they're columns in
> tables--I'm mapping input columns to output columns, with no gaps,
> and I need to make sure that every output column is mapped to by at
> least one input column). I couldn't have the numbers 1, 2, 3, and 4,
> and another set of numbers 1, 2, 3, with 4 left out. A general
> notion of ref counts would be really useful. It could have min and
> max ref counts, which would allow all kinds of flexible uses. A ref
> count with a min of 1 and max of 1 would mean that every key must
> have exactly one keyref to it. A ref count with a min of 1 and no
> max would mean that every key must have at least one keyref to it
> (which was the semantics I needed). A ref count with a min of 0
> would mean that a key didn't have to be referred to, and so forth.

I was about to say you can do this, but I realize the method I had in
mind doesn't quite do the trick. In the special case of wanting
exactly ONE such keyref, it's possible to enforce the rule by defining
a new pair of identity constraints, in which the old key and keyref
become the new keyref and key.  But I don't know a good way to require
at least one reference, except to supply, as you suggest, reference
count information as part of the PSVI.  That would at least make it
easier for the app when using a validator that exposes that particular
property.

Here, too, I wonder whether there is a more general class of
constraints of which this is one instance.
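
As a rough sketch of the min/max reference-count idea, the following
Python shows the check an application might run on the key and keyref
values it has already extracted. The function name check_ref_counts,
its parameters, and the sample data are hypothetical; nothing here is
part of XML Schema or of any validator's interface.

```python
from collections import Counter


def check_ref_counts(keys, keyrefs, min_refs=1, max_refs=None):
    """Return {key: count} for every key whose reference count is out of range.

    max_refs=None means there is no upper bound.
    """
    counts = Counter(keyrefs)
    violations = {}
    for key in keys:
        n = counts[key]  # Counter yields 0 for keys never referenced
        if n < min_refs or (max_refs is not None and n > max_refs):
            violations[key] = n
    return violations


# Hypothetical data: output columns 1-4, but no input column maps to 4.
output_columns = [1, 2, 3, 4]
mapped_to = [1, 2, 3, 3]

print(check_ref_counts(output_columns, mapped_to))              # at least one
print(check_ref_counts(output_columns, mapped_to, max_refs=1))  # exactly one
```

With min_refs=0 the function never reports an unreferenced key, which
matches the "didn't have to be referred to" case in the proposal.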

> 3) I needed a way to say that the max value for some attribute could
> not exceed the value of some other attribute. Generalized, it would
> be really useful to be able to have basic expressions for
> comparison (equal, not equal, greater than, less than, greater than
> or equal, less than or equal) of arbitrary fields in the schema.

Requiring that the values of two attributes stand in some defined
relation to each other is a frequently desired constraint, which in WG
discussions we label 'co-occurrence constraints'.

Some members of the Working Group were already convinced, during the
development of XML Schema 1.0, that such constraints were necessary
and natural, just like table-level CHECK clauses in SQL, which can
express constraints on the values of two or more columns in each
record.  By the analogy with SQL, some of us concluded (I did, anyway)
that the correct way to design such a facility was to use a simple
query language (in SQL, CHECK clauses use the syntax of the WHERE
clause), and that we should therefore delay adding such a feature
until such time as XQuery and XPath 2.0 should be completed.

In the meantime, of course, others have used XPath 1.0 for such
purposes, and developed Schematron on that basis.
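
To make the kind of rule concrete, here is a minimal Python sketch of
one co-occurrence check done in application code. The element name
table and the attributes column-count and max-index are invented for
the example. In Schematron the same rule would be stated as an XPath
test; plain Python is used here only so the sketch runs with the
standard library alone.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration; element and attribute names are assumptions.
CONFIG = """
<config>
  <table name="a" column-count="5" max-index="4"/>
  <table name="b" column-count="3" max-index="7"/>
</config>
"""


def check_max_index(xml_text):
    """Return the names of <table> elements whose max-index exceeds column-count."""
    root = ET.fromstring(xml_text)
    return [
        table.get("name")
        for table in root.iter("table")
        if int(table.get("max-index")) > int(table.get("column-count"))
    ]


print(check_max_index(CONFIG))  # lists the tables that violate the rule
```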

> 4) It would be really nice to be able to specify error messages for
> error conditions in the XML Schema. I am currently relying on the
> error messages from the XML reader, but they tend to be pretty
> cryptic. Previously I had written my own messages, which were
> application-specific and quite informative. For example:

>     theLogger.ConfigFileError(
>         "Specified config file contains output column heading base name '" +
>         outputHeadingBaseName +
>         "' with forbidden characters (only a-z, A-Z, and 0-9 are allowed).");

> whereas now my application outputs messages like:

> The 'output-heading-base-name' attribute has an invalid value 
> according to its data type. An error occurred at file:///C:/Documents
> and Settings/<filename goes here>, (21, 48).

> I would like to be able to hook my messages into the schema to 
> override the default messages. 

Hmm.  At first glance, this seems like a question of the interface to
the validator you are using; at the worst, you ought to be able to
intercept its diagnostics and substitute your own.  On that line of
reasoning, there's no need for a change in the schema language, just
an improvement in software interfaces.
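
A minimal sketch of that substitution, in Python, might look like the
following. It assumes the validator's error callback reports which
attribute failed. The table CUSTOM_MESSAGES and the function
friendly_message are hypothetical names, not part of any validator's
API.

```python
# Hypothetical table of application-specific messages, keyed by attribute name.
CUSTOM_MESSAGES = {
    "output-heading-base-name": (
        "Output column heading base name contains forbidden characters "
        "(only a-z, A-Z, and 0-9 are allowed)."
    ),
}


def friendly_message(attribute_name, validator_message):
    """Return the application's message for an attribute, else the validator's own."""
    return CUSTOM_MESSAGES.get(attribute_name, validator_message)


# The validator's wording is kept whenever no override is registered.
generic = "The attribute has an invalid value according to its data type."
print(friendly_message("output-heading-base-name", generic))
print(friendly_message("some-other-attribute", generic))
```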

Of course, it might be convenient to have generic diagnostics as
part of schema annotation, so that violations of particular validity
constraints could be trapped and associated with a particular
error message.  Offhand, it seems likely that one might want a 
WHERE clause to say under what circumstances a particular message
is appropriate, which seems to tie this in with co-occurrence
constraints.  It's also a potential use case for an explicit
fallback mechanism analogous to the one in XSLT -- an xsd:fallback
element in a declaration could be associated with a message supplied
by the schema author.

Wearing a vocabulary designer's hat, I think this looks cool; I don't
know how implementors of schema-aware software will like it.

Thanks again for the comments.

--Michael Sperberg-McQueen, World Wide Web Consortium
Received on Tuesday, 16 August 2005 20:57:35 UTC