- From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
- Date: 16 Aug 2005 14:53:34 -0600
- To: Jim Showalter <jim@jimandlisa.com>
- Cc: W3C XML Schema Comments list <www-xml-schema-comments@w3.org>
- Message-Id: <1124225613.4298.151.camel@localhost>
On Tue, 2005-07-26 at 07:46, Jim Showalter wrote:

> [I used XML Schema in a recent project and encountered a few things
> that,] if improved, would make it a lot more powerful.

> My application has a complicated configuration file that has to be
> carefully checked by the application before proceeding. I wrote an
> XML Schema for the config file, in order to get XmlValidatingReader
> to do some of the checking so I wouldn't have to code it
> myself. Using all of the capabilities of XML Schema that I could, I
> was able to reduce the amount of code I had to write from 560 lines
> to 220 lines. But with a few more general capabilities in XML
> Schema, I could have eliminated all of my checking code.

Many thanks for your comments. I hope that eventually the Working
Group as a whole will respond; in the meantime, here are some
reactions from one WG member speaking only for himself.

> Here is what I found missing:

> 1) I needed a "contiguous" restriction. Yes, I can create sequences,
> and yes, I can create unique and key restrictions, but there was no
> way to say that there must not be gaps. For example, I could have a
> number ranging from 7 to 20, and I could establish a key on that
> number, which made sure that, say, the number 8 wasn't used twice,
> but I couldn't enforce that the numbers had to be sequential (7, 8,
> 9, etc.). It sure would have been useful to be able to say no gaps
> are allowed.

Hmm. Sounds like an interesting constraint. But it's not clear to
me how best to enable it in a general and declarative way. So I'll
ask you in response: can you give a bit more information about the
application that requires this constraint? (Column mapping in tables,
if we're talking about the same application as in 2 -- but is it
always a constraint that all columns in the input be used?)

Or to go at the problem a different way: is there a general class
of problems that the contiguity constraint you describe seems to you
to be a particular instance of?
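
For illustration only (this is not part of XML Schema, nor a Working Group proposal): a minimal sketch of how an application might check the contiguity rule from item 1 itself. The `column` element, its `number` attribute, and the helper `find_gaps` are hypothetical names chosen for the example; the real configuration format is not shown in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical config fragment; element and attribute names are assumptions.
CONFIG = """<config>
  <column number="7"/>
  <column number="8"/>
  <column number="10"/>
</config>"""


def find_gaps(numbers):
    """Return the integers missing between the smallest and largest value."""
    present = set(numbers)
    if not present:
        return []
    return [n for n in range(min(present), max(present) + 1)
            if n not in present]


root = ET.fromstring(CONFIG)
numbers = [int(col.get("number")) for col in root.iter("column")]
gaps = find_gaps(numbers)
print(gaps)  # the values that break contiguity
```

Uniqueness of the numbers can still be left to an identity constraint in the schema; the sketch covers only the "no gaps" part that the schema cannot express.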
> 2) I needed a way to say that, if a number was used, then there had
> to be a keyref to that number. Why? Because my program is mapping
> one set of numbers to another (actually, they're columns in
> tables--I'm mapping input columns to output columns, with no gaps,
> and I need to make sure that every output column is mapped to by at
> least one input column). I couldn't have the numbers 1, 2, 3, and 4,
> and another set of numbers 1, 2, 3, with 4 left out. A general
> notion of ref counts would be really useful. It could have min and
> max ref counts, which would allow all kinds of flexible uses. A ref
> count with a min of 1 and max of 1 would mean that every key must
> have exactly one keyref to it. A ref count with a min of 1 and no
> max would mean that every key must have at least one keyref to it
> (which was the semantics I needed). A ref count with a min of 0
> would mean that a key didn't have to be referred to, and so forth.

I was about to say you can do this, but I realize the method I had in
mind doesn't quite do the trick. In the special case of wanting
exactly ONE such keyref, it's possible to enforce the rule by
defining a new pair of identity constraints, in which the old key and
keyref become the new keyref and key.

But I don't know a good way to require at least one reference, except
to supply, as you suggest, reference count information as part of the
PSVI. That would at least make it easier for the app when using a
validator that exposes that particular property.

Here, too, I wonder whether there is a more general class of
constraints of which this is one instance.

> 3) I needed a way to say that the max value for some attribute could
> not exceed the value of some other attribute. Generalized, it would
> be really useful to be able to have basic expressions for
> comparison (equal, not equal, greater than, less than, greater than
> or equal, less than or equal) of arbitrary fields in the schema.
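
To make the request in item 3 concrete, here is a minimal application-side sketch of such a comparison check. It is an illustration under stated assumptions, not a schema feature: the `range` element, its `max` and `limit` attributes, and the helper `violations` are all hypothetical names.

```python
import operator
import xml.etree.ElementTree as ET

# Hypothetical document; element and attribute names are assumptions.
DOC = """<config>
  <range max="20" limit="15"/>
  <range max="5" limit="15"/>
</config>"""

# The six comparisons listed in item 3.
COMPARATORS = {
    "eq": operator.eq, "ne": operator.ne,
    "gt": operator.gt, "lt": operator.lt,
    "ge": operator.ge, "le": operator.le,
}


def violations(root, tag, left, op, right):
    """Return elements whose `left` attribute fails `op` against `right`."""
    compare = COMPARATORS[op]
    return [el for el in root.iter(tag)
            if not compare(int(el.get(left)), int(el.get(right)))]


root = ET.fromstring(DOC)
# Rule: on every <range>, max must be less than or equal to limit.
bad = violations(root, "range", "max", "le", "limit")
print([el.attrib for el in bad])
```

The sketch compares two attributes on the same element only; comparing arbitrary fields across the document would need a path or query language to locate the operands.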
Requiring that the values of two attributes stand in some defined
relation to each other is a frequently desired constraint, which in
WG discussions we label 'co-occurrence constraints'. Some members of
the Working Group were already convinced, during the development of
XML Schema 1.0, that such constraints were necessary and natural,
just like table-level CHECK clauses in SQL, which can express
constraints on the values of two or more columns in each record.

By the analogy with SQL, some of us concluded (I did, anyway) that
the correct way to design such a facility was to use a simple query
language (in SQL, CHECK clauses use the syntax of the WHERE clause),
and that we should therefore delay adding such a feature until such
time as XQuery and XPath 2.0 should be completed.

In the meantime, of course, others have used XPath 1.0 for such
purposes, and developed Schematron on that basis.

> 4) It would be really nice to be able to specify error messages for
> error conditions in the XML Schema. I am currently relying on the
> error messages from the XML reader, but they tend to be pretty
> cryptic. Previously I had written my own messages, which were
> application-specific and quite informative. For example:

> theLogger.ConfigFileError("Specified config file contains output
> column heading base name '" + outputHeadingBaseName +
> "' with forbidden characters (only a-z, A-Z, and 0-9 are
> allowed).");

> whereas now my application outputs messages like:

> The 'output-heading-base-name' attribute has an invalid value
> according to its data type. An error occurred at file:///C:/Documents
> and Settings/<filename goes here>, (21, 48).

> I would like to be able to hook my messages into the schema to
> override the default messages.

Hmm. At first glance, this seems like a question of the interface to
the validator you are using; at the worst, you ought to be able to
intercept its diagnostics and substitute your own.
On that line of reasoning, there's no need for a change in the schema
language, just an improvement in software interfaces.

Of course, it might be convenient to have generic diagnostics as part
of schema annotation, so that violations of particular validity
constraints could be trapped and associated with a particular error
message. Offhand, it seems likely that one might want a WHERE clause
to say under what circumstances a particular message is appropriate,
which seems to tie this in with co-occurrence constraints.

It's also a potential use case for an explicit fallback mechanism
analogous to the one in XSLT -- an xsd:fallback element in a
declaration could be associated with a message supplied by the schema
author. Wearing a vocabulary designer's hat, this looks cool to me;
I don't know how implementors of schema-aware software will like it.

Thanks again for the comments.

--Michael Sperberg-McQueen, World Wide Web Consortium
Received on Tuesday, 16 August 2005 20:57:35 UTC