- From: <noah_mendelsohn@us.ibm.com>
- Date: Mon, 12 Dec 2005 20:55:31 -0500
- To: "Jack Lindsey" <tuquenukem@hotmail.com>
- Cc: brs@itst.dk, mike@saxonica.com, petexmldev@tech-know-ware.com, xmlschema-dev@w3.org
I don't want to pursue this discussion too far, because I don't feel sufficiently expert in Schematron at the moment to go around strongly implying that it's good or bad in any particular context. I had a general awareness that it tends to be implemented using XSLT, which is definitely overhead I don't want in the middle of a high-performance validator, but I want to research other implementation alternatives before drawing any firm conclusions about it. Anyway, since you've asked a few specific questions, I'll try to answer them.

Jack Lindsey writes:

> Do you not endorse reuse and the orthogonal alignment of W3C offerings?

Of course. Whatever the other pros and cons of Schematron, its use of XPath and other existing Recs. is great.

> Tim's RDF/Java example does not seem to be about the degree to
> which you should use the features of a language but how the
> objectives of a language can fundamentally impact your
> flexibility when you try to repurpose data instances expressed
> in them. I am not seeing the parallel with validation. Can
> you please enlighten me?

Sure. Tim's point is that having something expressed in a more powerful language has drawbacks. Let's take the case of so-called Turing-complete [1] languages, including Java, C, JavaScript and XSLT. From that Wikipedia article [1]:

"One important result from computability theory is that it is impossible in general to find if a program written in a Turing-complete language will continue executing forever or will stop within a finite period of time (see halting problem) [2]."

So, as one example of a concern, we cannot tell whether the transform performed by a given XSLT stylesheet will ever complete. Contrast that with XML Schema as it exists today: while you can build slower or faster implementations, I believe I'm correct in saying that there are known algorithms for doing the validation required by >any< possible schema in time that is bounded, and usually quite tractable. There is still some work being done on nested occurrence counts, but even those now seem to have practical implementations.

Furthermore, Tim's point is that with a less powerful language, you are more likely to be able to reason about a constraint. If I have:

    <xsd:choice>
        <xsd:element ref="a"/>
        <xsd:element ref="b"/>
    </xsd:choice>

it's very easy to build tools that understand that this choice will accept exactly one element, named "a" or "b". With the above constraint, it's easy to build a UI widget that will automatically prompt for an A or a B based on the schema.

If, by contrast, I wrote a Java or XSLT program to test for an "a" or a "b", it might be a simple program, but in principle it could be a very complex one. It's much harder to read through the code for that Java program or XSLT stylesheet and see what it does. In fact, we can't in general tell mechanically whether an arbitrary Java or XSLT program will even reach the point of making a decision about what's valid. Building the UI widget by reading the program code is similarly difficult.
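To make that contrast concrete, here is one way the same check might be hand-written as an XSLT 1.0 stylesheet. This is only an illustrative sketch of mine, and the enclosing element name "doc" is invented for the example:

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <!-- Imperative restatement of the xsd:choice above:   -->
      <!-- accept the document only if the root has exactly  -->
      <!-- one child element, named either "a" or "b".       -->
      <xsl:template match="/doc">
        <xsl:if test="count(*) != 1 or count(a | b) != 1">
          <xsl:message terminate="yes">
            Invalid: expected exactly one "a" or "b" element.
          </xsl:message>
        </xsl:if>
      </xsl:template>

    </xsl:stylesheet>

Even in this trivial case, a tool has to analyze or execute the template to discover what is being constrained; the xsd:choice declares the same thing directly, and nothing guarantees that a more ambitious stylesheet stays this easy to read.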
I'm making at least three points here, two of which I'm fairly confident of and one of which I'm not sure of at all:

1. Performance matters in validation. That's my opinion, but I hold it strongly. I want to use validation in high-performance web services scenarios, and I believe I know how to do it by optimizing today's schema language. I want to retain that capability when and if we add co-constraints.

2. I think the principle of least power is important, for some of the reasons explained above. It will be better to find a simple declarative approach that hits an 80/20 point than to implement something that's more general but hard to reason about.

3. (Potentially unfounded suspicion.) Maybe Schematron suffers from being too powerful and too slow, or maybe it doesn't. This is the one I'm not sure of at all, and I want to learn more before casting aspersions. Certainly a generalized XSLT implementation is way slower than I want when processing hundreds or thousands of SOAP messages per second. I have skimmed the Schematron spec, but the problem with least-power arguments is that even one small feature off in a corner can make a language Turing-complete or otherwise difficult to reason about. I don't know Schematron well enough to be sure.

Thanks.

Noah

[1] http://en.wikipedia.org/wiki/Turing_complete
[2] http://en.wikipedia.org/wiki/Halting_problem

--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
Received on Tuesday, 13 December 2005 01:55:43 UTC