- From: Steven Ericsson-Zenith <steven@semeiosis.com>
- Date: Mon, 27 Jun 2005 13:25:52 -0700
- To: www-xml-schema-comments@w3.org, xmlschema-dev@w3.org
- Cc:
I had posted this originally to the xmlschema-dev list, and Henry Thompson has asked me to post it to www-xml-schema-comments, the official comments list.

I want to add that since I wrote the note below I have read the formal specification, and my feeling about that document is that it does more harm than good - especially since the committee made it clear at the workshop that the document is not considered a valid account of the standard. This is all the more a concern since I heard that XPath and XQuery made use of the spec. I guess I am puzzled as to why the committee did not follow the precedent set by the XML standard, and I wonder what the W3C's broad position is - surely a recommendation regarding formal specification for all the standards is appropriate. A common mathematical basis and algebra in the standards would seem useful to me.

Also, to clarify my comment below about mathematical basis: in a specification of any kind there is "something that it is about." A formal specification takes extra steps to clarify that "something that it is about" - its premises or axioms need to be clearly stated. I believe that one reason there is confusion here is that the formal description has not taken sufficient steps to ensure its mathematical basis. Forgive my semeiotic point of view, but it is heavy on description of syntax and translation between two poorly defined models, and light on semantics and pragmatics - and these three are not clearly distinguished.

I do not currently have sufficient time to do more than a cursory review of the inference rules, so I want to be cautious about being overly critical. However, they appear, simply, to be substitution rules and maps between XML and a DOM - which I guess is no surprise. What I do not see is the specification of a valid schema instance - i.e., what it means for an instance to be valid against a given schema.
I am prepared to accept that I simply have not given the spec sufficient time - so perhaps someone can point it out for me.

My original note follows, with minor corrections:

Within the limits of time a couple of errors crept into my report that I wish to correct, and I missed one issue that I want to add. I have also made a few typo corrections and clarifications. The error relates to my report of the recursive type issue. That should read that in recursive types the tail must be minOccurs=0, otherwise you have specified an infinitely recursive data structure.

Here is my revised and informal "amicus curiae" contribution to the XMLSchema committee - notes from the workshop of the past couple of days. My apologies for the limited time I currently have to detail these issues further or to make my report more readable.

It was hard not to come across as a formalist at this workshop. From my experience, I do understand the pragmatics of formal language development and I empathise with the challenge, and I want to clarify some of my concerns.

There are two ways to view the pragmatics of language development - one is the long term pragmatics of refinement and the second is the short term pragmatics of necessity.

An example of a short term pragmatic is the necessity to produce a result - in the case of a working group, the production of a working specification. Typing questions, IMHO, are another example of a short term pragmatic - who can doubt the necessity that precision decimal should be supported or that date and timestamps should cover the scope of those specified in SQL? In the absence of a mechanism to specify the base types of a schema it would be an error not to support these types, IMHO. So this could be solved immediately with an erratum that extends the base types.

[As an aside, the base type mapping problem is conceptually identical to the problem of binding schema types to application types. So a single solution that solves both problems should be proposed.]
If your horizon is long term - as I said at the workshop, I want schemas (or "schemata") that I write to be valid 40 or 100 years from now - then base type specification and a formal method for binding types between schema and application implementations seem essential.

An example of a long term pragmatic is the necessity of refinement. This was expressed by everyone at the workshop under the term "versioning" and manifests for the committee in the need to release subsequent specifications of the XMLSchema standard.

What many considered another example of a short term pragmatic is the need to specify "profiles" - subsets of the specification that could be guaranteed in specific use-cases. I will argue that this is a long term consideration also.

I spoke briefly in my presentation about my interest in concept distinction, and this is another case in point. The concepts of versioning and profiling are essentially the same - and can thus be addressed by the same solution. Therefore, I would strongly urge the committee not to invent specialized solutions for what appear initially to be distinct concepts.

I also want to clarify what I mean by formal specification and what others may mean. When engineers ask for a formal specification they do not necessarily need a Zed specification. While the computer science formal methods community has gone down the road of building new algebras, it is by no means a necessity that a formal specification be entirely written in what has essentially become a private language. [I note that in the formal specification this issue is amplified by the use of a different and compressed syntactic convention.] John von Neumann and David Hilbert used informal language too - the tendency toward strict private languages is a relatively recent phenomenon - one that manifestly has not served computer engineering well, since it has built an unnecessary divide between formalists and engineers.
It is perfectly possible - and pragmatically necessary - to write a formal specification that engineers building tools can use. The specification does need a mathematical foundation, but it is not always necessary that users of the specification appreciate that foundation. We have known how to do this since the Algol 60 report, written almost 50 years ago, led the way.

As I read the existing specification, it is apparent that the authors did intend to write a formal specification of the type I describe, but it seems to me that the mathematical foundations of the specification are unclear - perhaps absent. That is what I meant when I said at the workshop that the specification was, from my POV, "insufficiently formal." XMLSchema is not an imperative programming language, so the Algol 60 report does not help us much - but it does seem possible to build the spec of a constrained data description language on mathematical foundations nonetheless.

I could sense the frustration in the committee whenever I pushed for more formalization, and it should be clear that, whatever the committee's experience with formal methods, it would be a fundamental mistake to dismiss these methodologies because of that experience. The issue is not whether the language should be specified formally - but rather how to specify it formally. If past attempts have failed then it is not a failure of the method but a simple failure of communication between individuals. Three attempts to write a spec that meets the community's needs are no reason not to write a fourth if the last is found wanting - and perhaps 1.1 can be that specification.

Noah is rightly concerned about new articulations because he fears that the new work will unintentionally contradict the old work.
If this really is a concern - and it really is that difficult for a skilled individual to reproduce the current specification - then that seems to me to be clear evidence that the need is more urgent, for how on earth can we expect a tools writer to fare better?

So, on review, here is a summary of my recommendations:

1. In version 1.1 specify a binding mechanism that permits base types to be specified, and use this mechanism to specify the 1.0 base types as base types in 1.1. This mechanism would also enable general binding of types between schemas and applications.

2. In version 1.1 specify a profiling mechanism that permits a guarantee in a schema - the guarantee is that the semantics of the specified subset of the named schema will never change. This could, perhaps, be implemented by an attribute on types that says the type cannot be redefined.

3. Using the mechanism of (2), specify a profile of 1.1 that is the 1.0 specification, and any other 1.0 profile - such as that proposed for WSDL, etc. This profiling mechanism provides your versioning solution, since now you can specify future refinements of a namespace in terms of the past versions of the namespace.

4. I support the call for constraints in the language (commonly called "co-constraints"). In my presentation I pointed out that there are two types of constraints. Essentially, they are those that apply to the generator and specify whether an instance is valid, and those that apply to the data in valid instances used by a consumer and essentially specify rules that apply to data.

An example of the first case: to ensure that a value is in a range - a value out of the range produces an invalid schema. Similarly, I want to ensure that the timestamp in a given field is earlier than the timestamp in other fields - otherwise the schema is invalid. I also asked for a strong type inference that timestamps are state, not declaration. This type of inference also applies to all calculations, for example summation.
An example of the second case - to use the example given by BT - is that, in a valid instance, the value of a purchase order field requires that the sign-off appear in an associated field. In this case the instance is valid and the consumer needs to see the rule.

5. Instance trees. It is useful in instances to reference other instances of the same schema - for cases where part of the instance changes infrequently and another part changes more frequently.

6. Finally, I pointed out that there is an error in the specification of recursive types. The tail of the recursion must read minOccurs=0, otherwise you can specify infinitely recursive data structures in a schema - and this is clearly an error.

With respect,
Steven

--
Dr. Steven Ericsson Zenith
http://www.semeiosis.com
Received on Monday, 27 June 2005 20:26:03 UTC