- From: Steven Ericsson-Zenith <steven@semeiosis.com>
- Date: Thu, 23 Jun 2005 14:40:26 -0700
- To: xmlschema-dev@w3.org
- Cc:
Here is an informal "amicus curiae" contribution to the XMLSchema committee - notes from the workshop of the past couple of days. It was hard not to come across as a formalist at this workshop - I do understand the pragmatics of formal language development and I want to clarify some of my concerns. There are two ways to view these pragmatics - one is the long term pragmatics of refinement and the second is the short term pragmatics of necessity. An example of a short term pragmatic is the necessity to produce a result - in the case of a working group, the production of a specification. Typing questions, IMHO, is another example of a short term pragmatic - who can doubt the necessity that precision decimal should be supported or that date and time stamps should cover the scope of those specified in SQL? In the absence of a mechanism to specify the base types of a schema it would be an error not to support these types IMHO. So this could be solved immediately with an errata that extends the base types. [As an aside the base type mapping problem is identical conceptually to the problem of binding schema types to application types. So a single solution that solves both problems should be proposed.] If your horizon is long term - as I said at the workshop, I want schemas (or "schemata") that I write to be valid 40 or 100 years for now - then base type specification and a formal method for binding types between application implementations seems essential. An example of a long term pragmatic is the necessity of refinement and this was expressed by everyone at the workshop under the term "versioning" and manifests for the committee in the need to release subsequent specifications of the XMLSchema standard. In what many considered another example of a short term pragmatic, is what many saw as the need to specify "profiles" - subsets of the specification that could be guaranteed in specific use-cases. I will argue that this is a long term consideration also. I spoke briefly in my presentation about my interest in concept distinction, and this is another case in point. The concepts of versioning and profiling are essentially the same - and can thus be addressed by the same solution. Therefore, I would strongly urge the committee not invent specialized solutions for what appear initially to be distinct concepts. I also want to clarify what I mean by formal specification and what others may mean. When engineers ask for a formal specification they do not necessarily need a Zed specification. While the computer science formal methods community has gone down the road of building new algebras it is by no means a necessity that a formal specification be entirely written in what has essentially become a private language. John von Neumann and David Hilbert used informal language too - the tendency toward strict private langauges is a relatively recent phenomenon - one that manifestly has not served computer engineering well since it has built an unneccessary divide between formalists and engineers. It is perfectly possible - and pragmatically necessary - to write a formal specification that engineers building tools can use. The specification does need a mathematical foundation but it is not always necessary that users of the specification appreciate that foundation. We have known how to do this since the Algol 60 report led the way, written almost 50 years ago. As I read the existing specification it is apparent that the authors did intend to write a formal specification of the type I describe but, it seems to me, that the mathematical foundations of the specification are unclear - perhaps absent. Which is what I meant when I expressed at the workshop that the specification was from my POV "insufficiently formal." XMLSchema is not an imperative programming language so the Algol 60 report does not help us much - but it does seem possible to build the spec of a constrained data desciption language on mathematical foundations none-the-less. I could sense the frustration in the Committee whenever I pushed for more formalization and it should be clear that whatever the committee's experience is with formal methods, it would be a fundamental mistake to dismiss these methologies because of this experience. The issue is not whether the language should be specified formally - but rather how to specify it formally. If past attempts have failed then it is not a failure of the method but a simple failure of communication between individuals. Three attempts to write the spec formally is no reason not to write a fourth - and perhaps 1.1 can be that specification. So, on review, here is a summary of my recommendations: 1. In version 1.1 specify a binding mechanism that permits base types to be specified and use this mechanism to specify the 1.0 base types as base types in 1.1. This mechanism would also enable general binding of types between schemas and applications. 2. In version 1.1 specify a profiling mechanism that permits a guarantee in a schema - the guarantee is that the semantics of the specified subset of the named schema will never change. This could, perhaps, be implemented by an attribute on types that says the type cannot be redefined. 3. Specify using the mechanism of (2) a profile of 1.1 that is the 1.0 specification and any other 1.0 profile - such as that proposed for WSDL etc.. 4. I support the call for constraints in the langauge (commonly called "co-constraints"). In my presentation I pointed out that there are two types of constraints. Essentially, they are those that apply to the generator and specify whether an instance is valid, and those that apply to the data in valid instances used by a consumer and essentially specify rules that apply to data. An example of the first case: to ensure that a value is in a range - a value out of the range produces an invalid schema. Similarly, I want to ensure that the timestamp in a given field is earlier than the timestamp in other fields - otherwise the schema is invalid. I also asked for a strong type inference that timestamps are state not declaration. This type of inference also applies to all calculations for example, summation. An example of the second case - to use the example given by BT - is in a valid instance the value of a purchase order field requires the sign off appears in an associated field. In this case the instance is valid and the consumer needs to see the rule. 5. Finally, I pointed out that there is an error in the specification of recursive types. The tail of the recursion must read maxOccurs=0, otherwise you can specify infinite recursion in a schema. This is clearly an error. With respect, Steven -- Dr. Steven Ericsson Zenith http://www.semeiosis.com
Received on Thursday, 23 June 2005 21:40:33 UTC