[REVISED] My user experience of the user experience workshop from Steven Ericsson-Zenith on 2005-06-27 (www-xml-schema-comments@w3.org from April to June 2005)

From: Steven Ericsson-Zenith <steven@semeiosis.com>
Date: Mon, 27 Jun 2005 13:25:52 -0700
To: www-xml-schema-comments@w3.org, xmlschema-dev@w3.org
Cc:
Message-Id: <1119903952.13357@memeiosys.com>

I had posted this originally to the xmlschema-dev list, and 
Henry Thompson has asked me to post it to 
www-xml-schema-comments, to the official comments list. 

I want to add that since I wrote the note below I have read 
the formal specification and my feeling about that document 
is that it does more harm than good - especially since the 
committee made it clear at the workshop that the document is 
not considered a valid account of the standard. This is all
the more of a concern since I have heard that XPath and XQuery
make use of the spec.

I guess I am puzzled as to why the committee did not follow
the precedent set by the XML standard and wonder what the
W3C's broad position is - surely a recommendation regarding 
formal specification for all the standards is appropriate.
A common mathematical basis and algebra in the standards 
would seem useful to me.

Also, to clarify my comment below about mathematical basis.
In a specification of any kind there is "something that it is 
about."  A formal specification takes extra steps to 
clarify that "something that it is about" - its premises 
or axioms need to be clearly stated.

I believe that one reason that there is confusion here is that
the formal description has not taken sufficient steps to ensure
its mathematical basis. Forgive my semeiotic point of view,
but it is heavy on description of syntax and translation between
two poorly defined models, and light on semantics and pragmatics 
- and these three are not clearly distinguished.

I do not have sufficient time currently to do more than a 
cursory review of the inference rules. So I want to be
cautious about being overly critical. However, they appear,
simply, to be substitution rules and maps between XML and a 
DOM - which I guess is no surprise. 

However, I do not see the specification of a valid schema 
instance - i.e., what it means for an instance to be
valid against a given schema.  I am prepared to accept that
I simply have not given the spec sufficient time - so perhaps
someone can point it out for me.

My original note follows, with minor corrections:

Within the limits of time a couple of errors crept into my report
that I wish to correct and I missed one issue that I want to add.
I have also made a few typo corrections and clarifications.

The error relates to my report of the recursive type issue. That
should read that in recursive types the tail must be minOccurs=0
otherwise you have specified an infinitely recursive data 
structure.
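
A minimal sketch of such a check, in Python and under stated
assumptions: it looks only at elements declared directly inside an
xs:sequence of a named xs:complexType, and it treats any type that
is not a named complex type in the same schema as finite. The
function name and the sample "List" type are illustrative only and
do not come from the specification.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def unsatisfiable_types(xsd_text):
    """Return names of complex types that can only nest infinitely.

    Simplification: only xs:element children of a top-level
    xs:sequence in a named xs:complexType are inspected.
    """
    root = ET.fromstring(xsd_text)
    required = {}  # type name -> names of types it must contain
    for ctype in root.findall(XS + "complexType"):
        deps = set()
        for el in ctype.findall(XS + "sequence/" + XS + "element"):
            # minOccurs defaults to 1, so an unmarked tail is mandatory.
            if int(el.get("minOccurs", "1")) >= 1:
                deps.add(el.get("type", "").split(":")[-1])
        required[ctype.get("name")] = deps

    # Least fixed point: a type is finite once every mandatory child
    # is either not a named complex type or already known to be finite.
    finite = set()
    changed = True
    while changed:
        changed = False
        for name, deps in required.items():
            if name not in finite and all(
                d not in required or d in finite for d in deps
            ):
                finite.add(name)
                changed = True
    return sorted(set(required) - finite)


# Hypothetical recursive list type whose tail is mandatory.
BAD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="List">
    <xs:sequence>
      <xs:element name="head" type="xs:string"/>
      <xs:element name="tail" type="List"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""

# The same type with the tail marked minOccurs="0".
GOOD = BAD.replace('type="List"/>', 'type="List" minOccurs="0"/>')
```

With the mandatory tail the "List" type is reported; with
minOccurs="0" on the tail nothing is reported.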

Here is my revised and informal "amicus curiae" contribution to 
the XMLSchema committee - notes from the workshop of the past 
couple of days. My apologies for the limited time I have currently
to detail these issues further or to make my report more readable.

It was hard not to come across as a formalist at this workshop.
From my experience, I do understand the pragmatics of formal 
language development, I empathise with the challenge, and I want 
to clarify some of my concerns. 

There are two ways to view the pragmatics of language development
- one is the long term pragmatics of refinement and the second is 
the short term pragmatics of necessity.

An example of a short term pragmatic is the necessity to produce
a result - in the case of a working group, the production of a
working specification. 

The typing question, IMHO, is another example of a short term 
pragmatic - who can doubt the necessity that precision decimal 
should be supported or that date and timestamps should cover
the scope of those specified in SQL?  In the absence of a 
mechanism to specify the base types of a schema it would be an 
error not to support these types IMHO. So this could be solved 
immediately with an erratum that extends the base types.

[As an aside the base type mapping problem is identical
conceptually to the problem of binding schema types to 
application types. So a single solution that solves both 
problems should be proposed.]

If your horizon is long term - as I said at the workshop, 
I want schemas (or "schemata") that I write to be valid 40 
or 100 years from now - then base type specification and a 
formal method for binding types between schema and 
application implementations seems essential.

An example of a long term pragmatic is the necessity of 
refinement and this was expressed by everyone at the 
workshop under the term "versioning" and manifests for
the committee in the need to release subsequent 
specifications of the XMLSchema standard.

What many considered another example of a short
term pragmatic is the need to specify "profiles" - 
subsets of the specification that could be
guaranteed in specific use-cases. I will argue that
this is a long term consideration also.

I spoke briefly in my presentation about my interest 
in concept distinction, and this is another case in point. 
The concepts of versioning and profiling are essentially 
the same - and can thus be addressed by the same solution.  

Therefore, I would strongly urge the committee not to 
invent specialized solutions for what appear initially 
to be distinct concepts.

I also want to clarify what I mean by formal specification 
and what others may mean. When engineers ask for a formal 
specification they do not necessarily need a Zed 
specification. While the computer science formal methods 
community has gone down the road of building new algebras 
it is by no means a necessity that a formal specification 
be entirely written in what has essentially become a 
private language.

[I note that in the formal specification this issue is
amplified by the use of a different and compressed
syntactic convention.] 

John von Neumann and David Hilbert used informal language
too - the tendency toward strict private languages is
a relatively recent phenomenon - one that manifestly has
not served computer engineering well since it has built
an unnecessary divide between formalists and engineers. 

It is perfectly possible - and pragmatically necessary - 
to write a formal specification that engineers building 
tools can use.  The specification does need a mathematical 
foundation but it is not always necessary that users of the 
specification appreciate that foundation.  We have known
how to do this since the Algol 60 report led the way,
written almost 50 years ago.

As I read the existing specification it is apparent that
the authors did intend to write a formal specification 
of the type I describe but it seems to me that the 
mathematical foundations of the specification are unclear
- perhaps absent. Which is what I meant when I expressed 
at the workshop that the specification was from my POV
"insufficiently formal."

XMLSchema is not an imperative programming language so
the Algol 60 report does not help us much - but it does
seem possible to build the spec of a constrained data
description language on mathematical foundations
nonetheless.

I could sense the frustration in the Committee whenever 
I pushed for more formalization and it should be clear
that whatever the committee's experience is with formal 
methods, it would be a fundamental mistake to dismiss 
these methodologies because of this experience. 

The issue is not whether the language should be specified 
formally - but rather how to specify it formally.  If past 
attempts have failed then it is not a failure of the 
method but a simple failure of communication between 
individuals. Three attempts to write a spec that meets the
community needs is no reason not to write a fourth if
the last is found wanting - and perhaps 1.1 can be that 
specification.

Noah is rightly concerned about new articulations because he
fears that the new work will introduce unintended contradictions
with the old work.  If this really is a concern - and it is
really that difficult for a skilled individual to reproduce 
the current specification - then that seems to me to be 
clear evidence that the need is more urgent, for how on
earth can we expect a tools writer to fare better?

So, on review, here is a summary of my recommendations:

1. In version 1.1 specify a binding mechanism that permits
base types to be specified and use this mechanism to specify
the 1.0 base types as base types in 1.1. This mechanism would
also enable general binding of types between schemas and
applications.

2. In version 1.1 specify a profiling mechanism that permits
a guarantee in a schema - the guarantee is that the semantics
of the specified subset of the named schema will never change.
This could, perhaps, be implemented by an attribute on types 
that says the type cannot be redefined. 

3. Specify using the mechanism of (2) a profile of 1.1 that is
the 1.0 specification and any other 1.0 profile - such as that
proposed for WSDL etc.

This profiling mechanism provides your versioning solution 
since now you can specify future refinements of a namespace
in terms of the past versions of the namespace.

4. I support the call for constraints in the language 
(commonly called "co-constraints").  In my presentation 
I pointed out that there are two types of constraints.
Essentially, they are those that apply to the generator
and specify whether an instance is valid, and those that
apply to the data in valid instances used by a consumer
and essentially specify rules that apply to data.

An example of the first case: to ensure that a value is 
in a range - a value out of the range produces an invalid 
instance. Similarly, I want to ensure that the timestamp in 
a given field is earlier than the timestamp in other 
fields - otherwise the instance is invalid. I also asked
for a strong type inference that timestamps are state not 
declaration. This type of inference also applies to all 
calculations, for example summation.

An example of the second case - to use the example given 
by BT - is where, in a valid instance, the value of a purchase
order field requires that a sign-off appear in an associated 
field. In this case the instance is valid and the consumer
needs to see the rule.

5. Instance trees. It is useful in instances to reference
other instances of the same schema - for cases where part 
of the instance changes infrequently and another part
changes more frequently.

6. Finally, I pointed out that there is an error in the 
specification of recursive types. The tail of the recursion 
must read minOccurs=0, otherwise you can specify infinitely
recursive data structures in a schema - and this is clearly 
an error.
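
The two kinds of constraint in recommendation 4 can be sketched
roughly as follows. This is an illustration in Python, not a
proposal for schema syntax. The element names (quantity, created,
shipped, total, signOff), the range, and the sign-off threshold are
all hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def validity_errors(order):
    """First kind: a violation makes the instance invalid."""
    errors = []
    quantity = int(order.findtext("quantity"))
    if not 1 <= quantity <= 1000:  # hypothetical range
        errors.append("quantity out of range")
    created = datetime.fromisoformat(order.findtext("created"))
    shipped = datetime.fromisoformat(order.findtext("shipped"))
    if not created < shipped:
        errors.append("created must be earlier than shipped")
    return errors


def consumer_rule_violations(order):
    """Second kind: the instance is valid; the rule concerns its data."""
    violations = []
    total = float(order.findtext("total"))
    # Hypothetical business rule: large orders need a sign-off field.
    if total > 500 and not order.findtext("signOff"):
        violations.append("total over 500 requires a sign-off")
    return violations


TEMPLATE = (
    "<order>"
    "<quantity>{q}</quantity>"
    "<created>{c}</created>"
    "<shipped>{s}</shipped>"
    "<total>750.00</total>"
    "</order>"
)

# A valid instance that still breaks the consumer's rule.
valid_order = ET.fromstring(
    TEMPLATE.format(q=5, c="2005-06-27T13:25:52", s="2005-06-28T09:00:00")
)

# An invalid instance: quantity out of range, timestamps out of order.
invalid_order = ET.fromstring(
    TEMPLATE.format(q=5000, c="2005-06-27T13:25:52", s="2005-06-26T09:00:00")
)
```

The first function decides whether the generator produced a valid
instance at all; the second reports rules that a consumer applies
to the data of an instance that has already passed the first.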

With respect,
Steven

--
Dr. Steven Ericsson Zenith
http://www.semeiosis.com
Received on Monday, 27 June 2005 20:25:59 UTC