Five mechanical approaches to make an XSD profile without getting bogged down by individual issues

David Exell wrote:

  In essence, XML Schema 1.1 addresses the issues from the workshop.

I would say that in essence XSD 1.1 ignores the main issue from the
workshop.

When I look at the Chair's report (linked to by David) I read:

 > There was significant support for the idea of a written ‘profile’ of
 > XML Schema which would document the sweet spot for purposes of data
 > binding, or for other specific domains. The word /profile/ is
 > problematic; what was meant was not a language subset, but only a
 > definition of the sweet spot in existing processors, which would
 > allow schema authors to get better results and better user
 > experience when data binding tools are used, and which would tell
 > implementors in the relevant domain which parts of schema users are
 > most likely to expect them to support well.

 > There was strong sentiment against publishing any profiles which
 > would restrict or reduce the XML Schema 1.0 specification, impacting
 > existing implementations or vocabularies.

My request for a profile does not reduce or restrict or otherwise define
what is in full XML Schema 1.n.

So what happened to this "significant support"?

The key is in the next paragraph:

 > There appeared to be no obvious way to split the XML Schema
 > specification into layers or sub-languages, as with OWL Lite, DL and
 > Full or SVG Tiny, Basic and Full. Accordingly, there was no support
 > for trying to define profiles of XML Schema as part of the schema
 > language itself. However, many people saw value in application or
 > domain specific 'profiles', in particular identifying a set of
 > schema patterns to provide a 'good user experience' when using
 > XML Schema 1.0 to bind XML to code or data models.

And how long was this discussion that decided that there was "no obvious
way"? Well, the formal discussion on this seems to have occupied 15
minutes of time, which ran out before discussions had finished. Indeed,
as far as I can see, no straw proposals were asked for, raised,
considered or dispatched.

For my rather immoderate response to that event, see my blog item from
the time:
Snow Season in Schemaland
http://blogs.oreilly.com/digitalmedia/2005/07/snow-season-in-schemaland.html

So what are obvious ways? Here are five:

-----------------------------------------------------------------------------------
1) Exchange model

One of the biggest early success stories in vendor-cooperative standards
setting was the OASIS CALS Exchange Table Model: it is now part of
history, though it has influenced all subsequent table models. Michael,
David, Norm and the other old-timers will certainly remember it. The
military CALS table model was based on going through all the tables in
the archive and making a schema (DTD) that could cope with them all. It
supported lots of fancy things (tables on call-out pages with a
different page size, etc.). Most vendors could only support a subset.

So they got together, and rather than dispute each feature, they agreed
on an algorithm: where almost all vendors supported a feature, it would
be kept and the vendors would agree to support it; otherwise it would be
dropped.

There are now several profiles out: the W3C databinding minimum and
maximum, the WS-I profile, the UN profile, etc. An algorithmic approach
like the CALS approach could be used.
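
As a rough illustration of that kind of rule, here is a minimal Python
sketch. The tool names, the feature names and the "at most one
implementation may lack it" cut-off are all made up for the example;
they are not taken from CALS or from any of the existing profiles.

```python
from collections import Counter


def exchange_profile(supported, max_missing=1):
    """Keep every feature that at most `max_missing` implementations lack.

    `supported` maps an implementation name to the set of features it
    supports. The cut-off is an assumption standing in for "almost all".
    """
    counts = Counter(f for feats in supported.values() for f in set(feats))
    needed = len(supported) - max_missing
    return {feature for feature, n in counts.items() if n >= needed}


# Hypothetical survey data, for illustration only.
supported = {
    "tool_a": {"sequence", "choice", "enumeration", "substitutionGroup"},
    "tool_b": {"sequence", "choice", "enumeration"},
    "tool_c": {"sequence", "choice", "enumeration", "redefine"},
    "tool_d": {"sequence", "choice"},
    "tool_e": {"sequence", "choice", "enumeration"},
}

print(sorted(exchange_profile(supported)))
```

The point is only that, once the survey data and the cut-off are agreed,
the contents of the profile follow mechanically.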

-----------------------------------------------------------------------------------
2) Modularity model

Chop the 250-page Structures spec plus the Datatypes spec into separate,
severable parts:

1) Grammars and particles
1a) Additional constraints
2) Key and uniqueness
3) Assertions
4) Built-in Datatypes
5) Schema location and assembly
6) Complex type derivation and assembly
7) Simple derivation
8) Dynamic schema constructs: xsi:nil, xsi:type, version selection
9) PSVI

and encourage implementers to implement fully each part that they
implement.

-----------------------------------------------------------------------------------
3) Set-based selection

1) Start with a private syntax for ISO/OASIS RELAX NG using
   XSD-namespace elements. (Call it RELAXSD.) This gives a solid
   theoretical basis and proven capabilities with little work.
2) Create an extra layer of syntax and semantic checking on RELAXSD
   (call it XSD Lite and Tite) to implement the appropriate rules of
   XSD 1.n and remove patterns specified in the maximum W3C databinding
   note.
3) Adjust RELAXSD to remove any syntax that is removed by XSD Lite and
   Tite, if necessary. (Call this XSD Lite.)

The result:
* all XSD Lite documents can be trivially converted to RELAX NG
* all XSD Lite and Tite documents are conforming XSD 1.n documents
* all XSD Lite and Tite documents are usable by XSD Lite systems.

XSD Lite would meet the needs of those for whom ambiguity is not an
issue. XSD Lite and Tite would meet the needs of those for whom
ambiguity is an issue. Both would be fairly equivalent to DTDs with
simple types.

Neither would use the bogus complex type *derivation* apparatus, though
they certainly could be declared as a name binding to a complexType, and
they could be imported and used in a full XSD 1.n system that had
complex typing.

-----------------------------------------------------------------------------------
4) Resolved schemas

Many of the features of XSD are syntactic sugar. They may be useful for
modelling, but they do not actually add any expressive power. And they
come at a heavy cost.

A resolved schema would be one in which a full XSD 1.n schema had been
re-written to remove syntactic sugar (such as element substitution and
complex type derivation by extension) and modelling items (such as
complex type derivation by restriction and abstract elements).

In fact, this is how the RELAX NG specification is written: first a
transform to resolve the sugar and then a formal description of the
remaining core. It is also how I implemented my XSD validator, which
converts to Schematron.
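
To make the idea concrete, here is a small Python sketch of one such
resolution step: replacing each reference to a substitution-group head
with an explicit choice of the elements that may stand in for it. It is
not the code of my validator or of any real tool. The function name is
hypothetical, and it assumes a correct schema with no target namespace
(so ref and substitutionGroup values are plain local names), a single
head per element, and no use of block or final.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"


def resolve_substitution_groups(schema):
    """Rewrite refs to substitution-group heads as explicit xs:choice."""
    declared = {e.get("name"): e for e in schema.findall(f"{{{XS}}}element")}

    # Direct members of each head, from the substitutionGroup attributes.
    direct = {}
    for name, decl in declared.items():
        head = decl.get("substitutionGroup")
        if head is not None:
            direct.setdefault(head, []).append(name)

    def substitutable(head):
        # The head itself plus all direct and indirect members.
        found, stack = [], [head]
        while stack:
            name = stack.pop()
            if name not in found:
                found.append(name)
                stack.extend(direct.get(name, []))
        return found

    parent_of = {child: parent for parent in schema.iter() for child in parent}
    for ref_el in list(schema.iter(f"{{{XS}}}element")):
        head = ref_el.get("ref")
        if head is None or head not in direct:
            continue
        choice = ET.Element(f"{{{XS}}}choice")
        # Occurrence constraints move from the reference to the choice.
        for occurs in ("minOccurs", "maxOccurs"):
            if ref_el.get(occurs) is not None:
                choice.set(occurs, ref_el.get(occurs))
        for name in substitutable(head):
            # Abstract elements cannot appear in instances, so skip them.
            if declared[name].get("abstract") in ("true", "1"):
                continue
            ET.SubElement(choice, f"{{{XS}}}element", ref=name)
        choice.tail = ref_el.tail
        parent = parent_of[ref_el]
        position = list(parent).index(ref_el)
        parent.remove(ref_el)
        parent.insert(position, choice)

    # The sugar has been resolved, so the attribute is no longer needed.
    for decl in declared.values():
        decl.attrib.pop("substitutionGroup", None)
    return schema


SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="shape" abstract="true"/>
  <xs:element name="circle" substitutionGroup="shape"/>
  <xs:element name="square" substitutionGroup="shape"/>
  <xs:element name="drawing">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="shape" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

resolved = resolve_substitution_groups(ET.fromstring(SCHEMA))
print(ET.tostring(resolved, encoding="unicode"))
```

In the resolved output, the reference to shape inside drawing becomes a
repeatable choice between circle and square, which a processor with no
notion of substitution groups can handle directly.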

-----------------------------------------------------------------------------------
5) Schema versus Instance validation

When implementing XSD it becomes obvious that there are two very
different kinds of constraints involved. They can be seen starkly in the
test suite: some tests require an instance, some do not.

The specification could be refactored into two parts:

1) Validation that the XSD schema is correct
2) Validation that an instance is correct against the XSD schema.

For example, my implementation largely assumes that the schema is
correct. This represents a major simplification in the work involved.

Similarly, I suggest that among the implementers who require UPA (Unique
Particle Attribution), there are many who would prefer not to check the
schema for UPA (and perhaps don't), and who just rely on runtime
violations, if any.
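
What "relying on runtime violations" might look like can be sketched in
a few lines of Python. This is a deliberately simplified illustration,
not XSD's actual rules: the content model is just a flat sequence of
(name, minOccurs, maxOccurs) particles, with None standing for
unbounded, and the names AmbiguousContent and validate_children are
invented for the example. The schema is never checked up front; an
ambiguity is reported only if an instance actually runs into it.

```python
class AmbiguousContent(Exception):
    """An element could be attributed to more than one particle."""


def validate_children(particles, children):
    """Validate child element names against a flat sequence of particles.

    Returns True or False for valid or invalid content, and raises
    AmbiguousContent if an element matches more than one particle.
    """
    index, count = 0, 0  # current particle and occurrences matched so far
    for name in children:
        candidates = []
        j, c = index, count
        while j < len(particles):
            pname, lo, hi = particles[j]
            if pname == name and (hi is None or c < hi):
                candidates.append((j, c + 1))
            if c < lo:
                break  # this particle is still required, so stop here
            j, c = j + 1, 0
        if not candidates:
            return False
        if len(candidates) > 1:
            raise AmbiguousContent(f"<{name}> matches more than one particle")
        index, count = candidates[0]
    # At the end, every remaining particle must be satisfied or optional.
    j, c = index, count
    while j < len(particles):
        if c < particles[j][1]:
            return False
        j, c = j + 1, 0
    return True


# (a?, b) is unambiguous, so instances validate normally.
model = [("a", 0, 1), ("b", 1, 1)]
print(validate_children(model, ["a", "b"]))
print(validate_children(model, ["a"]))

# (a?, a) violates UPA, but that only surfaces when an <a> turns up.
try:
    validate_children([("a", 0, 1), ("a", 1, 1)], ["a"])
except AmbiguousContent as problem:
    print("runtime violation:", problem)
```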

-----------------------------------------------------------------------------------

6) Implementation caused

Create a profile which removes any features that have been shown to have
caused implementation problems. The W3C databinding profiles are
relevant, though the metric is not "what has been implemented?" but
"what has been implemented badly/with difficulty/wrongly/abandoned?"
That is, some features may be abandoned because of a mismatch with an
underlying model: this is no reason to ditch them under this method. But
a feature that perhaps was needed and missed the mark could be ditched.
(This is more like my original suggestion in my submission to the W3C
Workshop.)

I think this method is now superseded by events and information and does
not need to be considered.
------------------------------------------------------------------------------------

Each of these 5 methods would allow a mechanical
split/layering/refactoring/profiling of the standard.

They obviously each have their pros and cons (I would be happy with any
of them). And the shape of the final profile would be pretty much
determined and knowable upfront by the mechanism chosen: the judgements
are not matters of expertise or subjectivity.

I would like to note that I do not believe the XSD WG has ever called
for submissions on how to refactor or profile the spec. So I don't
believe that they have indeed addressed a main issue from the W3C
Workshop. In fact, the birthing baby has gone straight into the too-hard
basket without even a slap, to mix metaphors.

Cheers
Rick Jelliffe

Received on Thursday, 21 May 2009 04:38:38 UTC