Re: Conditional Levels of a Schema from XML4Pharma on 2009-04-07 (xmlschema-dev@w3.org from April 2009)

From: XML4Pharma <info@XML4Pharma.com>
Date: Tue, 7 Apr 2009 17:54:48 +0200
To: "Michael Kay" <mike@saxonica.com>, "'Dieter Menne'" <dieter.menne@menne-biomed.de>, <xmlschema-dev@w3.org>
Message-ID: <0D338135825B477380B0F9546AC47F35@D6NXTQ1J>
Dear Michael,

In our development team (CDISC ODM standard) we had the same discussions and
issues, but we decided somewhat differently: each time we change the schema
(a new version every 2-3 years) we also give it a new namespace.
Our standard is backward compatible, so the changes required in software
that implements the standard are very small with each new version (a few
lines of code, the ones that define the namespace).
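
To illustrate what "a few lines of code" can mean, here is a minimal Python
sketch. The namespace URIs, the element names and the study_names helper are
all made up for the example (they are not the real ODM ones); the point is
only that the namespace is defined in one place and looked up by version.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, one per version of the standard.
# Supporting a new version means adding one entry here.
NAMESPACES = {
    "1.2": "http://example.org/ns/standard/v1.2",
    "1.3": "http://example.org/ns/standard/v1.3",
}


def study_names(xml_text: str, version: str) -> list[str]:
    """Return the Name attribute of every top-level Study element."""
    ns = {"s": NAMESPACES[version]}  # the only version-specific line
    root = ET.fromstring(xml_text)
    return [study.get("Name") for study in root.findall("s:Study", ns)]


# Hypothetical instance document written against version 1.3.
DOC_V13 = """<Root xmlns="http://example.org/ns/standard/v1.3">
  <Study Name="Trial A"/>
  <Study Name="Trial B"/>
</Root>"""

if __name__ == "__main__":
    print(study_names(DOC_V13, "1.3"))  # ['Trial A', 'Trial B']
    print(study_names(DOC_V13, "1.2"))  # [] because the namespace differs
```

The rest of the processing code does not mention the namespace at all, which
is why a backward-compatible new version costs so little here.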

For your point a), I do not know whether your example is HL7-v3-XML.
This is also something I have been looking into over the last few months.
My personal opinion is that the common (base) elements/attributes that can
be reused in all 400 messages should live in a separate namespace, and that
the main schemas for the individual messages should each have their own
namespace but reference the base elements/attributes, meaning that these
will get a prefix in the instance documents. This also allows for different
use of the base elements/attributes depending on the message: an
attribute that is mandatory in one message can be optional in another.
Yes, it is more work - but much cleaner.
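
As a rough sketch of how this layout still permits code reuse: the two
message types below live in different (hypothetical) namespaces, but the
shared element sits in one base namespace, so a single function handles it
in both. All URIs, element names and the patient_ids helper are invented
for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace shared by all message types.
BASE_NS = "http://example.org/ns/base"


def patient_ids(xml_text: str) -> list[str]:
    """Collect every base-namespace patientId, whatever the message type."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(f"{{{BASE_NS}}}patientId")]


# Two hypothetical messages, each with its own namespace; the base
# element appears with a prefix in both instance documents.
ADMIT = """<admit xmlns="http://example.org/ns/msg/admit"
       xmlns:b="http://example.org/ns/base">
  <b:patientId>P-001</b:patientId>
</admit>"""

DISCHARGE = """<discharge xmlns="http://example.org/ns/msg/discharge"
           xmlns:b="http://example.org/ns/base">
  <b:patientId>P-002</b:patientId>
  <reason>recovered</reason>
</discharge>"""

if __name__ == "__main__":
    print(patient_ids(ADMIT))
    print(patient_ids(DISCHARGE))
```

Whether patientId is mandatory or optional would then be decided in each
message schema, not in this shared code.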

My approach to schema writing rests on the following principles:
- validate with the schema as much as possible and desirable
- if that no longer works, or rules cannot be expressed in the schema, use
Schematron
- only as a last resort, when none of this works, write
software to implement validation of the rules in the standard.

The reason for these is that writing software to implement/validate rules is
always considerably more expensive and less transparent - usually the source
code is not published.
So for open standards, writing validation software should be avoided as
much as possible, or the software should do no more than run the
schema and Schematron validation. If possible, the rules should not be
implemented in the software, but only in the schema and Schematron.

One can, in some cases, also write software that acts independently of the
version of the standard, even when the schemas for the different versions
have different namespace names. In the past, I wrote a Clinical Study
Designer, which first reads the XML Schema and generates all the GUI
elements from the information in the schema. The great advantage is that
when a new version of the standard comes out, the software is immediately
ready for it, with little or no need for adaptation. Furthermore, it also
generates widgets for working with extensions of the schema (which are
allowed in our standard, as long as the extension elements/attributes live
in a separate namespace).
When you do it like this, there is no issue with code reuse.
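
The idea of driving the GUI from the schema can be sketched as follows.
This is not the Clinical Study Designer code, just a small Python
illustration under simplifying assumptions: the miniature schema, the
WIDGETS mapping and the form_fields helper are invented, and type names are
matched on the literal "xs:" prefix instead of being resolved as proper
QNames.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical miniature schema; a real one would be read from a file.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://example.org/ns/standard/v1.3">
  <xs:element name="Subject">
    <xs:complexType>
      <xs:attribute name="SubjectKey" type="xs:string" use="required"/>
      <xs:attribute name="Age" type="xs:integer"/>
      <xs:attribute name="Enrolled" type="xs:boolean"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# Assumed mapping from schema types to widget kinds.
WIDGETS = {
    "xs:string": "text field",
    "xs:integer": "number spinner",
    "xs:boolean": "checkbox",
}


def form_fields(schema_text: str) -> list[dict]:
    """Describe one form field per attribute declared in the schema."""
    root = ET.fromstring(schema_text)
    fields = []
    for attr in root.iter(f"{{{XS}}}attribute"):
        fields.append({
            "label": attr.get("name"),
            # Unknown types fall back to a plain text field.
            "widget": WIDGETS.get(attr.get("type"), "text field"),
            "required": attr.get("use") == "required",
        })
    return fields


def target_namespace(schema_text: str) -> str:
    """The namespace is taken from the schema, not hard-coded."""
    return ET.fromstring(schema_text).get("targetNamespace")


if __name__ == "__main__":
    print(target_namespace(SCHEMA))
    for field in form_fields(SCHEMA):
        print(field)
```

Because both the fields and the namespace come from the schema itself, a
schema with a new namespace or extra attributes needs no change to this
code.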

Of course we do not need to agree on all this - but I find this discussion 
extremely interesting ...

With best regards,

Jozef Aerts
XML4Pharma



----- Original Message ----- 
From: "Michael Kay" <mike@saxonica.com>
To: "'XML4Pharma'" <info@XML4Pharma.com>; "'Dieter Menne'" 
<dieter.menne@menne-biomed.de>; <xmlschema-dev@w3.org>
Sent: Tuesday, April 07, 2009 5:08 PM
Subject: RE: Conditional Levels of a Schema


>> but, if I understand it correctly, this means that you have two
>> different (versions of the) schemas, with the same namespace,
>> and different (although only slightly different) content.
>>
>> This is something that, as a matter of principle, I do not like.
>> My principle is "new standard (version) => new schema
>> (version) => new namespace".
>
> This is a very important question.
>
> I've come to the conclusion that we do need multiple schemas for the same
> namespace, for a variety of reasons:
>
> (a) an organisation defines 400 message types for exchanging data between
> different applications. There are many data elements shared between these
> messages. It would greatly restrict reuse of code to have a different
> namespace for each message type. Yet the validation rules are different: a
> field that is optional in one message may well be mandatory in another.
>
> (b) Different validation rules apply to the same document at different
> stages in its life-cycle. You don't want to apply the same level of
> validation to an internal draft document as you do to a final published
> document. Yet both have to use the same namespace.
>
> (c) The schema evolves. I don't believe it is practical to change the
> namespace every time the schema changes - again, because that inhibits code
> reuse. You want to be able to evolve gracefully, which means for example
> that when you expand the range of values allowed for an attribute, existing
> code continues to work provided the newly permitted values do not appear in
> the instance, and might even work in the presence of the new value, if the
> code was carefully written. Changing the namespace means that everyone has
> to change their code at once, which simply doesn't work.
>
> So the problem is that to identify a schema component, knowing the namespace
> (and local name) isn't enough. There needs to be some other handle to
> identify the "version" or "variant" we are after. I would like to see this
> formalized, so that different versions/variants of the same schema component
> can co-exist. At the moment the only identifier available is the schema
> location, which is very weak for two reasons - (1) it's an address rather
> than a name, and (2) the specs are full of stuff about it only being a hint.
>
> Michael Kay
> http://www.saxonica.com/
>
>
>
Received on Tuesday, 7 April 2009 15:55:33 UTC