Re: Conditional Levels of a Schema from C. M. Sperberg-McQueen on 2009-04-07 (xmlschema-dev@w3.org from April 2009)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Mon, 6 Apr 2009 22:08:09 -0600
To: Dieter Menne <dieter.menne@menne-biomed.de>
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, xmlschema-dev@w3.org
Message-Id: <A9AB9ED8-3C7F-4273-9BD8-6685550ED335@blackmesatech.com>

On 2 Apr 2009, at 12:05, Dieter Menne wrote:

> Hi,
>
> we are currently defining a format for medical data storage
> (hrmconsensus.org). The full version is available at
> http://hrmconsensus.org/media/hrm/xhrm/xhrm02/xhrm0_2.xsd .
>
> In the simplified example below, we have the always mandatory
> deviceType. For patientsType, we would like to have a global
> conditional switch so that three flavors are possible
>
> -- minOccurs = "0" for internal clinical use
> -- minOccurs = "1" for archiving, must contain patient info
> -- minOccurs = "never" anonymized, must not contain patient info

I may be being dense, but it's not clear to me what your requirement
is.  Is it that

(A) You want the internal clinical systems to use a schema with

   <xs:element name="patients" type="patientsType" minOccurs="0"/>

while the archival system uses

   <xs:element name="patients" type="patientsType" minOccurs="1"/>

while tools and data flows for anonymized data should use

   <xs:element name="patients" type="patientsType" maxOccurs="0"/>

?  In other words, you want to work with three related but different
schemas?

Or is it that

(B) based on some signal in the XML, the 'patients' element must occur,
must not occur, or may occur?

You don't seem to mention any visible signal in the XML, so I'm
guessing it's not B.


> I know that the latter is not possible, that conditionals are not
> supported in XSL,

I'm not sure what you mean by that.  There are many conditions one
can check with the subset of regular languages which XSD uses for
content models.  It's true that to check conditions with a content
model you may need to write the content model in a particular way.

> and that Schematron would be an alternative.  Note that the
> conditionals occur in several nesting levels, so that we cannot easily
> combine versions of a master element with details, but they are
> always of the type "may", "must", "must not".

I'm not sure what you mean by this.

> We would like to avoid having several xsd files and prefer a common
> file with branching.

Is this (a) in order to avoid redundancy and eliminate the problem
of inconsistent updates during maintenance of the schema document(s)?
Or (b) because there are some important consumers of your work (maybe
potential users, maybe your bosses, maybe ISO Pascal programmers) who
might, you suspect, find it too hard to grasp the idea of a schema
made up by consulting more than one file at schema construction time?
Or (c) because you have no control over the schema processors to
be used with this schema, and you do not believe that xsd:include
is sufficiently interoperable to be relied upon?  Or (d) because
you believe in your hearts that you are defining a single language
here, and you want to make that fact manifest by producing a single
schema document?  (In this case, there is the troubling fact that
the 'patients' element follows three different syntactic rules based
not on syntactic context but based on application context, which
suggests that formally speaking you really are defining not one
language, but three.)  Or (e) for some other reason?

Any of these can be a plausible reason (so forgive me if my tone
seems flippant or dismissive -- no offense to you intended), but
what you need to do may vary a lot depending on which reason you have.

> Any ideas or references to ideas are appreciated.

Some possibilities that occur to me off the top of my head.

(1) You single-source the schema document using a literate
programming system (or a macro processor).  So you have eliminated
the inconsistent-maintenance problem.  From your single source
you generate three schema documents, called clinical.xsd,
archival.xsd, and anonymized.xsd.  The appropriate tools and
systems use the appropriate schema document.

The suggestions made by Michael Kay and Pete Cordell both fall
into this category, I think.
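
As one concrete illustration of (1), a small XSLT 1.0 stylesheet could
stand in for the macro processor: it copies a master schema document
unchanged except for the occurrence bounds on 'patients'.  This is only
a sketch under assumptions.  The parameter name 'flavor' and its three
values are hypothetical, and it assumes 'patients' is declared locally
with name= (not via ref=) somewhere below the top level.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical switch: clinical | archival | anonymized -->
  <xsl:param name="flavor" select="'clinical'"/>

  <!-- Identity transform: copy everything unchanged by default. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Local 'patients' declarations get flavor-specific bounds.
       Top-level declarations are skipped, since minOccurs and
       maxOccurs are not allowed there. -->
  <xsl:template
      match="xs:element[@name='patients'][not(parent::xs:schema)]">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- An attribute added here replaces a copied one of the
           same name. -->
      <xsl:choose>
        <xsl:when test="$flavor = 'archival'">
          <xsl:attribute name="minOccurs">1</xsl:attribute>
        </xsl:when>
        <xsl:when test="$flavor = 'anonymized'">
          <xsl:attribute name="minOccurs">0</xsl:attribute>
          <xsl:attribute name="maxOccurs">0</xsl:attribute>
        </xsl:when>
        <xsl:otherwise>
          <xsl:attribute name="minOccurs">0</xsl:attribute>
        </xsl:otherwise>
      </xsl:choose>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Running the stylesheet once per flavor value over the master document
would produce the three generated schema documents.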

(2) A particular variant of the preceding.  In the main schema
document, the relevant declaration reads

   <xs:element name="patients" type="patientsType"
       minOccurs="&patients.minOccurs;"
       maxOccurs="&patients.maxOccurs;"
   />

And the document begins

   <!DOCTYPE xs:schema SYSTEM ... >

By whatever means you choose, the different tools use different
entity declarations for patients.minOccurs and patients.maxOccurs.
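
A sketch of what the entity declarations for (2) might look like,
written here as an internal DTD subset for the archival flavor so that
the example is complete in one file.  The 'record' container and the
empty patientsType are stand-ins I have made up; the clinical flavor
would use "0" and "1", the anonymized flavor "0" and "0".  If the
declarations live in an external subset instead, bear in mind that
non-validating XML parsers are not required to read it, so it is worth
checking what the parser behind your schema processor does.

```xml
<?xml version="1.0"?>
<!-- Archival flavor: 'patients' must occur exactly once. -->
<!DOCTYPE xs:schema [
  <!ENTITY patients.minOccurs "1">
  <!ENTITY patients.maxOccurs "1">
]>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Assumed stand-in for the real type definition. -->
  <xs:complexType name="patientsType">
    <xs:sequence/>
  </xs:complexType>

  <!-- Hypothetical container element. -->
  <xs:element name="record">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="patients" type="patientsType"
            minOccurs="&patients.minOccurs;"
            maxOccurs="&patients.maxOccurs;"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```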

(3) You declare that the syntactic rule in the language you are
defining is that 'patients' may occur optionally, and specify that
it is up to application-level checking to ensure that each
of the three applications you have described checks to see that
'patients' occurs, or does not occur, as prescribed.  (That is,
you kick the problem over to the business rule people and tell
them it's their problem not yours.)

(4) You enclose 'patients' in an enclosing element, indicating
which of the three rules the instance document is supposed to
be following at the moment.  So the sequence which now contains
'device' and 'patients' reads instead:

    <xsd:sequence>
     <xsd:element name="device" type="deviceType"/>
     <xsd:choice>
      <xsd:element name="clinicalpatients">
       <xsd:complexType>
        <xsd:sequence>
         <xsd:element name="patients" type="patientsType" minOccurs="0"/>
        </xsd:sequence>
       </xsd:complexType>
      </xsd:element>
      <xsd:element name="archivalpatients">
       <xsd:complexType>
        <xsd:sequence>
         <xsd:element name="patients" type="patientsType" minOccurs="1"/>
        </xsd:sequence>
       </xsd:complexType>
      </xsd:element>
      <xsd:element name="anonymizedpatients">
       <xsd:complexType>
        <xsd:sequence/>
       </xsd:complexType>
      </xsd:element>
     </xsd:choice>
    </xsd:sequence>

The systems which transfer records from the clinical applications to
the archiving application, or to applications using anonymized data,
are responsible for changing the wrapper, which thus becomes a visible
signal that the record has been touched by the transfer application.
(This may be useful in debugging records transfer problems.)
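
To make the wrapper change in (4) concrete, here is roughly what one
record might look like before and after it passes from the clinical
system to an anonymizing one.  The 'records' and 'record' elements and
the placeholder comments are my assumptions; only 'device', 'patients',
and the wrapper names come from the content model above.

```xml
<!-- Hypothetical container showing the two states side by side. -->
<records>

  <!-- Before transfer: clinical record, patient data optional. -->
  <record>
    <device><!-- device content --></device>
    <clinicalpatients>
      <patients><!-- patient content --></patients>
    </clinicalpatients>
  </record>

  <!-- After anonymization: wrapper renamed, patient data removed. -->
  <record>
    <device><!-- device content --></device>
    <anonymizedpatients/>
  </record>

</records>
```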

(5) You get rid of the nesting and simply replace 'patients'
with three flavors of patients, all using the same type but
with different occurrence requirements.  Your sequence now becomes

    <xsd:sequence>
     <xsd:element name="device" type="deviceType"/>
     <xsd:choice>
      <xsd:element name="clinicalpatients" type="patientsType"
                   minOccurs="0"/>
      <xsd:element name="archivalpatients" type="patientsType"
                   minOccurs="1"/>
      <xsd:element name="anonymizedpatients">
       <xsd:complexType>
        <xsd:sequence/>
       </xsd:complexType>
      </xsd:element>
     </xsd:choice>
    </xsd:sequence>

Again the records transfer tools are responsible for changing the
name of the element in order to signal that they have done their work.

If you really want to document that 'clinicalpatients' and
'archivalpatients' and 'anonymizedpatients' are all really just
flavors of 'patients', by all means define an abstract 'patients'
element and make them all substitutable for it.
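
A minimal sketch of that substitution group, under some assumptions.
The members and the head must be top-level declarations, and I have
given the abstract head no type, so that it defaults to xs:anyType and
the empty 'anonymizedpatients' type can join the group alongside the
two patientsType members.  The empty patientsType is a stand-in for
the real definition.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Assumed stand-in for the real type definition. -->
  <xsd:complexType name="patientsType">
    <xsd:sequence/>
  </xsd:complexType>

  <!-- Abstract head: never appears in instances itself. -->
  <xsd:element name="patients" abstract="true"/>

  <xsd:element name="clinicalpatients" type="patientsType"
               substitutionGroup="patients"/>
  <xsd:element name="archivalpatients" type="patientsType"
               substitutionGroup="patients"/>
  <xsd:element name="anonymizedpatients"
               substitutionGroup="patients">
    <xsd:complexType>
      <xsd:sequence/>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>
```

The occurrence requirements themselves would still have to be stated
in the content model that refers to these elements.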

(6) You put an appropriate flag into the content model not as a
wrapper around 'patients' but as a preceding sibling:

    <xsd:sequence>
     <xsd:element name="device" type="deviceType"/>
     <xsd:choice>
      <xsd:sequence>
       <xsd:element name="clinical" type="our:flavor" minOccurs="1"/>
       <xsd:element name="patients" type="patientsType" minOccurs="0"/>
      </xsd:sequence>
      <xsd:sequence>
       <xsd:element name="archival" type="our:flavor" minOccurs="1"/>
       <xsd:element name="patients" type="patientsType" minOccurs="1"/>
      </xsd:sequence>
      <xsd:sequence>
       <xsd:element name="anonymized" type="our:flavor" minOccurs="1"/>
      </xsd:sequence>
     </xsd:choice>
    </xsd:sequence>
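
The content model in (6) leaves the type our:flavor undefined.  One
plausible reading, and it is only an assumption, is an empty type, so
that the flag elements act purely as markers.  The namespace URI below
is a made-up placeholder.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:our="http://example.org/our"
    targetNamespace="http://example.org/our">

  <!-- Hypothetical: the flag carries no content and no attributes. -->
  <xsd:complexType name="flavor">
    <xsd:sequence/>
  </xsd:complexType>

</xsd:schema>
```

With such a type, an archival record would contain 'device', then an
empty 'archival' element, then the required 'patients'.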

Which of these seems most appealing will depend on a lot of things,
including what it is you really want when you say you want a
conditional, and possibly including also what you think the other
tools you work with are going to be capable of doing.

I hope this helps.

Michael Sperberg-McQueen


-- 
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com
* http://cmsmcq.com/mib
* http://balisage.net
****************************************************************
Received on Tuesday, 7 April 2009 04:08:48 UTC