- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Mon, 6 Apr 2009 22:08:09 -0600
- To: Dieter Menne <dieter.menne@menne-biomed.de>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, xmlschema-dev@w3.org
On 2 Apr 2009, at 12:05 , Dieter Menne wrote:

> Hi,
>
> we are currently defining a format for medical data storage
> (hrmconsensus.org). The full version is available
> http://hrmconsensus.org/media/hrm/xhrm/xhrm02/xhrm0_2.xsd here .
>
> In the simplified example below, we have the always mandatory
> deviceTyp. For patientsType, we would like to have a global
> conditional switch so that three flavors are possible
>
> -- minOccurs = "0" for internal clinical use
> -- minOccurs = "1" for archiving, must contain patient info
> -- minOccurs = "never" anonymized, must not contain patient info

I may be being dense, but it's not clear to me what your
requirement is. Is it that

(A) You want the internal clinical systems to use a schema with

  <xs:element name="patients" type="patientsType" minOccurs="0"/>

while the archival system uses

  <xs:element name="patients" type="patientsType" minOccurs="1"/>

while tools and data flows for anonymized data should use

  <xs:element name="patients" type="patientsType" maxOccurs="0"/>

? In other words, you want to work with three related but
different schemas?

Or is it that

(B) based on some signal in the XML, the 'patients' element must
occur, must not occur, or may occur?

You don't seem to mention any visible signal in the XML, so I'm
guessing it's not B.

> I know that the latter is not possible, that conditionals are not
> supported in XSL,

I'm not sure what you mean by that. There are many conditions
one can check with the subset of regular languages which XSD uses
for content models. It's true that to check conditions with a
content model you may need to write the content model in a
particular way.

> and that Schematron would be an alternative. Note that the
> conditionals occur in several nesting levels, so that we cannot easily
> combine versions of a master element with details, but they are
> always of the type "may", "must", "must not".

I'm not sure what you mean by this.

> We would like to avoid having several xsd files and prefer a common
> file with branching.

Is this

(a) in order to avoid redundancy and eliminate the problem of
inconsistent updates during maintenance of the schema
document(s)?

Or

(b) because there are some important consumers of your work
(maybe potential users, maybe your bosses, maybe ISO Pascal
programmers) who might, you suspect, find it too hard to grasp
the idea of a schema made up by consulting more than one file
at schema construction time?

Or

(c) because you have no control over the schema processors to
be used with this schema, and you do not believe that
xsd:include is sufficiently interoperable to be relied upon?

(d) Because you believe in your hearts that you are defining a
single language here, and you want to make that fact manifest
by producing a single schema document? (In this case, there
is the troubling fact that the 'patients' element follows three
different syntactic rules based not on syntactic context but
based on application context, which suggests that formally
speaking you really are defining not one language, but three.)

(e) for some other reason?

Any of these can be a plausible reason (so forgive me if my
tone seems flippant or dismissive -- no offense to you intended),
but what you need to do may vary a lot depending on which reason
you have.

> Any ideas or references to ideas are appreciated.

Some possibilities that occur to me off the top of my head.

(1) You single-source the schema document using a literate
programming system (or a macro processor). So you have
eliminated the inconsistent-maintenance problem. From your
single source you generate three schema documents, called
clinical.xsd, archival.xsd, and anonymized.xsd. The
appropriate tools and systems use the appropriate schema
document.

The suggestions made by Michael Kay and Pete Cordell both
fall into this category, I think.

(2) A particular variant of the preceding.
In the main schema document, the relevant declaration reads

  <xs:element name="patients" type="patientsType"
    minOccurs="&patients.minOccurs;"
    maxOccurs="&patients.maxOccurs;" />

And the document begins

  <!DOCTYPE xs:schema SYSTEM ... >

By whatever means you choose, the different tools use
different entity declarations for patients.minOccurs and
patients.maxOccurs.

(3) You declare that the syntactic rule in the language you are
defining is that 'patients' may occur optionally, and specify
that it is up to application-level checking to ensure that
each of the three applications you have described checks
to see that 'patients' occurs, or does not occur, as prescribed.
(That is, you kick the problem over to the business rule
people and tell them it's their problem not yours.)

(4) You enclose 'patients' in an enclosing element, indicating
which of the three rules the instance document is supposed to
be following at the moment. So the sequence which now
contains deviceType and patients now reads instead:

  <xsd:sequence>
    <xsd:element name="device" type="deviceType"/>
    <xsd:choice>
      <xsd:element name="clinicalpatients">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="patients" type="patientsType"
              minOccurs="0"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
      <xsd:element name="archivalpatients">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="patients" type="patientsType"
              minOccurs="1"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
      <xsd:element name="anonymizedpatients">
        <xsd:complexType>
          <xsd:sequence/>
        </xsd:complexType>
      </xsd:element>
    </xsd:choice>
  </xsd:sequence>

The systems which transfer records from the clinical
applications to the archiving application, or to applications
using anonymized data, are responsible for changing the
wrapper, which thus becomes a visible signal that the record
has been touched by the transfer application. (This may be
useful in debugging records transfer problems.)

(5) You get rid of the nesting and simply replace 'patients'
with three flavors of patients, all using the same type but
with different occurrence requirements. Your sequence now
becomes

  <xsd:sequence>
    <xsd:element name="device" type="deviceType"/>
    <xsd:choice>
      <xsd:element name="clinicalpatients" type="patientsType"
        minOccurs="0"/>
      <xsd:element name="archivalpatients" type="patientsType"
        minOccurs="1"/>
      <xsd:element name="anonymizedpatients">
        <xsd:complexType>
          <xsd:sequence/>
        </xsd:complexType>
      </xsd:element>
    </xsd:choice>
  </xsd:sequence>

Again the records transfer tools are responsible for changing
the name of the element in order to signal that they have done
their work.

If you really want to document that 'clinicalpatients' and
'archivalpatients' and 'anonymizedpatients' are all really
just flavors of 'patients', by all means define an abstract
'patients' element and make them all substitutable for it.

(6) You put an appropriate flag into the content model not as
a wrapper around 'patients' but as a preceding sibling:

  <xsd:sequence>
    <xsd:element name="device" type="deviceType"/>
    <xsd:choice>
      <xsd:sequence>
        <xsd:element name="clinical" type="our:flavor"
          minOccurs="1"/>
        <xsd:element name="patients" type="patientsType"
          minOccurs="0"/>
      </xsd:sequence>
      <xsd:sequence>
        <xsd:element name="archival" type="our:flavor"
          minOccurs="1"/>
        <xsd:element name="patients" type="patientsType"
          minOccurs="1"/>
      </xsd:sequence>
      <xsd:sequence>
        <xsd:element name="anonymized" type="our:flavor"
          minOccurs="1"/>
      </xsd:sequence>
    </xsd:choice>
  </xsd:sequence>

Which of these seems most appealing will depend on a lot of
things, including what it is you really want when you say
you want a conditional, and possibly including also what you
think the other tools you work with are going to be capable
of doing.

I hope this helps.

Michael Sperberg-McQueen

--
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com
* http://cmsmcq.com/mib
* http://balisage.net
****************************************************************
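
As a postscript to possibility (1) above: a minimal sketch, in
Python, of generating the three schema documents from a single
source. The master schema text, the type name recordType, and the
function make_variant are illustrative assumptions, not taken from
the xhrm schema; the definitions of deviceType and patientsType are
omitted, so the output is a fragment rather than a complete schema.
For the anonymized flavor the sketch sets minOccurs to 0 along with
maxOccurs, since minOccurs defaults to 1 and must not exceed
maxOccurs.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)  # keep the xs: prefix when serializing

# Hypothetical single-source schema; only the 'patients' declaration varies.
MASTER = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="recordType">
    <xs:sequence>
      <xs:element name="device" type="deviceType"/>
      <xs:element name="patients" type="patientsType"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""

# Occurrence bounds per flavor: may / must / must not.
FLAVORS = {
    "clinical": {"minOccurs": "0", "maxOccurs": "1"},
    "archival": {"minOccurs": "1", "maxOccurs": "1"},
    "anonymized": {"minOccurs": "0", "maxOccurs": "0"},
}


def make_variant(flavor):
    """Return the schema text for one flavor, derived from MASTER."""
    root = ET.fromstring(MASTER)
    for decl in root.iter(f"{{{XS}}}element"):
        if decl.get("name") == "patients":
            decl.attrib.update(FLAVORS[flavor])
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Show one generated variant; a build step would write all three.
    print(make_variant("anonymized"))
```

A build step could write each result to clinical.xsd, archival.xsd,
and anonymized.xsd, so that only the master text is ever edited by
hand.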
Received on Tuesday, 7 April 2009 04:08:48 UTC