Define SML-IF Schema Binding

Sandy Gao

Valentina Popescu

 

1.     Terminology

2.     Problem definition

3.     Requirements

3.1    Support schema composition

3.2    Support schema versioning

3.3    Deterministic

3.4    Full schema support

4.     Constraints

4.1    Support access to schema documents outside of SML-IF

4.2    Ignorable schema locations

4.3    Include definition and instance documents as-is

4.4    Lazy schema assembly

5.     Acknowledgement

 

1.   Terminology

Schema document:      an <xs:schema> element; can be an XML fragment

Schema:                       a set of schema components; a schema is normally (but not required to be) constructed from one or more schema documents

Schema component:    an element declaration or a type definition or a particle or …

Include:                        A schema document can include another schema document using <xs:include>.  Both schema documents contribute to the same schema, and both contribute schema components to the same target namespace (or no namespace).  If the included schema document does not have a target namespace, the namespace of the including schema document is used.

Redefine:                      Similar to include, but uses <xs:redefine>; the redefining schema document can replace certain included components with new components.

Import:                         Allows the importing schema document to refer to components from the imported namespace (or no namespace), which must be different from the importing schema document’s target namespace.  If the combination of the “namespace” attribute and the “schemaLocation” attribute on <xs:import> resolves to a schema document, then the resulting schema also includes components from the imported schema document.

Schema composition:  (In this document) construct a single schema from multiple schema documents, using the above include, redefine and/or import mechanisms.

Note: “a schema” is not equal to “a schema document”!
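As a minimal sketch of the three composition mechanisms defined above, the following Python snippet lists the include, redefine, and import references found in a single schema document.  The schema text, the namespaces (urn:example:...), the file names, and the function name are all hypothetical and exist only for illustration; the snippet uses only the standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema document, used only for illustration.
SCHEMA_DOC = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example:a">
  <xs:include schemaLocation="a-types.xsd"/>
  <xs:redefine schemaLocation="a-base.xsd"/>
  <xs:import namespace="urn:example:b" schemaLocation="b.xsd"/>
  <xs:import namespace="urn:example:c"/>
</xs:schema>
"""


def composition_references(schema_text):
    """Return (kind, namespace, schemaLocation) for each top-level
    include, redefine, or import in one schema document."""
    root = ET.fromstring(schema_text.strip())
    refs = []
    for child in root:
        for kind in ("include", "redefine", "import"):
            if child.tag == "{%s}%s" % (XS, kind):
                refs.append(
                    (kind, child.get("namespace"), child.get("schemaLocation"))
                )
    return refs


REFS = composition_references(SCHEMA_DOC)
for ref in REFS:
    print(ref)
```

Note that the last import names a namespace but gives no schemaLocation, so the schema document alone does not say which schema document, if any, supplies the components for that namespace.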

2.   Problem definition

In validating an SML-IF instance, associations between XML Schema definition documents and instance documents need to be drawn, both to completely validate XML Schema documents themselves (to make sure they produce valid schemas) and to establish schema-validity of the instance documents.

Schema documents can be connected with other schema documents using composition features provided by XML Schema.  This includes <xs:include>, <xs:redefine>, and <xs:import>.  A schema document’s validity may depend on other schema documents it includes/redefines/imports, or even other schema documents that include/redefine/import it.

When validating an instance document, a precise list of schema documents needs to be associated with it to form a “schema”, and the instance document is then schema-assessed using this schema.

The XML Schema 1.0 specification provides more flexibility in constructing the schema used for assessment than is appropriate for the semantics defined by SML and SML-IF validation:

·         It allows processor latitude in terms of locating schema documents (resolving namespace and schema location attributes) and composing schema documents together to form a single schema.

·         Schema location attributes can be ignored in some cases (“xsi:schemaLocation” in instance documents and “schemaLocation” on <xs:import>), and are allowed to “fail to resolve” in others (“schemaLocation” attribute on <xs:include> and <xs:import>).  Known schema and SML implementations behave differently with respect to how/whether they process schema location attributes.

·         When the same namespace is imported more than once, a processor is allowed to ignore all but the first import.

Given the latitude described above, general-case interoperability cannot be guaranteed by anything based only on XML Schema, so SML-IF needs to specify how to determine such associations.

3.   Requirements

3.1     Support schema composition

There are many real-life schemas that are constructed from multiple schema documents.  Such schemas may span multiple namespaces (hence the need for import); components from each namespace may be further divided into multiple schema documents (hence the need for include).

XML Schema has a feature often referred to as “chameleon include”.  This means that a schema document with a target namespace includes or redefines another schema document without a target namespace, and the result is as if the included/redefined document had the same target namespace as the including/redefining document.  SML-IF needs to support this usage scenario.
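The chameleon behavior can be sketched as a small namespace computation.  The snippet below is an illustration only, not a schema processor: the two schema documents, their namespaces, and the function names are hypothetical, and only the targetNamespace attribute of each <xs:schema> element is examined.

```python
import xml.etree.ElementTree as ET

# Hypothetical including document with a target namespace.
INCLUDING = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example:a">
  <xs:include schemaLocation="common.xsd"/>
</xs:schema>
"""

# Hypothetical included document without a target namespace.
CHAMELEON = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="Code">
    <xs:restriction base="xs:string"/>
  </xs:simpleType>
</xs:schema>
"""


def target_namespace(schema_text):
    """Return the targetNamespace of a schema document, or None."""
    return ET.fromstring(schema_text.strip()).get("targetNamespace")


def effective_namespace(including_text, included_text):
    """Namespace that the included document's components end up in."""
    outer = target_namespace(including_text)
    inner = target_namespace(included_text)
    if inner is None:
        # Chameleon case: the included document takes on the
        # including document's target namespace (which may be None).
        return outer
    if inner != outer:
        raise ValueError("include requires matching target namespaces")
    return inner


RESULT = effective_namespace(INCLUDING, CHAMELEON)
print(RESULT)
```

One consequence worth keeping in mind is that the same namespace-less schema document can contribute components to different namespaces, depending on which schema document includes it.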

3.2     Support schema versioning

Schema authors can’t anticipate how their schemas will be used, hence the need to evolve schemas.  There are different versioning scenarios.  There are cases where minor modifications of older versions suffice, and redefine can be used.  Some schemas need to be rewritten to accommodate new requirements, and a new namespace may or may not be introduced (compatibility is often a good reason for not changing namespaces).  There are also cases where there are generic and specific versions (as opposed to previous and next versions), which often co-exist and share the same namespace.

To support this, SML-IF needs to be able to package different versions of the same schema, in the same namespace, in the same SML-IF instance.

3.3     Deterministic

For a given SML-IF instance, there MUST be no ambiguity in determining how schema documents (that are included in this instance) are connected using <xs:include>, <xs:redefine>, and <xs:import>, and therefore there MUST be no ambiguity in determining which schema documents are used to form a schema against which a given instance document is validated.
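To illustrate where ambiguity can arise, the sketch below assumes that schema documents are looked up by namespace alone and reports every namespace for which more than one packaged schema document is a candidate.  The package contents, the document aliases, and the function names are hypothetical; this shows the problem, not a proposed resolution mechanism.

```python
from collections import defaultdict

# Hypothetical package contents: document alias -> target namespace.
# Two versions of the "b" schema share one namespace.
PACKAGED = {
    "a.xsd": "urn:example:a",
    "b-v1.xsd": "urn:example:b",
    "b-v2.xsd": "urn:example:b",
}


def candidates_by_namespace(packaged):
    """Group the packaged schema documents by target namespace."""
    by_ns = defaultdict(list)
    for alias, namespace in sorted(packaged.items()):
        by_ns[namespace].append(alias)
    return dict(by_ns)


def ambiguous_namespaces(packaged):
    """Namespaces that more than one packaged document could satisfy."""
    groups = candidates_by_namespace(packaged)
    return sorted(ns for ns, docs in groups.items() if len(docs) > 1)


AMBIGUOUS = ambiguous_namespaces(PACKAGED)
print(AMBIGUOUS)
```

In this hypothetical package, an import of urn:example:b that is resolved by namespace alone has two candidates, which is exactly the kind of choice this requirement rules out.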

3.4     Full schema support

Being a generic validation language, SML supports all schema features.  Being a mechanism to transmit SML models, SML-IF also needs to support full schema features, especially <xs:include>, <xs:redefine>, and <xs:import>.  For example, in an SML model, if an instance document I is validated against a schema formed from a schema document A, which redefines schema document B, then it MUST be possible to transmit I, A, and B in an SML-IF instance and maintain their relationship.
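The I, A, B example can be sketched as a reachability computation over the composition references.  The reference table and the function name below are hypothetical; the sketch assumes the relationships between schema documents are already known and only shows which schema documents have to travel with I.

```python
# Hypothetical relationships from the example: instance document I is
# validated against a schema formed from A, and A redefines B.
REFERENCES = {
    "A": ["B"],  # A redefines B
    "B": [],
}


def schema_closure(start, references):
    """Return every schema document reachable from `start` through
    include, redefine, or import references, in discovery order."""
    seen = []
    stack = [start]
    while stack:
        doc = stack.pop()
        if doc in seen:
            continue  # also guards against circular references
        seen.append(doc)
        stack.extend(references.get(doc, []))
    return seen


NEEDED = schema_closure("A", REFERENCES)
print(NEEDED)
```

Under these assumptions, transmitting I without both A and B would lose part of the schema against which I was validated.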

4.   Constraints

4.1     Support access to schema documents outside of SML-IF

We do not want to force all schemas necessary to validate the model instance documents packaged by an SML-IF instance to be included by value in that instance.  It is not clear this would even be sensible in a repository interchange scenario, let alone in the more general usage scenarios some have mentioned for SML-IF, such as web services message exchanges.

4.2     Ignorable schema locations

We cannot require honoring of xsi:schemaLocation and xsi:noNamespaceSchemaLocation in instance documents or schemaLocation on <xs:import>, because:

·         Some existing implementations ignore them

·         Honoring schema locations in instance documents may have security consequences

4.3     Include definition and instance documents as-is

SML-IF instance producers may not have control over the content of the schemas necessary for validation of model instance documents, where “control” means what is coded in the files.  That is, there will be cases where <xs:import> and <xs:include> are coded, with and without schemaLocation, and cases where multiple files contain schema components for the same namespace.

4.4     Lazy schema assembly

The XML Schema specification allows schemas to be assembled lazily.  A partial schema can be used to validate an instance document, and more components can be added to the schema during the validation, as long as the new components don’t change the validation result of information items that are already validated.

This is sometimes not easy to enforce, but “supporting full schema” implies that SML-IF validation cannot violate this constraint.

5.   Acknowledgement

John Arwe, Bassam Tabarra, Harm Sluiman, and Pratul Dublish all provided useful input into the formulation of this document.