Section 3.2.2.4 text on extensibility from David Orchard on 2003-06-28 (www-tag@w3.org from June 2003)

From: David Orchard <dorchard@bea.com>
Date: Fri, 27 Jun 2003 17:02:57 -0700
To: <www-tag@w3.org>
Message-ID: <028b01c33d08$a2c862a0$7106a8c0@beasys.com>
Here's a rough start to the extensibility and versioning section of the web
arch document, and a small change proposed to 3.2.

3.2

The format specification should be designed for extensibility and
versioning.

3.2.2.4 Extensibility

XML and XML Namespaces are designed for creating vocabularies and combining
them.  Extensibility is the term for combining multiple vocabularies, or
allowing more than one vocabulary to be in scope in a document.
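As a rough illustration of two vocabularies in scope in one document, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The purchase-order and shipping namespace names are hypothetical placeholders, not real vocabularies.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace names, used only for illustration.
PO_NS = "http://example.org/purchase-order"
SHIP_NS = "http://example.org/shipping"

DOC = f"""<po:order xmlns:po="{PO_NS}" xmlns:ship="{SHIP_NS}">
  <po:item>widget</po:item>
  <ship:carrier>ACME Freight</ship:carrier>
</po:order>"""


def vocabularies_in_scope(xml_text):
    """Return the set of namespace names used by elements in the document."""
    namespaces = set()
    for elem in ET.fromstring(xml_text).iter():
        # ElementTree reports a qualified element name as "{namespace}local".
        if elem.tag.startswith("{"):
            namespaces.add(elem.tag[1:].split("}", 1)[0])
    return namespaces


print(sorted(vocabularies_in_scope(DOC)))
# Elements from the second vocabulary are addressed by namespace name.
print(ET.fromstring(DOC).find(f"{{{SHIP_NS}}}carrier").text)  # ACME Freight
```

Each element keeps the namespace of the vocabulary it came from, which is what lets a processor tell the containing vocabulary apart from the ones mixed into it.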

Good practice: Languages should provide for extensibility

Now what is the relationship between versioning and extensibility?  A clear
relationship exists where a schema is extended to add, change, or delete
element and attribute definitions.  We call this schema, and instances of it,
a new version of the language.  But what if a third party adds its vocabulary
elements without changing the containing schema?  Then the containing
language has not evolved, but the document instance has.  This is a new
version of the message.  An example is a SOAP message with a header block.
We typically call the header block a SOAP extension and not a new version of
SOAP.  Any changes to the particular message would be considered a new
version of the message.

Versioning is the term for the evolution of languages and documents.
Versioning is achieved through extensibility mechanisms and language
redefinition.  There are three types of version changes that can occur:
incompatible, backwards compatible, and forwards compatible changes.

In the case of XML documents, backwards compatibility means that a new
version of an XML language can be deployed in such a manner as to not break
existing agents that send documents in the older version.  This means that a
sending agent can send an old version of an XML document to a receiving agent
that understands the new version and still have the message successfully
processed.  Forwards compatibility means that an older version of a
receiving agent can receive newer documents and not break.  This means that
a sending agent can send a newer version of a document to an older receiving
agent and still have the message successfully processed.

Backwards compatibility means that existing sending agents can use receiving
agents that have been updated, and forwards compatibility means that newer
sending agents can continue to use existing receiving agents.

Forwards and backwards compatible changes are typically the addition of an
optional element or attribute.  The cost of non-backwards or non-forwards
compatible changes is often very high, as all the software that uses the
language must be updated to the newer version.
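A minimal Python sketch of the optional-element case, assuming a hypothetical "name" language whose version 2 adds an optional <middle> element. It only illustrates the two compatibility directions described above; it is not a prescribed processing model.

```python
import xml.etree.ElementTree as ET

# Hypothetical "name" language.
# Version 1: <name><first/><last/></name>
# Version 2: adds an optional <middle/> element.


def receive_v1(xml_text):
    """Version 1 receiver: reads the elements it knows and never looks at others."""
    root = ET.fromstring(xml_text)
    return {"first": root.findtext("first"), "last": root.findtext("last")}


def receive_v2(xml_text):
    """Version 2 receiver: also reads <middle>, which may be absent."""
    result = receive_v1(xml_text)
    # findtext returns None when the optional element is missing.
    result["middle"] = ET.fromstring(xml_text).findtext("middle")
    return result


V1_DOC = "<name><first>Jane</first><last>Doe</last></name>"
V2_DOC = "<name><first>Jane</first><middle>Q</middle><last>Doe</last></name>"

# Backwards compatible: a new receiver processes an old document.
print(receive_v2(V1_DOC))  # {'first': 'Jane', 'last': 'Doe', 'middle': None}
# Forwards compatible: an old receiver processes a new document.
print(receive_v1(V2_DOC))  # {'first': 'Jane', 'last': 'Doe'}
```

Both directions work here only because the new element is optional for the new receiver and simply never consulted by the old one; making <middle> mandatory would break the first call.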

Good practice: Languages should be created with an extensibility model that
permits forwards compatible and backwards compatible changes in the
language.

Forwards compatibility means that a receiver must be able to receive newer
content and process the message as if the newer content didn't exist.  This
newer content is considered optional, and acting as if it didn't exist is
called "ignoring".  Language designers need to specify that optional content
that a receiver is not familiar with must be ignored.  The mechanism for
ignoring can have a few different flavours.  One flavour is to simply act as
if the content doesn't exist, though care must be taken with position-based
behaviour.  Another flavour is to replace the element tag with the
element's content.
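The two flavours of ignoring can be sketched in Python with xml.etree.ElementTree. The function names and the set of known element names are hypothetical; the sketch assumes the root element is always known and shows how dropping an unknown element differs from replacing its tag with its content, including where the surrounding text ends up.

```python
import xml.etree.ElementTree as ET


def _append_text(parent, index, text):
    """Append text at the position just before parent[index]."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + text


def ignore_unknown(elem, known, unwrap=False):
    """Remove, in place, every descendant element whose tag is not in `known`.

    unwrap=False: act as if the unknown element and its content do not exist.
    unwrap=True:  replace the unknown element's tag with the element's content.
    """
    i = 0
    while i < len(elem):
        child = elem[i]
        if child.tag in known:
            ignore_unknown(child, known, unwrap)
            i += 1
            continue
        del elem[i]
        if unwrap:
            _append_text(elem, i, child.text)
            grandchildren = list(child)
            for offset, grandchild in enumerate(grandchildren):
                elem.insert(i + offset, grandchild)
            _append_text(elem, i + len(grandchildren), child.tail)
            # i is not advanced, so the promoted children are examined next.
        else:
            # Keep the text that followed the dropped element.
            _append_text(elem, i, child.tail)


def process(xml_text, known, unwrap):
    root = ET.fromstring(xml_text)
    ignore_unknown(root, known, unwrap)
    return ET.tostring(root, encoding="unicode")


DOC = "<p>Hello <ext>big </ext>world<b>!</b></p>"
print(process(DOC, {"p", "b"}, unwrap=False))  # <p>Hello world<b>!</b></p>
print(process(DOC, {"p", "b"}, unwrap=True))   # <p>Hello big world<b>!</b></p>
```

The text handling is where position-based behaviour bites: a processor that drops an element but forgets its tail text silently changes the surrounding content.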

Good practice: Languages should specify behaviour for unknown or
unrecognized content.  A common model is that such content must be ignored.

In cases where the newer content is required to be understood, or is
mandatory, the language designer may need to provide a mechanism for
indicating that the content must be understood.  New, mandatory content is
not a forwards compatible change.  One technique for indicating that new
content is required is to change the element names or the namespace names in
the message.  However, many languages are containers designed to carry
extensions, and renaming the container for every new mandatory extension is
impractical.

Good practice: Languages that need to indicate mandatory extensions should
provide such a facility.

An example of this is the mustUnderstand attribute in SOAP.
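A simplified Python sketch of how a receiver might honour mustUnderstand, assuming the SOAP 1.2 envelope namespace and a hypothetical set of header blocks that the receiver implements. Real SOAP processing also considers the role each header block is targeted at; that is deliberately left out here.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace

# Hypothetical: qualified names of the header blocks this receiver implements.
UNDERSTOOD = {"{http://example.org/security}token"}


class MustUnderstandFault(Exception):
    """Raised when a mandatory header block is not understood."""


def check_headers(envelope_xml, understood=UNDERSTOOD):
    """Return the names of header blocks that were ignored.

    Raises MustUnderstandFault for a header block marked mustUnderstand that
    is not understood. Simplification: every header block is assumed to be
    targeted at this receiver (SOAP roles are not modelled).
    """
    root = ET.fromstring(envelope_xml)
    header = root.find(f"{{{SOAP_ENV}}}Header")
    ignored = []
    if header is None:
        return ignored
    for block in header:
        if block.tag in understood:
            continue  # a real receiver would process the block here
        mandatory = block.get(f"{{{SOAP_ENV}}}mustUnderstand") in ("true", "1")
        if mandatory:
            raise MustUnderstandFault(block.tag)
        ignored.append(block.tag)  # optional and unknown: ignore it
    return ignored


def envelope(header_blocks):
    return (f'<env:Envelope xmlns:env="{SOAP_ENV}">'
            f"<env:Header>{header_blocks}</env:Header>"
            "<env:Body/></env:Envelope>")


optional = '<t:trace xmlns:t="http://example.org/trace"/>'
mandatory = '<t:trace xmlns:t="http://example.org/trace" env:mustUnderstand="true"/>'
print(check_headers(envelope(optional)))  # ['{http://example.org/trace}trace']
try:
    check_headers(envelope(mandatory))
except MustUnderstandFault as fault:
    print("fault:", fault)
```

The same unknown block is silently ignored when optional and causes a fault when marked mandatory, so the sender, not the container schema, decides whether an extension is forwards compatible.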

XML 1.0 DTDs and W3C XML Schema require deterministic content models.  As
the XML 1.0 specification explains, "For example, the content model
((b, c) | (b, d)) is non-deterministic, because given an initial b the XML
processor cannot know which b in the model is being matched without looking
ahead to see which element follows the b."
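To make the determinism rule concrete, here is a deliberately restricted Python sketch. It only handles a choice between plain sequences of required element names (no nesting, no ?, * or +); for that restricted class, the choice is deterministic exactly when no element name begins more than one alternative. The function name is hypothetical.

```python
from collections import Counter


def first_token_conflicts(alternatives):
    """Return element names that begin more than one alternative of a choice.

    Restricted sketch: each alternative is a non-empty tuple of required
    element names (no nested groups, no ?, * or +). For that class of
    models, the choice is deterministic exactly when the result is empty.
    """
    counts = Counter(alternative[0] for alternative in alternatives)
    return {name for name, count in counts.items() if count > 1}


# ((b, c) | (b, d)): both alternatives start with b.
print(first_token_conflicts([("b", "c"), ("b", "d")]))  # {'b'}
# The inner choice (c | d) of the factored model (b, (c | d)).
print(first_token_conflicts([("c",), ("d",)]))  # set()
```

Factoring out the shared b gives the equivalent model (b, (c | d)), whose remaining choice can be resolved from the next element alone.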

Schema languages like W3C XML Schema provide a variety of extensibility
mechanisms, such as wildcards and type derivation.  The combination of
extensibility and determinism can make it difficult to design an optimal
schema.  As a simple example, in XML Schema, a wildcard that allows
extension in any namespace (<xs:any namespace="##any"/>) cannot occur
after an element whose minOccurs value differs from its maxOccurs value.
If the min/max are different, the processor won't know whether an
instance of the element belongs to the element definition or the wildcard.
Another example is that a type definition that ends in a wildcard allowing
any namespace cannot be extended through derivation.
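The wildcard case can also be sketched in Python, modelled on XML Schema 1.0's Unique Particle Attribution rule. This is a simplified model, assuming a sequence of element particles followed by a single <xs:any namespace="##any"/>, and it does not cover the type-derivation example. A particle is flagged when its minOccurs differs from its maxOccurs and every particle between it and the wildcard is optional, since then an instance could belong to either the particle or the wildcard.

```python
from typing import NamedTuple


class Particle(NamedTuple):
    name: str
    min_occurs: int
    max_occurs: float  # float("inf") stands in for maxOccurs="unbounded"


def ambiguous_before_any_wildcard(particles):
    """Return names of element particles that a trailing ##any wildcard could also match.

    Assumes `particles` is a sequence of element particles followed by one
    xs:any namespace="##any" wildcard (the wildcard itself is not listed).
    """
    ambiguous = []
    for particle in reversed(particles):
        if particle.min_occurs != particle.max_occurs:
            # After min_occurs instances, both another instance and the wildcard fit.
            ambiguous.append(particle.name)
        if particle.min_occurs > 0:
            # A required particle separates earlier particles from the wildcard.
            break
    return list(reversed(ambiguous))


print(ambiguous_before_any_wildcard([Particle("name", 1, 1), Particle("nickname", 0, 1)]))  # ['nickname']
print(ambiguous_before_any_wildcard([Particle("name", 1, 1)]))  # []
```

Because ##any matches every element name, the only way to keep such a sequence deterministic in this model is to fix the occurrence count of the particle in front of the wildcard, or put a required element between them.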

Principle: Languages must account for determinism in the kinds of
extensibility they allow.

Cheers,
Dave
Received on Friday, 27 June 2003 20:03:51 UTC