Using Schema for an Information Structure Model - XPath with no instance document from Peter Geraghty on 2005-04-13 (www-xml-schema-comments@w3.org from April to June 2005)

From: Peter Geraghty <Peter.Geraghty@tracegroup.com>
Date: 12 Apr 2005 18:48:32 -0600
To: <www-xml-schema-comments@w3.org>
Message-ID: <230ADE2B739E084084DE9D4993C4D6D3DBDB58@plc-exchange.tracegroup.com>

Hi.

Several years ago I started looking at Schema with the assumptions that:
1.	Schema was a standard for defining an information structure model.
2.	The syntax for textual representation of information was implicit in the use of XML.
3.	XPath expressions could serve as a means of identifying parts of the information model.

I probably made these assumptions because these are three of the basic
requirements for the A2A and B2B messaging I work with.

However I am increasingly concerned that the intention of Schema is only
to be a standard for defining validation criteria, not information
structure.  Is this true?  In the absence of any other standard, people
are using it as a definition of an information model, but it seems not
to be intended for that.  I explain my concerns below.

For information interchange, producing a new XML document is as
important as processing and validating an existing document.  I am
involved in transformation tools
which allow business analysts to specify correspondences between
messaging standards in order to automate document production.  It is not
generally acceptable or useful to introduce any dependencies on the
output document order into these rules. For one thing, document order is
essentially a matter of syntax and irrelevant to the business analyst,
and for another there is no document in existence at the time the rules
are defined.

The Unique Particle Attribution (UPA) constraint considers whether a
Schema model is usable for validating an existing textual document as it
is processed, and therefore allows anything earlier in the document to
be used in identifying the relevant particle.  What is missing (for me)
is a similar constraint which says that any "simple" XPath expression
evaluated in a given schema context can be uniquely attributed to a
particle (I can elaborate the idea of "simple" if anyone is interested).

Am I alone in thinking that, with a "well written" Schema, the
combination of a context definition and an XPath expression yields the
definition of the addressee, even when no instance document yet
exists?  The approach works well in practice, since most schemas are
"well written", and we have our own ways of dealing with schemas which
are not, but I am concerned that it is not formally supported by schema
or XPath standards.

I raised a similar issue (in connection with static typing) in the
XPath 2.0 comments a year ago, but received no response.

It seems strange that so much of the standards emphasis is on reacting
to instance documents being pushed at a receiving application, and so
little on facilitating the creation of valid instance documents within
the sending application.

I think there are many examples of valid Schemas which don't work as
definitions of an information structure model because there is no usable
means of referring to items within it.

E.g., the following sequence of B (optional), C, B:

	<xs:element name="A">
		<xs:complexType>
			<xs:sequence>
				<xs:element name="B" minOccurs="0">
					<xs:annotation>
						<xs:documentation>This means one thing</xs:documentation>
					</xs:annotation>
				</xs:element>
				<xs:element name="C"/>
				<xs:element name="B">
					<xs:annotation>
						<xs:documentation>This means something else</xs:documentation>
					</xs:annotation>
				</xs:element>
			</xs:sequence>
		</xs:complexType>
	</xs:element>

As far as I can see, there is no usable standard which intentionally
allows this second B item to be identified for use in some kind of
correspondence map or other transformational definition.  There is no
simple XPath expression which you know in advance will always identify
the second B definition in this model.
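
To make the ambiguity concrete, here is a minimal Python sketch (standard library only) that treats the schema itself as the thing being addressed. The helper name declarations_for and the wrapping xs:schema element are my own assumptions for illustration, and the lookup only handles the simple case of a global element whose complex type is a flat xs:sequence. It resolves a parent/child name pair against the schema and collects the documentation of every matching declaration.

```python
import xml.etree.ElementTree as ET

# XML Schema namespace in the Clark notation that ElementTree expects.
XS = "{http://www.w3.org/2001/XMLSchema}"

# The example from this message, wrapped in an xs:schema element
# (the wrapper is an assumption so the fragment parses on its own).
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="A">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="B" minOccurs="0">
          <xs:annotation>
            <xs:documentation>This means one thing</xs:documentation>
          </xs:annotation>
        </xs:element>
        <xs:element name="C"/>
        <xs:element name="B">
          <xs:annotation>
            <xs:documentation>This means something else</xs:documentation>
          </xs:annotation>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def declarations_for(schema_root, parent_name, child_name):
    """Hypothetical helper: return the documentation text of every local
    element declaration called child_name that sits directly in the
    xs:sequence of the global element parent_name."""
    hits = []
    parent_path = f"{XS}element[@name='{parent_name}']"
    child_path = (
        f"{XS}complexType/{XS}sequence/{XS}element[@name='{child_name}']"
    )
    doc_path = f"{XS}annotation/{XS}documentation"
    for parent in schema_root.findall(parent_path):
        for decl in parent.findall(child_path):
            hits.append(decl.findtext(doc_path, default=""))
    return hits


root = ET.fromstring(SCHEMA)
b_decls = declarations_for(root, "A", "B")  # what would "A/B" refer to?
c_decls = declarations_for(root, "A", "C")  # what would "A/C" refer to?

print(len(b_decls), b_decls)
print(len(c_decls), c_decls)
```

The path A/C resolves to exactly one declaration, whereas A/B resolves to two declarations with different documented meanings, so the name path alone cannot say which B is intended. A positional form such as B[2] does not help either, because in an instance where the optional first B is absent the mandatory B is B[1].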

Most people would agree that the above example is bad practice, even if
not invalid.  However, there are similar issues with restrictions of
complex content.  Schema allows the restriction to have a very different
expression from the base type.  This does not reflect the fact that the
meaning of each item in the restriction must be the same as the meaning
of some item in the base.

Pete

Received on Wednesday, 13 April 2005 00:49:06 UTC