RE: Serialization (sometimes) needs to include type information from Antoine Mensch on 2004-02-11 (public-qt-comments@w3.org from February 2004)

From: Antoine Mensch <antoine.mensch@xquarkgroup.com>
Date: Wed, 11 Feb 2004 11:57:55 +0100
To: <public-qt-comments@w3.org>, "Michael Rys" <mrys@microsoft.com>
Message-ID: <KBELJHFJHGPOHFENKHJCGEILCCAA.antoine.mensch@xquarkgroup.com>

> Note that if you validate the result according to an in-scope
> schema component of an element result, then the elements inside
> will be typed according to that type and not the original type.
> So again, there is no type ambivalence and the schema of the
> result element can be easily provided in addition to the data generated.

I will try to be more concrete:
I have two C elements in a document, one of type ns1:Type1 and one of type
ns1:Type2. I assume that all necessary in-scope schema information is
available and that I am using strict or lax validation mode.

The element declaration for the result element in the query

<result>
{
	for $x in doc("myDocument")//C
	return $x
}
</result>

should be written (if I want to retain type information in the result
document) as:

<xs:element name="result">
	<xs:complexType>
		<xs:choice maxOccurs="unbounded">
			<xs:element name="C" type="ns1:Type1"/>
			<xs:element name="C" type="ns1:Type2"/>
		</xs:choice>
	</xs:complexType>
</xs:element>

Unfortunately, this is not a valid schema component: elements with the same
name must have the same type within a given content model. The only way to
obtain a valid declaration is to declare the C element inside result with a
common supertype of both ns1:Type1 and ns1:Type2 (which will be xs:anyType
in the general case).
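
As a sketch of that fallback in the general case, the declaration might look
like the following. The sequence wrapper and the occurrence bounds are
assumptions made for illustration; only the xs:anyType typing of C comes from
the argument above.

```xml
<xs:element name="result">
	<xs:complexType>
		<xs:sequence>
			<!-- a single declaration for C, typed with the common supertype -->
			<xs:element name="C" type="xs:anyType"
				minOccurs="0" maxOccurs="unbounded"/>
		</xs:sequence>
	</xs:complexType>
</xs:element>
```

This is valid, but validating against it alone annotates every C as
xs:anyType, so the distinction between the two original types is lost.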

Therefore, in order to retain the original type information (which could
well be something as simple as knowing whether the C element content is a
date or a string, when ns1:Type1 and ns1:Type2 are respectively xs:date and
xs:string), I need to annotate each occurrence of C in the output with the
original type information.
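
For illustration, taking the simple case where the two types are xs:date and
xs:string, an annotated result could look roughly like this. The element
content is invented for the example; the namespace URIs are the standard XML
Schema ones.

```xml
<result xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
	<!-- each occurrence of C names its own type -->
	<C xsi:type="xs:date">2004-02-11</C>
	<C xsi:type="xs:string">some text</C>
</result>
```

With C declared as xs:anyType in the schema for result, a validating parser
can honor each xsi:type, since the named type only has to be validly derived
from the declared one.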

Best regards,

Antoine Mensch

> -----Original Message-----
> From: Michael Rys [mailto:mrys@microsoft.com]
> Sent: Wednesday, February 11, 2004 11:28 AM
> To: antoine.mensch@xquarkgroup.com; public-qt-comments@w3.org
> Subject: RE: Serialization (sometimes) needs to include type information
>
>
> Note that if you validate the result according to an in-scope
> schema component of an element result, then the elements inside
> will be typed according to that type and not the original type.
> So again, there is no type ambivalence and the schema of the
> result element can be easily provided in addition to the data generated.
>
> Best regards
> Michael
>
> > -----Original Message-----
> > From: public-qt-comments-request@w3.org
> > [mailto:public-qt-comments-request@w3.org] On Behalf Of Antoine Mensch
> > Sent: Wednesday, February 11, 2004 12:54 AM
> > To: public-qt-comments@w3.org
> > Subject: RE: Serialization (sometimes) needs to include type information
> >
> >
> > >
> > > First, you need to give us some more information about the in-scope
> > > schema components and validation mode for your query.
> > >
> > I would like to have an in-scope schema component allowing me to validate
> > (so I assume strict or lax validation) the "result" element and construct
> > a PSVI that will contain the same (or equivalent) type information as the
> > source data. What I am trying to express is that I cannot write such a
> > valid complex type, due to restrictions in the schema specification
> > (elements with the same name must have the same type in a given content
> > model).
> >
> > > Assuming that you imported the two elements below, have the document
> > > typed with the information and have lax validation mode, then your
> > > result would be an element result of type xdt:untyped since it could
> > > not find a definition in the schema components for the result element.
> > > This then also means that the C elements will be untyped and not
> > > preserve their original type.
> > >
> >
> > Yes, I understand that. The point is that I cannot write such a schema
> > component for the "result" element.
> >
> > > So your example does not convey the semantics that you assume it does
> > > and does not require a type serialization.
> > >
> >
> > I hope the above clarifies why it does require type serialization.
> >
> > > Also note that XML is primarily late typed data: You have the
> > > self-describing XML document and you associate type information after
> > > creation of a document. Thus, mandating the serialization of type
> > > information and thus making the document early typed seems contrary to
> > > the general XML philosophy.
> > >
> >
> > It seems to me that the xsi:type attribute (which I think is not contrary
> > to the general XML philosophy) has been introduced for exactly that
> > purpose.
> >
> > In addition, consider the following extract from the Data Model spec (§4):
> > "Constructing an Infoset from an instance of the data model, for example
> > in order to perform schema validity assessment, is accomplished by
> > serializing the document and parsing it. Implementations are not required
> > to implement this process literally, but they must obtain the same result
> > as if they had."
> >
> > This is impossible if we cannot specify the type of each individual C
> > element in the result (through xsi:type).
> >
> > Best regards,
> >
> > Antoine Mensch
> >
> > > > -----Original Message-----
> > > > From: public-qt-comments-request@w3.org
> > > > [mailto:public-qt-comments-request@w3.org] On Behalf Of Antoine Mensch
> > > > Sent: Wednesday, February 11, 2004 12:23 AM
> > > > To: public-qt-comments@w3.org
> > > > Subject: Serialization (sometimes) needs to include type information
> > > >
> > > >
> > > > Consider the following schema fragment:
> > > >
> > > > <xs:element name="A">
> > > > 	<xs:complexType>
> > > > 		<xs:sequence>
> > > > 			<xs:element name="C" type="myns:Type1"/>
> > > > 		</xs:sequence>
> > > > 	</xs:complexType>
> > > > </xs:element>
> > > >
> > > > <xs:element name="B">
> > > > 	<xs:complexType>
> > > > 		<xs:sequence>
> > > > 			<xs:element name="C" type="myns:Type2"/>
> > > > 		</xs:sequence>
> > > > 	</xs:complexType>
> > > > </xs:element>
> > > >
> > > > Now if we consider a document (or any other data source) containing
> > > > both A and B elements, the following query
> > > >
> > > > <result>
> > > > { 	for $x in doc("myDocument")//C
> > > > 	return $x
> > > > }
> > > > </result>
> > > >
> > > > returns a result that cannot be strongly typed without losing type
> > > > information by any valid schema, as the schema spec forbids elements
> > > > with the same name and a different type in the same content model.
> > > >
> > > > It seems to me that the only way of retaining type information would
> > > > be to annotate produced C elements with xsi:type. This could be a
> > > > serialization parameter, similar to cdata-section-elements. However,
> > > > this would raise another issue, as anonymous type names would then be
> > > > exposed, and would thus need to be handled in a consistent way by
> > > > different XQuery and XML Schema processors.
> > > >
> > > > This issue is important, especially for tools that perform distributed
> > > > XQuery processing, and that need to retain consistent type information
> > > > when moving XML data from one processing node to another.
> > > >
Received on Wednesday, 11 February 2004 05:52:32 UTC