RE: Serialization (sometimes) needs to include type information from Antoine Mensch on 2004-02-11 (public-qt-comments@w3.org from February 2004)

From: Antoine Mensch <antoine.mensch@xquarkgroup.com>
Date: Wed, 11 Feb 2004 22:20:27 +0100
To: "Michael Rys" <mrys@microsoft.com>, <public-qt-comments@w3.org>
Message-ID: <KBELJHFJHGPOHFENKHJCCEJACCAA.antoine.mensch@xquarkgroup.com>
> While I agree with Mike Kay's comments, let me just point out
> that using a construction will retype your C elements to the type
> required by the result element and will not preserve the original
> type annotation.
>

I've looked at the element constructor semantics. While I've always
considered it a bit "strange" for a strongly typed language to lose all type
information during a simple clone operation, I think the approach is OK as
long as you can easily retype the cloned nodes by defining the
appropriate XML Schema components (possibly reusing components from the
source schemas). The whole point of my message is to highlight an important
class of XQuery expressions (union-like XPath expressions, including the
very popular descendant-or-self step) for which such retyping is very
difficult without proper support from serialization, or perhaps I should
say from the XQuery processor, since you are right in pointing out that it is
at the level of the element constructor that the type information is lost.

Without such support, executing even the simplest queries over SOAP would
be very difficult, because there would be no way for the service to retrieve
and pass along to the user the type of the data returned by a query such as

	doc("myDocument")//C

as soon as there is more than one type of C element in the document.

> So, if you want to get the original types in the data model
> instance that your query generates, you would need to write:
>
> 	for $x in doc("myDocument")//C
> 	return $x
>

And how can I use this result? If C elements contain either xs:string or
xs:date values, how do I serialize this sequence of nodes without losing the
type information? I cannot define a content model for this sequence, so it is
only by looking at the individual nodes that I can retrieve the information.
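
To make the question concrete, here is a minimal sketch of what a per-node
annotation could look like in the serialized output. It assumes a
hypothetical "result" wrapper whose C children are declared with a common
supertype such as xs:anyType, and the sample values are invented for
illustration. Whether a processor should emit xsi:type this way is the open
question here.

```xml
<!-- Hypothetical serialized result; the "result" wrapper and the sample
     values are assumptions made for illustration only. -->
<result xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <!-- Each C carries its own type annotation instead of relying on a
       single declaration for all C elements in the content model. -->
  <C xsi:type="xs:date">2004-02-11</C>
  <C xsi:type="xs:string">some text</C>
</result>
```

With such annotations, a receiver that validates the document could recover
the type of each C node individually, even though the content model itself
only declares the common supertype.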

Best regards,

Antoine Mensch

> -----Original Message-----
> From: Michael Rys [mailto:mrys@microsoft.com]
> Sent: Wednesday, February 11, 2004 20:11
> To: antoine.mensch@xquarkgroup.com; public-qt-comments@w3.org
> Subject: RE: Serialization (sometimes) needs to include type information
>
>
> While I agree with Mike Kay's comments, let me just point out
> that using a construction will retype your C elements to the type
> required by the result element and will not preserve the original
> type annotation.
>
> So, if you want to get the original types in the data model
> instance that your query generates, you would need to write:
>
> 	for $x in doc("myDocument")//C
> 	return $x
>
> As soon as you add <result> around it, you will get the content
> retyped (see the section on Typing and element construction in
> the language document at [1]).
>
> Best regards
> Michael
>
> [1] http://www.w3.org/TR/2003/WD-xquery-20031112/#id-type-of-constructed
>
>
> > -----Original Message-----
> > From: Antoine Mensch [mailto:antoine.mensch@xquarkgroup.com]
> > Sent: Wednesday, February 11, 2004 2:58 AM
> > To: public-qt-comments@w3.org; Michael Rys
> > Subject: RE: Serialization (sometimes) needs to include type information
> >
> > > Note that if you validate the result according to an in-scope
> > > schema component of an element result, then the elements inside
> > > will be typed according to that type and not the original type.
> > > So again, there is no type ambivalence and the schema of the
> > > result element can be easily provided in addition to the data
> > > generated.
> >
> > I will try to be more concrete:
> > I have two C elements of type ns1:Type1 and ns1:Type2 in a document. I
> > assume that all necessary in scope information is available and that I am
> > using validation mode.
> >
> > The element declaration for the result element in the query
> >
> > <result>
> > {
> > 	for $x in doc("myDocument")//C
> > 	return $x
> > }
> > </result>
> >
> > should be written (if I want to retain type information in the result
> > document) as:
> >
> > <xs:element name="result">
> > 	<xs:complexType>
> > 		<xs:choice maxOccurs="unbounded">
> > 			<xs:element name="C" type="ns1:Type1"/>
> > 			<xs:element name="C" type="ns1:Type2"/>
> > 		</xs:choice>
> > 	</xs:complexType>
> > </xs:element>
> >
> > Unfortunately, this is not a valid schema component. The only solution to
> > have a valid element declaration is to declare the result element C with a
> > common supertype of both ns1:Type1 and ns1:Type2 (which will be xs:anyType
> > in the general case).
> >
> > Therefore, in order to retain the original type information (which could
> > well be something as simple as knowing whether the C element content is a
> > date or a string, when ns1:Type1 and ns1:Type2 are respectively xs:date
> > and xs:string), I need to annotate each occurrence of C in the output with
> > the original type information.
> >
> > Best regards,
> >
> > Antoine Mensch
> >
> > > -----Original Message-----
> > > From: Michael Rys [mailto:mrys@microsoft.com]
> > > Sent: Wednesday, February 11, 2004 11:28
> > > To: antoine.mensch@xquarkgroup.com; public-qt-comments@w3.org
> > > Subject: RE: Serialization (sometimes) needs to include type information
> > >
> > >
> > > Note that if you validate the result according to an in-scope
> > > schema component of an element result, then the elements inside
> > > will be typed according to that type and not the original type.
> > > So again, there is no type ambivalence and the schema of the
> > > result element can be easily provided in addition to the data
> > > generated.
> > >
> > > Best regards
> > > Michael
> > >
> > > > -----Original Message-----
> > > > From: public-qt-comments-request@w3.org
> > > > [mailto:public-qt-comments-request@w3.org] On Behalf Of Antoine Mensch
> > > > Sent: Wednesday, February 11, 2004 12:54 AM
> > > > To: public-qt-comments@w3.org
> > > > Subject: RE: Serialization (sometimes) needs to include type information
> > > >
> > > >
> > > > >
> > > > > First, you need to give us some more information about the in-scope
> > > > > schema components and validation mode for your query.
> > > > >
> > > > I would like to have an in-scope schema component allowing me to
> > > > validate (so I assume strict or lax validation) the "result" element
> > > > and construct a PSVI that will contain the same (or equivalent) type
> > > > information as the source data. What I am trying to express is that I
> > > > cannot write such a valid complex type, due to restrictions in the
> > > > schema specification (elements with the same name must have the same
> > > > type in a given content model).
> > > >
> > > > > Assuming that you imported the two elements below, have the document
> > > > > typed with the information and have lax validation mode, then your
> > > > > result would be an element result of type xdt:untyped since it could
> > > > > not find a definition in the schema components for the result
> > > > > element. This then also means that the C elements will be untyped and
> > > > > not preserve their original type.
> > > > >
> > > >
> > > > Yes, I understand that. The point is that I cannot write such a schema
> > > > component for the "result" element.
> > > >
> > > > > So your example does not convey the semantics that you assume it does
> > > > > and does not require a type serialization.
> > > > >
> > > >
> > > > I hope the above clarifies why it does require type serialization.
> > > >
> > > > > Also note that XML is primarily late typed data: You have the
> > > > > self-describing XML document and you associate type information after
> > > > > creation of a document. Thus, mandating the serialization of type
> > > > > information and thus making the document early typed seems contrary
> > > > > to the general XML philosophy.
> > > > >
> > > >
> > > > It seems to me that the xsi:type attribute (which I think is not
> > > > contrary to the general XML philosophy) has been introduced for exactly
> > > > that purpose.
> > > >
> > > > In addition, consider the following extract from the Data Model spec
> > > > (§4): "Constructing an Infoset from an instance of the data model, for
> > > > example in order to perform schema validity assessment, is accomplished
> > > > by serializing the document and parsing it. Implementations are not
> > > > required to implement this process literally, but they must obtain the
> > > > same result as if they had."
> > > >
> > > > This is impossible if we cannot specify the type of each individual C
> > > > element in the result (through xsi:type).
> > > >
> > > > Best regards,
> > > >
> > > > Antoine Mensch
> > > >
> > > > > > -----Original Message-----
> > > > > > From: public-qt-comments-request@w3.org
> > > > > > [mailto:public-qt-comments-request@w3.org] On Behalf Of Antoine Mensch
> > > > > > Sent: Wednesday, February 11, 2004 12:23 AM
> > > > > > To: public-qt-comments@w3.org
> > > > > > Subject: Serialization (sometimes) needs to include type information
> > > > > >
> > > > > >
> > > > > > Consider the following schema fragment:
> > > > > >
> > > > > > <xs:element name="A">
> > > > > > 	<xs:complexType>
> > > > > > 		<xs:sequence>
> > > > > > 			<xs:element name="C" type="myns:Type1"/>
> > > > > > 		</xs:sequence>
> > > > > > 	</xs:complexType>
> > > > > > </xs:element>
> > > > > >
> > > > > > <xs:element name="B">
> > > > > > 	<xs:complexType>
> > > > > > 		<xs:sequence>
> > > > > > 			<xs:element name="C" type="myns:Type2"/>
> > > > > > 		</xs:sequence>
> > > > > > 	</xs:complexType>
> > > > > > </xs:element>
> > > > > >
> > > > > > Now if we consider a document (or any other data source) containing
> > > > > > both A and B elements, the following query
> > > > > >
> > > > > > <result>
> > > > > > { 	for $x in doc("myDocument")//C
> > > > > > 	return $x
> > > > > > }
> > > > > > </result>
> > > > > >
> > > > > > returns a result that cannot be strongly typed without losing type
> > > > > > information by any valid schema, as the schema spec forbids elements
> > > > > > with the same name and a different type in the same content model.
> > > > > >
> > > > > > It seems to me that the only way of retaining type information would
> > > > > > be to annotate produced C elements with xsi:type. This could be a
> > > > > > serialization parameter, similar to the cdata-section-elements.
> > > > > > However, this would raise another issue, as anonymous type names
> > > > > > would then be exposed, and would thus need to be handled in a
> > > > > > consistent way by different XQuery and XML Schema processors.
> > > > > >
> > > > > > This issue is important, especially for tools that perform
> > > > > > distributed XQuery processing, and that need to retain consistent
> > > > > > type information when moving XML data from one processing node to
> > > > > > another.
> > > > > >
> > > > >
> > > > >
> > >
> > >
>
>
Received on Wednesday, 11 February 2004 16:14:51 UTC