RE: Serialization (sometimes) needs to include type information

While I agree with Mike Kay's comments, let me just point out that placing your C elements inside an element constructor retypes them to the type required by the result element; the original type annotations are not preserved.

So, if you want to get the original types in the data model instance that your query generates, you would need to write:

	for $x in doc("myDocument")//C
	return $x

As soon as you add <result> around it, you will get the content retyped (see the section on Typing and element construction in the language document at [1]).

Best regards
Michael

[1] http://www.w3.org/TR/2003/WD-xquery-20031112/#id-type-of-constructed


> -----Original Message-----
> From: Antoine Mensch [mailto:antoine.mensch@xquarkgroup.com]
> Sent: Wednesday, February 11, 2004 2:58 AM
> To: public-qt-comments@w3.org; Michael Rys
> Subject: RE: Serialization (sometimes) needs to include type information
> 
> > Note that if you validate the result according to an in-scope
> > schema component of an element result, then the elements inside
> > will be typed according to that type and not the original type.
> > So again, there is no type ambivalence and the schema of the
> > result element can be easily provided in addition to the data generated.
> 
> I will try to be more concrete:
> I have two C elements of type ns1:Type1 and ns1:Type2 in a document. I
> assume that all necessary in scope information is available and that I am
> using validation mode.
> 
> The element declaration for the result element in the query
> 
> <result>
> {
> 	for $x in doc("myDocument")//C
> 	return $x
> }
> </result>
> 
> should be written (if I want to retain type information in the result
> document) as:
> 
> <xs:element name="result">
> 	<xs:complexType>
> 		<xs:choice maxOccurs="unbounded">
> 			<xs:element name="C" type="ns1:Type1"/>
> 			<xs:element name="C" type="ns1:Type2"/>
> 		</xs:choice>
> 	</xs:complexType>
> </xs:element>
> 
> Unfortunately, this is not a valid schema component. The only way to get a
> valid element declaration is to declare the C element inside result with a
> common supertype of both ns1:Type1 and ns1:Type2 (which will be xs:anyType
> in the general case).
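> 
> For illustration only, a minimal sketch of such a fallback declaration,
> assuming the two types share nothing more specific than xs:anyType (the
> enclosing xs:schema wrapper is added here only to make the fragment
> self-contained):
> 
> ```xml
> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
>   <xs:element name="result">
>     <xs:complexType>
>       <xs:sequence>
>         <!-- One declaration covers every C; the Type1/Type2
>              distinction is no longer expressed by the schema. -->
>         <xs:element name="C" type="xs:anyType"
>                     minOccurs="0" maxOccurs="unbounded"/>
>       </xs:sequence>
>     </xs:complexType>
>   </xs:element>
> </xs:schema>
> ```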
> 
> Therefore, in order to retain the original type information (which could
> well be something as simple as knowing whether the C element content is a
> date or a string, when ns1:Type1 and ns1:Type2 are respectively xs:date and
> xs:string), I need to annotate each occurrence of C in the output with the
> original type information.
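> 
> As a hedged sketch of what such annotated output could look like, assuming
> the simple case above where the two types are xs:date and xs:string (the
> element values are made up for the example):
> 
> ```xml
> <result xmlns:xs="http://www.w3.org/2001/XMLSchema"
>         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
>   <!-- xsi:type records the original type of each C occurrence -->
>   <C xsi:type="xs:date">2004-02-11</C>
>   <C xsi:type="xs:string">some text</C>
> </result>
> ```
> 
> An instance like this stays valid when C is declared with a common
> supertype such as xs:anyType, since xsi:type may name any type derived
> from the declared one.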
> 
> Best regards,
> 
> Antoine Mensch
> 
> > -----Original Message-----
> > From: Michael Rys [mailto:mrys@microsoft.com]
> > Sent: Wednesday, February 11, 2004 11:28
> > To: antoine.mensch@xquarkgroup.com; public-qt-comments@w3.org
> > Subject: RE: Serialization (sometimes) needs to include type information
> >
> >
> > Note that if you validate the result according to an in-scope
> > schema component of an element result, then the elements inside
> > will be typed according to that type and not the original type.
> > So again, there is no type ambivalence and the schema of the
> > result element can be easily provided in addition to the data generated.
> >
> > Best regards
> > Michael
> >
> > > -----Original Message-----
> > > From: public-qt-comments-request@w3.org [mailto:public-qt-comments-request@w3.org] On Behalf Of Antoine Mensch
> > > Sent: Wednesday, February 11, 2004 12:54 AM
> > > To: public-qt-comments@w3.org
> > > Subject: RE: Serialization (sometimes) needs to include type information
> > >
> > >
> > > >
> > > > First, you need to give us some more information about the in-scope
> > > > schema components and validation mode for your query.
> > > >
> > > I would like to have an in-scope schema component allowing me to validate
> > > (so I assume strict or lax validation) the "result" element and construct
> > > a PSVI that will contain the same (or equivalent) type information as the
> > > source data. What I am trying to express is that I cannot write such a
> > > valid complex type, due to restrictions in the schema specification
> > > (elements with the same name must have the same type in a given content
> > > model).
> > >
> > > > Assuming that you imported the two elements below, have the document
> > > > typed with the information and have lax validation mode, then your
> > > > result would be an element result of type xdt:untyped since it could not
> > > > find a definition in the schema components for the result element. This
> > > > then also means that the C elements will be untyped and not preserve
> > > > their original type.
> > > >
> > >
> > > Yes, I understand that. The point is that I cannot write such a schema
> > > component for the "result" element.
> > >
> > > > So your example does not convey the semantics that you assume it does
> > > > and does not require a type serialization.
> > > >
> > >
> > > I hope the above clarifies why it does require type serialization.
> > >
> > > > Also note that XML is primarily late typed data: You have the
> > > > self-describing XML document and you associate type information after
> > > > creation of a document. Thus, mandating the serialization of type
> > > > information and thus making the document early typed seems contrary to
> > > > the general XML philosophy.
> > > >
> > >
> > > It seems to me that the xsi:type attribute (which I think is not contrary
> > > to the general XML philosophy) has been introduced for exactly that
> > > purpose.
> > >
> > > In addition, consider the following extract from the Data Model spec (§4):
> > > "Constructing an Infoset from an instance of the data model, for example
> > > in order to perform schema validity assessment, is accomplished by
> > > serializing the document and parsing it. Implementations are not required
> > > to implement this process literally, but they must obtain the same result
> > > as if they had."
> > >
> > > This is impossible if we cannot specify the type of each individual C
> > > element in the result (through xsi:type).
> > >
> > > Best regards,
> > >
> > > Antoine Mensch
> > >
> > > > > -----Original Message-----
> > > > > From: public-qt-comments-request@w3.org [mailto:public-qt-comments-request@w3.org] On Behalf Of Antoine Mensch
> > > > > Sent: Wednesday, February 11, 2004 12:23 AM
> > > > > To: public-qt-comments@w3.org
> > > > > Subject: Serialization (sometimes) needs to include type information
> > > > >
> > > > >
> > > > > Consider the following schema fragment:
> > > > >
> > > > > <xs:element name="A">
> > > > > 	<xs:complexType>
> > > > > 		<xs:sequence>
> > > > > 			<xs:element name="C" type="myns:Type1"/>
> > > > > 		</xs:sequence>
> > > > > 	</xs:complexType>
> > > > > </xs:element>
> > > > >
> > > > > <xs:element name="B">
> > > > > 	<xs:complexType>
> > > > > 		<xs:sequence>
> > > > > 			<xs:element name="C" type="myns:Type2"/>
> > > > > 		</xs:sequence>
> > > > > 	</xs:complexType>
> > > > > </xs:element>
> > > > >
> > > > > Now if we consider a document (or any other data source) containing
> > > > > both A and B elements, the following query
> > > > >
> > > > > <result>
> > > > > { 	for $x in doc("myDocument")//C
> > > > > 	return $x
> > > > > }
> > > > > </result>
> > > > >
> > > > > returns a result that cannot be strongly typed without losing type
> > > > > information by any valid schema, as the schema spec forbids elements
> > > > > with the same name and a different type in the same content model.
> > > > >
> > > > > It seems to me that the only way of retaining type information would
> > > > > be to annotate produced C elements with xsi:type. This could be a
> > > > > serialization parameter, similar to cdata-section-elements. However,
> > > > > this would raise another issue, as anonymous type names would then be
> > > > > exposed, and would thus need to be handled in a consistent way by
> > > > > different XQuery and XML Schema processors.
> > > > >
> > > > > This issue is important, especially for tools that perform distributed
> > > > > XQuery processing, and that need to retain consistent type information
> > > > > when moving XML data from one processing node to another.
> > > > >
> > > >
> > > >
> >
> >

Received on Wednesday, 11 February 2004 14:11:35 UTC