RE: Serialization (sometimes) needs to include type information from Michael Rys on 2004-02-11 (public-qt-comments@w3.org from February 2004)

From: Michael Rys <mrys@microsoft.com>
Date: Wed, 11 Feb 2004 13:22:16 -0800
To: <antoine.mensch@xquarkgroup.com>, <public-qt-comments@w3.org>
Message-ID: <EB0A327048144442AFB15FCE18DC96C701FFC41F@RED-MSG-31.redmond.corp.microsoft.com>
First: We decided a long time ago in the WG that type inference of the result is not a requirement, but a feature that some implementations may provide.

Second: If you provide that, I do not see a problem with inferring a type that defines a content model that accepts the result. In the worst case it is element(C, xs:anyType)...
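
A minimal sketch of that worst case, assuming a wrapper element named result (as in the query quoted below) and the usual binding of the xs prefix. It is an illustration only, not a declaration that the specification or any particular processor is known to produce:

```xml
<!-- Hypothetical inferred schema: accepts any number of C elements,
     each typed as xs:anyType, the common supertype of all types. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="result">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="C" type="xs:anyType"
                    minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Such a declaration accepts the result, but on its own it assigns xs:anyType to every C rather than the original type.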

Best regards
Michael

> -----Original Message-----
> From: Antoine Mensch [mailto:antoine.mensch@xquarkgroup.com]
> Sent: Wednesday, February 11, 2004 1:20 PM
> To: Michael Rys; public-qt-comments@w3.org
> Subject: RE: Serialization (sometimes) needs to include type information
> 
> > While I agree with Mike Kay's comments, let me just point out
> > that using a construction will retype your C elements to the type
> > required by the result element and will not preserve the original
> > type annotation.
> >
> 
> I've looked at the element constructor semantics: while I've always
> considered it a bit "strange" for a strongly-typed language to lose all type
> information during a simple clone operation, I think the approach is OK as
> long as you can easily retype the cloned nodes by simply defining the
> appropriate XML schema components (possibly reusing components from the
> source schemas). The whole point of my message is to point out an important
> class of XQuery expressions (union-like XPath expressions, including the
> very popular descendant-or-self step) for which such a retyping is very
> difficult without proper support from the serialization, or perhaps I should
> say XQuery processor, since you are right in pointing out that it is at the
> level of the element constructor that the type information is lost.
> 
> Without such support, executing even the simplest queries over SOAP would
> be very difficult, because there would be no way for the service to retrieve
> and pass along to the user the type of the data returned by a query such as
> 
> 	doc("myDocument")//C
> 
> as soon as there is more than one type of C element in the document.
> 
> > So, if you want to get the original types in the data model
> > instance that your query generates, you would need to write:
> >
> > 	for $x in doc("myDocument")//C
> > 	return $x
> >
> 
> And how can I use this result? If C elements contain either xs:string or
> xs:date, how do I serialize this sequence of nodes without losing the type
> information? I cannot define a content model for this sequence, so it's only
> by looking at the individual nodes that I can retrieve the information.
> 
> Best regards,
> 
> Antoine Mensch
> 
> > -----Original Message-----
> > From: Michael Rys [mailto:mrys@microsoft.com]
> > Sent: Wednesday, February 11, 2004 8:11 PM
> > To: antoine.mensch@xquarkgroup.com; public-qt-comments@w3.org
> > Subject: RE: Serialization (sometimes) needs to include type information
> >
> >
> > While I agree with Mike Kay's comments, let me just point out
> > that using a construction will retype your C elements to the type
> > required by the result element and will not preserve the original
> > type annotation.
> >
> > So, if you want to get the original types in the data model
> > instance that your query generates, you would need to write:
> >
> > 	for $x in doc("myDocument")//C
> > 	return $x
> >
> > As soon as you add <result> around it, you will get the content
> > retyped (see the section on Typing and element construction in
> > the language document at [1]).
> >
> > Best regards
> > Michael
> >
> > [1] http://www.w3.org/TR/2003/WD-xquery-20031112/#id-type-of-constructed
> >
> >
> > > -----Original Message-----
> > > From: Antoine Mensch [mailto:antoine.mensch@xquarkgroup.com]
> > > Sent: Wednesday, February 11, 2004 2:58 AM
> > > To: public-qt-comments@w3.org; Michael Rys
> > > Subject: RE: Serialization (sometimes) needs to include type information
> > >
> > > > Note that if you validate the result according to an in-scope
> > > > schema component of an element result, then the elements inside
> > > > will be typed according to that type and not the original type.
> > > > So again, there is no type ambivalence and the schema of the
> > > > result element can be easily provided in addition to the data
> > > > generated.
> > >
> > > I will try to be more concrete:
> > > I have two C elements of type ns1:Type1 and ns1:Type2 in a document. I
> > > assume that all necessary in-scope information is available and that I am
> > > using validation mode.
> > >
> > > The element declaration for the result element in the query
> > >
> > > <result>
> > > {
> > > 	for $x in doc("myDocument")//C
> > > 	return $x
> > > }
> > > </result>
> > >
> > > should be written (if I want to retain type information in the result
> > > document) as:
> > >
> > > <xs:element name="result">
> > > 	<xs:complexType>
> > > 		<xs:choice maxOccurs="unbounded">
> > > 			<xs:element name="C" type="ns1:Type1"/>
> > > 			<xs:element name="C" type="ns1:Type2"/>
> > > 		</xs:choice>
> > > 	</xs:complexType>
> > > </xs:element>
> > >
> > > Unfortunately, this is not a valid schema component. The only way to
> > > obtain a valid element declaration is to declare the C element in the
> > > result with a common supertype of both ns1:Type1 and ns1:Type2 (which
> > > will be xs:anyType in the general case).
> > >
> > > Therefore, in order to retain the original type information (which could
> > > well be something as simple as knowing whether the C element content is a
> > > date or a string, when ns1:Type1 and ns1:Type2 are respectively xs:date
> > > and xs:string), I need to annotate each occurrence of C in the output
> > > with the original type information.
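> > >
> > > A minimal sketch of such an annotated output, for the simple case where
> > > the two types are xs:date and xs:string. The element values are
> > > hypothetical, and the xs and xsi prefixes are assumed to be bound to the
> > > usual XML Schema namespaces:
> > >
> > > ```xml
> > > <!-- Each C carries its own type through xsi:type, so the result
> > >      can be declared with a common supertype such as xs:anyType. -->
> > > <result xmlns:xs="http://www.w3.org/2001/XMLSchema"
> > >         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
> > > 	<C xsi:type="xs:date">2004-02-11</C>
> > > 	<C xsi:type="xs:string">some text</C>
> > > </result>
> > > ```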
> > >
> > > Best regards,
> > >
> > > Antoine Mensch
> > >
> > > > -----Original Message-----
> > > > From: Michael Rys [mailto:mrys@microsoft.com]
> > > > Sent: Wednesday, February 11, 2004 11:28 AM
> > > > To: antoine.mensch@xquarkgroup.com; public-qt-comments@w3.org
> > > > Subject: RE: Serialization (sometimes) needs to include type information
> > > >
> > > >
> > > > Note that if you validate the result according to an in-scope
> > > > schema component of an element result, then the elements inside
> > > > will be typed according to that type and not the original type.
> > > > So again, there is no type ambivalence and the schema of the
> > > > result element can be easily provided in addition to the data
> > > > generated.
> > > >
> > > > Best regards
> > > > Michael
> > > >
> > > > > -----Original Message-----
> > > > > From: public-qt-comments-request@w3.org [mailto:public-qt-comments-request@w3.org] On Behalf Of Antoine Mensch
> > > > > Sent: Wednesday, February 11, 2004 12:54 AM
> > > > > To: public-qt-comments@w3.org
> > > > > Subject: RE: Serialization (sometimes) needs to include type information
> > > > >
> > > > >
> > > > > >
> > > > > > First, you need to give us some more information about the in-scope
> > > > > > schema components and validation mode for your query.
> > > > > >
> > > > > I would like to have an in-scope schema component allowing me to
> > > > > validate (so I assume strict or lax validation) the "result" element
> > > > > and construct a PSVI that will contain the same (or equivalent) type
> > > > > information as the source data. What I am trying to express is that I
> > > > > cannot write such a valid complex type, due to restrictions in the
> > > > > schema specification (elements with the same name must have the same
> > > > > type in a given content model).
> > > > >
> > > > > > Assuming that you imported the two elements below, have the document
> > > > > > typed with the information and have lax validation mode, then your
> > > > > > result would be an element result of type xdt:untyped since it could
> > > > > > not find a definition in the schema components for the result
> > > > > > element. This then also means that the C elements will be untyped
> > > > > > and not preserve their original type.
> > > > > >
> > > > >
> > > > > Yes, I understand that. The point is that I cannot write such a schema
> > > > > component for the "result" element.
> > > > >
> > > > > > So your example does not convey the semantics that you assume it does
> > > > > > and does not require a type serialization.
> > > > > >
> > > > >
> > > > > I hope the above clarifies why it does require type serialization.
> > > > >
> > > > > > Also note that XML is primarily late typed data: You have the
> > > > > > self-describing XML document and you associate type information after
> > > > > > creation of a document. Thus, mandating the serialization of type
> > > > > > information and thus making the document early typed seems contrary
> > > > > > to the general XML philosophy.
> > > > > >
> > > > >
> > > > > It seems to me that the xsi:type attribute (which I think is not
> > > > > contrary to the general XML philosophy) has been introduced for
> > > > > exactly that purpose.
> > > > >
> > > > > In addition, consider the following extract from the Data Model spec
> > > > > (§4):
> > > > > "Constructing an Infoset from an instance of the data model, for
> > > > > example in order to perform schema validity assessment, is
> > > > > accomplished by serializing the document and parsing it.
> > > > > Implementations are not required to implement this process literally,
> > > > > but they must obtain the same result as if they had."
> > > > >
> > > > > This is impossible if we cannot specify the type of each individual C
> > > > > element in the result (through xsi:type).
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Antoine Mensch
> > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: public-qt-comments-request@w3.org [mailto:public-qt-comments-request@w3.org] On Behalf Of Antoine Mensch
> > > > > > > Sent: Wednesday, February 11, 2004 12:23 AM
> > > > > > > To: public-qt-comments@w3.org
> > > > > > > Subject: Serialization (sometimes) needs to include type information
> > > > > > >
> > > > > > >
> > > > > > > Consider the following schema fragment:
> > > > > > >
> > > > > > > <xs:element name="A">
> > > > > > > 	<xs:complexType>
> > > > > > > 		<xs:sequence>
> > > > > > > 			<xs:element name="C" type="myns:Type1"/>
> > > > > > > 		</xs:sequence>
> > > > > > > 	</xs:complexType>
> > > > > > > </xs:element>
> > > > > > >
> > > > > > > <xs:element name="B">
> > > > > > > 	<xs:complexType>
> > > > > > > 		<xs:sequence>
> > > > > > > 			<xs:element name="C" type="myns:Type2"/>
> > > > > > > 		</xs:sequence>
> > > > > > > 	</xs:complexType>
> > > > > > > </xs:element>
> > > > > > >
> > > > > > > Now if we consider a document (or any other data source) containing
> > > > > > > both A and B elements, the following query
> > > > > > >
> > > > > > > <result>
> > > > > > > { 	for $x in doc("myDocument")//C
> > > > > > > 	return $x
> > > > > > > }
> > > > > > > </result>
> > > > > > >
> > > > > > > returns a result that cannot be strongly typed without losing type
> > > > > > > information by any valid schema, as the schema spec forbids elements
> > > > > > > with the same name and a different type in the same content model.
> > > > > > >
> > > > > > > It seems to me that the only way of retaining type information would
> > > > > > > be to annotate produced C elements with xsi:type. This could be a
> > > > > > > serialization parameter, similar to cdata-section-elements. However,
> > > > > > > this would raise another issue, as anonymous type names would then
> > > > > > > be exposed, and would thus need to be handled in a consistent way by
> > > > > > > different XQuery and XML Schema processors.
> > > > > > >
> > > > > > > This issue is important, especially for tools that perform
> > > > > > > distributed XQuery processing, and that need to retain consistent
> > > > > > > type information when moving XML data from one processing node to
> > > > > > > another.
> > > > > > >
Received on Wednesday, 11 February 2004 16:22:24 UTC