
RE: Things that one can define with XML schema but cannot query with XQuery?

From: Kay, Michael <Michael.Kay@softwareag.com>
Date: Thu, 31 Oct 2002 17:21:28 +0100
Message-ID: <DFF2AC9E3583D511A21F0008C7E621060453DD28@daemsg02.software-ag.de>
To: Marko Smiljanic <markosm@cs.utwente.nl>, www-ql@w3.org

> This is a question about XML Schema and XQuery
> <!-- Excuse me if this has already been discussed; I had no time 
> to read the whole history of this mailing list carefully -->
> Let's take part of an XML Schema definition:
> <xs:element name="root">
>     <xs:complexType>
>         <xs:sequence maxOccurs="5">
>             <xs:element name="A"/>
>             <xs:element name="B" minOccurs="0"/>
>         </xs:sequence>
>     </xs:complexType>
> </xs:element>
> It defines that <root> can "contain" from 1 up to 5 sequences 
> of the two elements <A/> and <B/>. <B> is optional.
> E.g. an XML instance conforming to the XML schema above:
> <root>
>     <A>1</A>
>     <B>2</B>
>     <A>3</A>
>     <!-- non existing element B -->
>     <A>4</A>
>     <B>5</B>
> </root>
> We can assume that the XML schema designer had specific 
> semantics in mind when he specified that the sequence <A/><B/> 
> should repeat itself. E.g. <A/> can be the name of a man and 
> <B/> can be his address. We thus have from 1 to 5 pairs of 
> person name / person address (where the address does not have to 
> be specified). Note that each single person is represented by 
> one sequence. I can say that the sequence has clear semantics, 
> i.e. 1 sequence = 1 person (and each person has a name and 
> an address).
> My questions are:
> a) How can we write a query in XQuery saying that 
> I'm interested in the 2nd person's address (i.e. <B>)? (The answer 
> should be empty for the example above.)

You are quite right that these structures are very difficult to query. These
come up quite frequently in XSLT; I usually refer to them as "positional
grouping" problems. There are two classes of solution: one involves a
recursive function processing the sequence of siblings, the other involves
treating it as a value-based grouping problem, using the ID of the start
element of a group as the grouping key. Unfortunately the second solution
relies heavily on the generate-id() function and the sibling axes, neither
of which is available in XQuery.
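To make the two classes of solution concrete, here is a minimal sketch written in Python (not XQuery, which is the point of the question) over the instance document quoted above. It assumes, as the schema requires, that the child sequence starts with an <A>. group_recursive peels one group off the front of the sibling list and recurses on the rest; group_by_start_key assigns every element the position of the nearest preceding <A> as its key, a stand-in for generate-id(preceding-sibling::A[1]) in the XSLT idiom. All function names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The instance document from the question (the parser drops the comment).
DOC = """<root>
    <A>1</A>
    <B>2</B>
    <A>3</A>
    <!-- non existing element B -->
    <A>4</A>
    <B>5</B>
</root>"""


def group_recursive(siblings):
    """Recursive approach: a group is an <A> plus every following sibling
    up to (not including) the next <A>; recurse on whatever remains."""
    if not siblings:
        return []
    head, rest = siblings[0], siblings[1:]
    i = 0
    while i < len(rest) and rest[i].tag != "A":
        i += 1
    return [[head] + rest[:i]] + group_recursive(rest[i:])


def start_key(siblings, pos):
    """Position of the nearest <A> at or before pos (assumes the list
    starts with <A>, as the schema requires)."""
    while pos > 0 and siblings[pos].tag != "A":
        pos -= 1
    return pos


def group_by_start_key(siblings):
    """Value-based approach: group each element by the key of the start
    element of its group (dicts preserve insertion order in Python 3.7+)."""
    groups = {}
    for pos, el in enumerate(siblings):
        groups.setdefault(start_key(siblings, pos), []).append(el)
    return list(groups.values())


children = list(ET.fromstring(DOC))
groups = group_recursive(children)
assert groups == group_by_start_key(children)  # both approaches agree

# b) the number of A,B sequences
count = len(groups)
# a) the address (<B>) of the 2nd person: the 2nd group is just <A>3</A>
second_address = [el.text for el in groups[1] if el.tag == "B"]

print(count)           # 3
print(second_address)  # []
```

Both functions only recover the groups because they know that <A> starts each one; that knowledge lives in the schema, not in the instance markup.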

The usual advice is that this is bad XML design. There is a level of
hierarchy that's missing from the markup, an important object in the data
model that isn't represented by an element in the XML. I would advise anyone
to add this missing level (it can be done easily using the grouping
facilities in XSLT 2.0) before storing the data in a database.
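As an illustration of adding the missing level, here is a sketch in Python rather than the XSLT 2.0 grouping facilities mentioned above. The wrapper element name <person> is a hypothetical choice, and the code assumes the first child of <root> is an <A>. Once each pair is wrapped, both questions reduce to ordinary path expressions.

```python
import xml.etree.ElementTree as ET

DOC = "<root><A>1</A><B>2</B><A>3</A><A>4</A><B>5</B></root>"


def add_person_level(root):
    """Return a new <root> in which every <A>, together with the optional
    <B> that follows it, is wrapped in a hypothetical <person> element.
    Assumes the first child is an <A>, as the schema requires."""
    new_root = ET.Element(root.tag)
    person = None
    for child in root:
        if child.tag == "A":
            # Each <A> starts a new group, i.e. a new person.
            person = ET.SubElement(new_root, "person")
        person.append(child)
    return new_root


grouped = add_person_level(ET.fromstring(DOC))
print(ET.tostring(grouped, encoding="unicode"))
# <root><person><A>1</A><B>2</B></person><person><A>3</A></person><person><A>4</A><B>5</B></person></root>

# With the explicit level, both questions become simple path expressions.
print(len(grouped.findall("person")))                    # 3
print(grouped.findall("person[2]/B"))                    # []
print([b.text for b in grouped.findall("person[3]/B")])  # ['5']
```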

> b) This is similar to the problem in a): is there any way 
> that I can count the number of A, B sequences in an 
> instance XML document (using XQuery)? (The example above has 
> 3 sequences.)
> If the answers to those questions are negative, then there is 
> a conflict between XML Schema and XQuery. Sequence, all and 
> choice are structures that can be defined in XML Schema, but 
> are not visible in the XML instance and might not be accessible 
> with XQuery. An XML parser can surely count them, but can 
> XQuery do the same?

I don't regard it as a conflict. Just because Schema provides constructs
that enable you to check that your data has a particular pattern doesn't
mean that Query should be able to locate the objects implied by that
pattern. If you want to use the structure in a query, add another level of
elements to make it explicit.

Michael Kay
Received on Thursday, 31 October 2002 11:21:33 UTC
