grouping by closest ancestor from Sergio Andreozzi on 2005-12-07 (www-ql@w3.org from October to December 2005)

From: Sergio Andreozzi <sergio.andreozzi@cnaf.infn.it>
Date: Wed, 07 Dec 2005 17:39:49 +0100
To: www-ql@w3.org
Message-ID: <43971055.3000705@cnaf.infn.it>

Dear all,

I'm facing a problem in XQuery/XPath and would like to ask you for some
advice. Please consider the following XML document:

<?xml version="1.0" encoding="UTF-8"?>
<S>
	<A>
		<B>
			<F>4</F>
			<C>
				<D>3</D>
				<E>5</E>
				<G>
					<H>3</H>
				</G>
				<G>
					<H>2</H>
				</G>
			</C>
		</B>
		<B>
			<F>2</F>
			<C>		
				<E>2</E>
				<G>
					<H>5</H>
				</G>
				<G>
					<H>4</H>
				</G>
			</C>
			<C>
				<D>7</D>
				<E>2</E>
			</C>
		</B>
	</A>
	<A>
		<B>
			<F>3</F>
			<C>
				<D>3</D>
				<E>2</E>
			</C>
			<C>
				<D>5</D>
				<E>2</E>
			</C>
		</B>
	</A>
	<A>
		<B>
			<F>2</F>
			<C>
				<D>5</D>
				<E>2</E>
			</C>
		</B>
	</A>
</S>

Given a for clause such as

for $A in doc("doc.xml")/S/A

I would like to work on the results of the following XPath expressions:

$A/B/F ...
$A/B/C/D ...
$A/B/C/E ...
$A/B/C/G/H ...

but I need to have the results reorganized in such a way that I can
create groups based on the closest common ancestor (for each pair of
elements), as follows:

For the first A element, the XPath queries return:

$A/B/F     = (4,2)
$A/B/C/D   = (3,7)
$A/B/C/E   = (5,2,2)
$A/B/C/G/H = (3,2,5,4)

A possible refactoring of the result is ( _ can be replaced with 0 ):

$A/B/F     = (4,4,2,2,2)
$A/B/C/D   = (3,3,_,_,7)
$A/B/C/E   = (5,5,2,2,2)
$A/B/C/G/H = (3,2,5,4,_)

In this result, within each column, every pair of values refers to
elements that share the closest possible common ancestor (among all the
elements with the same QName in the XPath result sequence). For instance:

col 1, row 1: F=4
col 1, row 4: H=3

They have the first B element as their closest common ancestor.
Conversely, the generated result does not pair H=3 with F=2, because
their closest common ancestor is A, which is farther away than the
common ancestor of H=3 and F=4 (the first B).
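
The notion of "closest common ancestor" used here can be written down
directly in XPath 2.0. The following is only a minimal sketch: local:cca
is a hypothetical helper name (not a library function), and doc.xml is
assumed to contain the document shown above.

```xquery
(: Hypothetical helper: deepest element that is an ancestor of both nodes. :)
declare function local:cca($x as node(), $y as node()) as node()?
{
  (: "intersect" returns nodes in document order; ancestors lie on a
     single path, so the last common one is the deepest :)
  ($x/ancestor::* intersect $y/ancestor::*)[last()]
};

let $A  := doc("doc.xml")/S/A[1]
let $H3 := ($A/B/C/G/H)[1]   (: the H with value 3 :)
let $F4 := ($A/B/F)[1]       (: the F with value 4 :)
let $F2 := ($A/B/F)[2]       (: the F with value 2 :)
return (name(local:cca($H3, $F4)), name(local:cca($H3, $F2)))
```

With the sample document this should return "B" for the pair (H=3, F=4)
and "A" for the pair (H=3, F=2), which is why the second pair is not
wanted in the same column.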

The idea is to generate all the possible tuples (read by column in my
proposal), with one value for each element name; the valid tuples are
those whose elements, taken in pairs, have the closest common ancestor.
Missing values can be filled with 0.
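
One possible reading of this requirement is a nested iteration that
follows the document structure (B, then C, then G), emitting one tuple
per innermost element. The sketch below is written under that
assumption; the <tuples> and <t> output elements and their attribute
names are made up for illustration, and a B without any C would simply
produce no tuple here.

```xquery
for $A in doc("doc.xml")/S/A
return
  <tuples>{
    for $B in $A/B
    for $C in $B/C
    (: one tuple per G/H under this C, or a single "0" when C has no G/H :)
    for $H in (if (exists($C/G/H)) then $C/G/H/string() else "0")
    return
      <t F="{ $B/F }"
         D="{ ($C/D/string(), '0')[1] }"
         E="{ ($C/E/string(), '0')[1] }"
         H="{ $H }"/>
  }</tuples>
```

Each <t> corresponds to one column of the layout above. For the first A
element this should give five tuples, with F = (4,4,2,2,2),
D = (3,3,0,0,7), E = (5,5,2,2,2) and H = (3,2,5,4,0).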

Below is the refactored result for the other A elements:

For the second A element, the XPath queries return:

$A/B/F   = (3)
$A/B/C/D = (3,5)
$A/B/C/E = (2,2)

I would like something like this:

$A/B/F   = (3,3)
$A/B/C/D = (3,5)
$A/B/C/E = (2,2)

For the third A element, the XPath queries return:

$A/B/F   = (2)
$A/B/C/D = (5)
$A/B/C/E = (2)

that's fine.

I would appreciate any suggestions or guidelines on how to generate the
results constrained as explained above. I hope that the problem
description is clear enough; otherwise I can provide more details.

Regards, Sergio

Received on Wednesday, 7 December 2005 16:41:55 UTC