[Bug 4657] Restrict the use of deref in sml:field/@xpath for SML implementation built on top of relational databases from bugzilla@wiggum.w3.org on 2007-10-03 (public-sml@w3.org from October 2007)

From: <bugzilla@wiggum.w3.org>
Date: Wed, 03 Oct 2007 06:04:07 +0000
To: public-sml@w3.org
CC:
Message-Id: <E1IcxL5-0005yU-8A@wiggum.w3.org>

http://www.w3.org/Bugs/Public/show_bug.cgi?id=4657


kumarp@microsoft.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |hasProposal




------- Comment #3 from kumarp@microsoft.com  2007-10-03 06:04 -------
Proposal:
Disallow the use of deref() in sml:field/@xpath for SML implementations built
on top of relational databases.

Reasons:
[1]
The deref() calls can be nested to any level in the selector or field. Consider
the following selector xpath.

deref(deref(deref(a)/b)/c)/d/e

To evaluate this identity constraint graph, we need to perform a 3-level
recursion using a recursive CTE as the first step. Assuming a 10-ref fan-out at
each level, it gives us 1000 nodes at the leaf level. This can be done in a
relatively straightforward way using a single CTE call. For each of the
documents in the target document set thus obtained, we need to apply the xpath
"/d/e" to get the target node set.
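
The fan-out arithmetic and the single recursive CTE call can be sketched as
follows. This is only an illustration under stated assumptions: it uses
Python's sqlite3 module with an in-memory store, a hypothetical table "refs"
that holds one row per reference between documents, and hypothetical helpers
build_store() and leaf_documents(). The intermediate xpath steps (a, b, c) are
left out, so each recursion level stands for one deref() hop.

```python
import sqlite3


def build_store(fan_out=10, depth=3):
    """Build a hypothetical store: document 0 is the root, and every
    document references `fan_out` new documents, `depth` levels deep."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE refs (src INTEGER, dst INTEGER)")
    next_id = 1
    frontier = [0]
    for _ in range(depth):
        new_frontier = []
        for src in frontier:
            for _ in range(fan_out):
                conn.execute("INSERT INTO refs VALUES (?, ?)", (src, next_id))
                new_frontier.append(next_id)
                next_id += 1
        frontier = new_frontier
    return conn


# One recursive CTE follows the references level by level and keeps only
# the documents reached at the requested level.
LEAF_QUERY = """
WITH RECURSIVE reached(doc, lvl) AS (
    SELECT ?, 0
    UNION ALL
    SELECT refs.dst, reached.lvl + 1
    FROM reached JOIN refs ON refs.src = reached.doc
    WHERE reached.lvl < ?
)
SELECT doc FROM reached WHERE lvl = ?
"""


def leaf_documents(conn, root, depth):
    """Return the documents reached after `depth` reference hops from `root`."""
    rows = conn.execute(LEAF_QUERY, (root, depth, depth)).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    store = build_store(fan_out=10, depth=3)
    print(len(leaf_documents(store, 0, 3)))  # 1000
```

With a fan-out of 10 and 3 levels, the single query returns 10 * 10 * 10 = 1000
leaf documents, which is the target document set described above.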

Suppose we have to further evaluate a "field" xpath such as the one shown
below.

deref(f)/g

Here, for each node in the target node set, we need to make a CTE call.
Remember that deref() can be nested to any level. To make matters worse, we
may have 2 (or more) field xpaths: one without deref(), one with a single
deref(), one with a 2-level deref(), and so on. This requires us to combine the
result set of the first CTE with that of each of the further CTEs, which is an
extremely inefficient operation to perform in a large store.

If we can simply shift the "deref(f)" into the selector xpath (for example,
deref(deref(deref(deref(a)/b)/c)/d/e/f)/g) then we can compute the same result
set in just one CTE.

There is a huge difference in performance between the 2 cases.
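
As a rough illustration of that difference, the sketch below only counts CTE
calls; it does not measure anything. It assumes a single field xpath with a
one-level deref(), exactly one target node per leaf document, and a
hypothetical helper named cte_calls(). Real costs would also include combining
the result sets, which this count ignores.

```python
def cte_calls(fan_out, selector_depth, deref_in_field):
    """Count CTE calls needed to evaluate one selector plus one field.

    Assumes one target node per leaf document and, when deref_in_field is
    True, a single-level deref() in the field xpath.
    """
    target_nodes = fan_out ** selector_depth
    if deref_in_field:
        # One CTE for the selector, then one more CTE per target node.
        return 1 + target_nodes
    # The deref() has been shifted into the selector: one CTE in total.
    return 1


if __name__ == "__main__":
    print(cte_calls(10, 3, deref_in_field=True))   # 1001
    print(cte_calls(10, 3, deref_in_field=False))  # 1
```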

[2]
Allowing deref() only in the selector supports almost all practical cases.

[3]
If we decide to add this support later, it will be a non-breaking change. On the
other hand, if we allow deref() in field now and decide to remove that support
in later versions of SML, it will be a breaking change. Breaking changes are
nearly impossible to make once a standard is adopted.

Received on Wednesday, 3 October 2007 06:04:14 UTC