Comment to the point order rules from Dmitry Lenkov on 2002-10-02 (www-dom@w3.org from October to December 2002)

From: Dmitry Lenkov <Dmitry.Lenkov@oracle.com>
Date: Wed, 2 Oct 2002 11:11:44 -0700
To: <www-dom@w3.org>
Cc: "Benjamin C. Chang" <Ben.Chang@oracle.com>, <dmitry.lenkov@oracle.com>
Message-ID: <00ab01c26a3f$2adee5d0$e7b52382@us.oracle.com>

Comment to the point order rules in 2.5 of http://www.w3.org/TR/2000/REC-DOM-Level-2-Traversal-Range-20001113/ranges.html
The point order rules have two errors and two missing rules. The reason for this seems to be a result of the confusion between XPath based rules where XPath expressions point to nodes and point based rules where points point between nodes. Let's consider the sample document borrowed from the recent discussion with Steve DeRose.

<p>hello, <emph>big </emph>world.</p>

This document corresponds to the diagram below, with the two element nodes, p and emph, and three text nodes, text1, text2, and text3:
p
|
-----------------------------------------
| | |
txt1 emph txt3
----------- ------- -----------
h e l l o , _ | w o r l d .
txt2
-------
b i g _

And let's consider several examples based on the syntax that is just for the illustration purposes:

1) A = (p.2) and B = (txt2.2). A points between nodes emph and Text3. B points between characters 'i' and 'g' in the TEXT node txt2.
According to the case 2 in the rules in 2.5, A is before B which is obviously wrong.

2) Let's switch A and B, i.e. A = (txt2.2) and B = (p.2). According to the case 3 in te same section, A is after B which is again wrong.

This means that cases 2 and 3 require corrections. Let's now consider the note given in 2.5 right after the point order rules. This note kind of makes the rest
of the spec rather useless since it does not specify precisely when the rules are not correct and since almost all operations on ranges rely on the order relationship between start and end points of ranges. However, there are only few cases where this note's first sentence holds true. These cases are associated with TEXT and CDATA nodes. They can be enumerated and resolved by adding more equality rules. Let's consider the following exaples:

3) A = (p.0) and B = (txt1,0). Obviously, A and B should be equal since they point to the same position in the document right after tag <p> and before the word hello,

4) Similarly, (p.1) is equal to (txt1.7)

The following corrections to the point order rules are suggested where COR_3 and COR_4 completely replace the note.

--------------------------------
Let A and B be two boundary-points or positions. Then one of the following holds: A is before B, A is equal to B, or A is after B. Which
one holds is specified in the following by examining four cases:

COR_0: -> Which one holds is specified as follows:

In the first case the boundary-points have the same container. A is before B if its offset is less than the offset of B, A is equal to B if its offset is equal to the offset of B, and A is after B if its offset is greater than the offset of B.

In the second case a child node C of the container of A is an ancestor container of B. In this case, A is before B if the offset of A is less than or equal to the index of the child node C and A is after B otherwise.

COR_1 -> A is before B if the offset of A is less (but not equal to) than the index of the child node C and A is after B otherwise.

In the third case a child node C of the container of B is an ancestor container of A. In this case, A is before B if the index of the child node C is less than the offset of B and A is after B otherwise.

COR_2 -> A is before B if the index of the child node C is less than OR EQUAL TO the offset of B and A is after B otherwise.

In the fourth case, none of three other cases hold: the containers of A and B are siblings or descendants of sibling nodes. In this case, A is before B if the container of A is before the container of B in a pre-order traversal of the Ranges' context tree and A is after B otherwise.

ADD_1 -> Or (equivalent): in a pre-order traversal of the subtree of Ranges' context tree, where the root of the subtree is the deepest common ancestor of A and B, and A is after B otherwise (this addition is not absolutely necesarry but usefull for the clarification of the computational base

Note that because the same location in a text representation of the document can correspond to two different positions in the DOM tree, it is possible for two boundary-points to not compare equal even though they would be equal in the text representation. For this reason, the informal definition above can sometimes be incorrect.

COR_3 -> Any point A whose container node is a TEXT node or a CDATA node and whose offset is 0, is equal to the point B whose container node is the parent node of the container node of the point A, and whose offset is the index of the container node of the point A as a child of the container node of the point B, minus 1 (ONE).

COR_4 -> Any point A whose container node is a TEXT node or a CDATA node and whose offset is N where N is the number of symbols in the TEXT or CDATA data, is equal to the point B whose container node is the parent node of the container node of the point A, and whose offset is the index of the container node of the point A as a child of the container node of the point B.

Received on Wednesday, 2 October 2002 14:15:38 UTC