- From: Doug Daniels <rainking@rice.edu>
- Date: Sun, 17 Nov 2002 11:57:01 -0600
- To: www-annotation@w3.org
Matthew Wilson and I were thinking about the way we locate strings inside DOM elements. Currently, both Annozilla and Amaya use the four argument variant of the XPointer string-range function. For a quick refresher on string-range, see

http://www.w3.org/TR/2001/CR-xptr-20010911/#stringrange

Both Amaya and Annozilla opt to use a degenerate form of string-range, which might better be described as "string-count". For the second argument, the string to match, they always provide the empty string, "". This matches everything in the string representation of the DOM Element selected by the first argument. Then, they provide a start offset and a length, thus uniquely identifying a substring within the DOM Element.

However, as Matthew pointed out, this method is quite fragile. Any changes to the text of the DOM Element before the selected string make the start value in an existing XPointer invalid. For example, assume we have an XPointer that selects the first 'However' in this paragraph using the empty-string four argument string-range. Now, if I add another sentence to the beginning of the paragraph, I've completely invalidated the XPointer. Even worse, it won't simply be orphaned, but will select the wrong text entirely.

One way of solving this problem is to use the pattern-matching ability of string-range. This would be the two-argument format of string-range (omitting start and length), looking like

string-range(path to paragraph,"However")

In most cases, this seems more robust against changes. However, there is one problem. If you change a paragraph by adding a similar phrase, you can confuse the XPointer. For example, if your old paragraph was:

OLD: I am a perfectly good paragraph. Hear me roar.

and you defined a string-range XPointer to the word 'roar':

string-range(OLD's path, "roar")

and then, you change the paragraph to read:

NEW: I am a perfectly good paragraph. I'm having a roaring good time writing this. Hear me roar.
Now, your old XPointer will return 2 locations: the 'roar' inside 'roaring' in the second sentence *and* the 'roar' in the third sentence. From the XPointer alone, it's impossible to know which one to choose, and the simple heuristic of choosing the first would be incorrect in this example.

Nevertheless, it seems to me that the pattern-matching string-range will perform better in most cases than the simple string-counting Amaya and Annozilla are doing now. Do other people feel the same, or not?

Additionally, it's worth pointing out that Amaya (as of version 6.2) didn't seem to be capable of resolving string-range XPointers that required pattern matching. I don't think Amaya 6.4 will do it either: it seems to crash with an **irrecoverable error** whenever I try, which can't be a good sign. Then again, a lot of annotations functionality seems to be broken in 6.4...

Doug
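
For illustration, here is a rough sketch of the two behaviours discussed above, using the OLD and NEW paragraphs. It uses plain Python string handling on the paragraph text and is not a real XPointer implementation; the function names select_by_offset and find_matches are made up for this sketch, and it assumes 1-based positions and non-overlapping matches.

```python
def select_by_offset(text, start, length):
    """Mimic the degenerate form string-range(path, "", start, length).

    Assumes `start` is 1-based, so position 1 is the first character.
    """
    return text[start - 1:start - 1 + length]


def find_matches(text, pattern):
    """Mimic the two-argument form string-range(path, pattern).

    Returns the 1-based start position of every non-overlapping match.
    """
    positions = []
    i = text.find(pattern)
    while i != -1:
        positions.append(i + 1)
        i = text.find(pattern, i + len(pattern))
    return positions


OLD = "I am a perfectly good paragraph. Hear me roar."
NEW = ("I am a perfectly good paragraph. "
       "I'm having a roaring good time writing this. Hear me roar.")

# Record an offset-based pointer to 'roar' while the paragraph is still OLD.
start = find_matches(OLD, "roar")[0]
length = len("roar")

print(select_by_offset(OLD, start, length))   # the intended word
print(select_by_offset(NEW, start, length))   # same offsets, different text
print(find_matches(OLD, "roar"))              # one match
print(find_matches(NEW, "roar"))              # two matches: ambiguous
```

The offset-based pointer silently selects different text once a sentence is inserted ahead of it, while the pattern-based pointer still finds the intended word but also picks up the 'roar' inside 'roaring'.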
Received on Sunday, 17 November 2002 12:57:57 UTC