- From: Doug Daniels <rainking@rice.edu>
- Date: Sun, 17 Nov 2002 11:57:01 -0600
- To: www-annotation@w3.org
Matthew Wilson and I were thinking about the way we locate strings
inside DOM elements. Currently, both Annozilla and Amaya use the
four-argument variant of the XPointer string-range function. For a quick
refresher on string-range, see
http://www.w3.org/TR/2001/CR-xptr-20010911/#stringrange
Both Amaya and Annozilla opt to use a degenerate form of string-range,
which might better be described as "string-count". For the second
argument, the string to match, they always provide the empty string, "".
This matches everything in the string representation of the DOM Element
selected by the first argument. Then, they provide a start offset and a
length, thus uniquely identifying a substring within the DOM Element.
However, as Matthew pointed out, this method is quite fragile. Any
changes to the text of the DOM Element before the selected string make
the start value in an existing XPointer invalid. For example, assume we
have an XPointer that selects the first 'However' in this paragraph
using the empty-string, four-argument string-range. Now, if I add
another sentence to the beginning of the paragraph, I've completely
invalidated the XPointer. Even worse, it won't simply be orphaned, but
will select the wrong text entirely.
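To make the failure concrete, here is a minimal Python sketch of the "string-count" idea. It only models the behaviour described above on a plain string; the function name select_by_count and the sample text are made up for illustration, and this is not how Amaya or Annozilla actually resolve XPointers. Offsets are treated as 1-based, following the string-range convention.

```python
# Hypothetical model of string-range(element, "", start, length):
# the element is reduced to its text, and the selection is just
# "length characters beginning at the 1-based offset start".

def select_by_count(text: str, start: int, length: int) -> str:
    """Return the substring identified by a 1-based start and a length."""
    return text[start - 1 : start - 1 + length]


original = "Intro sentence. However, this method is fragile."

# Offsets recorded when the annotation was created.
start = original.index("However") + 1   # convert 0-based index to 1-based
length = len("However")

print(select_by_count(original, start, length))   # However

# Later, someone prepends a sentence. The stored offsets are unchanged,
# so they now point at unrelated text instead of failing outright.
edited = "A new opening line. " + original
print(repr(select_by_count(edited, start, length)))
```

The second call still returns seven characters, just not the ones that were annotated, which is the "wrong text rather than orphaned" case.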
One way of solving this problem is to use the pattern-matching ability
of string-range. This would be the two-argument format of string-range
(omitting start and length), looking like
string-range(path to paragraph,"However"). In most cases, this seems
more robust against changes.
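A rough Python sketch of the two-argument idea, again on plain strings only. The helper select_by_pattern and the sample text are assumptions for illustration and do not come from any XPointer implementation. It returns every match as a (1-based start, length) pair, roughly mirroring how string-range yields one location per match.

```python
from typing import List, Tuple


def select_by_pattern(text: str, pattern: str) -> List[Tuple[int, int]]:
    """Return (1-based start, length) for every occurrence of pattern."""
    matches = []
    pos = text.find(pattern)
    while pos != -1:
        matches.append((pos + 1, len(pattern)))
        pos = text.find(pattern, pos + 1)   # keep scanning after this match
    return matches


original = "Intro sentence. However, this method is fragile."
edited = "A new opening line. " + original

# The pattern is looked up afresh each time, so inserting text before
# the target moves the match but does not lose it.
for text in (original, edited):
    for start, length in select_by_pattern(text, "However"):
        print(start, text[start - 1 : start - 1 + length])
```

In both versions of the text the lookup finds exactly one location and it still covers the word 'However', even though its offset has moved.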
However, there is one problem. If you change a paragraph by adding a
similar phrase, you can confuse the XPointer. For example, if your old
paragraph was:
OLD:
I am a perfectly good paragraph. Hear me roar.
and you defined a string-range XPointer to the word 'roar':
string-range(OLD's path, "roar")
and then, you change the paragraph to read:
NEW:
I am a perfectly good paragraph. I'm having a roaring good time
writing this. Hear me roar.
Now, your old XPointer will return two locations: the 'roar' inside
'roaring' in the second sentence *and* the 'roar' in the last sentence.
It's impossible to know which one to choose; the simple heuristic of
choosing the first would be incorrect in this example.
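The OLD/NEW example can be reproduced with a few lines of Python. As before, this is only a sketch on plain strings: find_all is a made-up helper, and real string-range matching works on the string-value of DOM nodes rather than on a Python str.

```python
from typing import List


def find_all(text: str, pattern: str) -> List[int]:
    """Return the 0-based index of every occurrence of pattern in text."""
    hits = []
    pos = text.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = text.find(pattern, pos + 1)
    return hits


old = "I am a perfectly good paragraph. Hear me roar."
new = ("I am a perfectly good paragraph. I'm having a roaring good time "
       "writing this. Hear me roar.")

old_hits = find_all(old, "roar")
new_hits = find_all(new, "roar")

print(len(old_hits))   # 1
print(len(new_hits))   # 2

# Show a little context for each match in the edited paragraph.
for pos in new_hits:
    print(repr(new[pos : pos + 8]))
```

The first match in the edited paragraph is the start of 'roaring', so "take the first hit" picks the wrong location here.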
Nevertheless, it seems to me that the pattern-matching string-range will
perform better in most cases than the simple string-counting Amaya and
Annozilla are doing now. Do other people feel the same, or not?
Additionally, it's worth pointing out that Amaya (as of version 6.2)
didn't seem to be capable of resolving string-range XPointers that
required pattern matching. I don't think Amaya 6.4 will do it--it seems
to crash with an **irrecoverable error** whenever I try, which can't be
a good sign. Then again, a lot of annotation functionality seems to be
broken in 6.4...
Doug
Received on Sunday, 17 November 2002 12:57:57 UTC