
Re: robustness of string matching XPointers

From: Jose Kahan <jose.kahan@w3.org>
Date: Mon, 18 Nov 2002 17:19:12 +0100
To: Doug Daniels <rainking@rice.edu>
Cc: www-annotation@w3.org
Message-ID: <20021118161912.GD4164@inrialpes.fr>

Hello Doug,

The XPointer implementation in Amaya doesn't yet handle string-range
expressions with pattern matching. It just ignores such expressions.
The reason for this is that I only implemented the functions that I
needed for the XPointers I was creating. It doesn't mean I cannot
improve my implementation to handle more complex XPointers, like the one 
you're proposing. I won't have time to do this change before the next
release, though.

I fixed the crashing bug about one month ago. Could you download one of the 
Amaya snapshots to check this out? If it still crashes, tell me so and
I'll fix it.

Your proposed XPointer expression for annotations does seem to make
annotations more robust to a number of changes. You may wish to limit
it to the nth instance of the pattern, rather than have the XPointer
function resolve to more than one location.
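
To make the trade-off concrete, here is a rough sketch in Python. It
works on plain strings rather than real XPointers or DOM nodes, and the
helper names (select_by_offset, find_all, select_nth) are made up for
illustration; they are not Amaya or Annozilla code. It compares the
empty-pattern "string-count" style with pattern matching limited to the
nth instance, using the paragraphs from your example.

```python
def select_by_offset(text, start, length):
    """Empty-pattern, four-argument style: select purely by position.
    `start` is 1-based, as in string-range."""
    return text[start - 1:start - 1 + length]


def find_all(text, pattern):
    """Return the 0-based start index of every occurrence of `pattern`."""
    hits = []
    i = text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits


def select_nth(text, pattern, n):
    """Return the 0-based index of the nth (1-based) match, or None
    when there are fewer than n matches (an orphaned pointer)."""
    hits = find_all(text, pattern)
    return hits[n - 1] if 1 <= n <= len(hits) else None


OLD = "I am a perfectly good paragraph.  Hear me roar."
NEW = ("I am a perfectly good paragraph.  I'm having a roaring good time "
       "writing this.  Hear me roar.")

# Offset recorded against OLD, then replayed against NEW.
start = find_all(OLD, "roar")[0] + 1
print(repr(select_by_offset(OLD, start, 4)))  # the intended word
print(repr(select_by_offset(NEW, start, 4)))  # silently selects other text

# Pattern matching: one location in OLD, two in NEW.
print(len(find_all(OLD, "roar")), len(find_all(NEW, "roar")))

# Limiting to the nth instance always yields at most one location.
print(select_nth(NEW, "roar", 2))
```

Limiting to the nth instance guarantees a single location, but it does
not remove the ambiguity you describe: an index of 1 recorded against
OLD would select the "roar" inside "roaring" in NEW. An index that is
out of range at least fails visibly (None here) instead of selecting
the wrong text.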


On Sun, Nov 17, 2002 at 11:57:01AM -0600, Doug Daniels wrote:
> Matthew Wilson and I were thinking about the way we locate strings 
> inside DOM elements.  Currently, both Annozilla and Amaya use the four 
> argument variant of the XPointer string-range function.  For a quick 
> refresher on string-range, see 
> http://www.w3.org/TR/2001/CR-xptr-20010911/#stringrange
> Both Amaya and Annozilla opt to use a degenerate form of string-range, 
> which might better be described as "string-count".  For the second 
> argument, the string to match, they always provide the empty string, "". 
> This matches everything in the string representation of the DOM Element 
> selected by the first argument.  Then, they provide a start offset and a 
> length, thus uniquely identifying a substring within the DOM Element. 
> However, as Matthew pointed out, this method is quite fragile.  Any 
> changes to the text of the DOM Element before the selected string make 
> the start value in an existing XPointer invalid.  For example, assume we 
> have an XPointer that selects the first 'However' in this paragraph 
> using the empty-string four argument string-range.  Now, if I add 
> another sentence to the beginning of the paragraph, I've completely 
> invalidated the XPointer.  Even worse, it won't simply be orphaned, but 
> will select the wrong text entirely.
> One way of solving this problem is to use the pattern-matching ability 
> of string-range.  This would be the two-argument format of string-range 
> (omitting start and length), looking like
> string-range(path to paragraph,"However").   In most cases, this seems 
> more robust against changes.
> However, there is one problem.  If you change a paragraph by adding a 
> similar phrase, you can confuse the XPointer.  For example, if your old 
> paragraph was:
>    OLD:
>     I am a perfectly good paragraph.  Hear me roar.
> and you defined a string-range XPointer to the word 'roar':
> string-range(OLD's path, "roar")
> and then, you change the paragraph to read:
>    NEW:
>    I am a perfectly good paragraph.  I'm having a roaring good time
>    writing this.  Hear me roar.
> Now, your old XPointer will return 2 locations--the 'roar' inside 
> 'roaring' in the second sentence *and* the 'roar' in the last 
> sentence.  It's impossible to 
> know which one to choose--using the simple heuristic of choosing the 
> first would be incorrect in this example.
> Nevertheless, it seems to me that the pattern-matching string-range will 
> perform better in most cases than the simple string-counting Amaya and 
> Annozilla are doing now.  Do other people feel the same, or not?
> Additionally, it's worth pointing out that Amaya (as of version 6.2) 
> didn't seem to be capable of resolving string-range XPointers that 
> required pattern matching.  I don't think Amaya 6.4 will do it--it seems 
> to crash with an **irrecoverable error** whenever I try, which can't be 
> a good sign.  Then again, a lot of annotations functionality seems to be 
> broken in 6.4...
Received on Monday, 18 November 2002 11:19:42 UTC
