Re: robustness of string matching XPointers

A vague thought:

There is a lot of stuff that gets annotated, because the author is working in
isolation.

The Amaya approach is one where the author might look at existing annotations
(since they are availab le in the client). This means the client looks at
them too, and would know when an Xpointer may become ambiguous.

José, if you are looking for a lot of work, it might be interesting to look
at how to implement this - or anyone on the list might look at how to ask an
author what to do with annotations thay have just orphaned, or how to find
out whether an author of a program can change the pointers of an annotation
when they have changed the thing being annotated...

cheers

Chaals

On Mon, 18 Nov 2002, Jose Kahan wrote:

>
>Hello Doug,
>
>The XPointer implementation in Amaya doesn't handle yet string-range
>expressions with pattern matching. It just ignores that expression.
>The reason for this is that I only implemented the functions that I
>needed for the XPointers I was creating. It doesn't mean I cannot
>improve my implementation to handle more complex XPointers, like the one
>you're proposing. I won't have time to do this change before the next
>release, though.
>
>I fixed the crashing bug about one month ago. Could you download one of the
>Amaya snapshots to check this out? If it still crashes, tell me so and
>I'll fix it.
>
>For your proposed XPointer expression for annotations, it does seem to
>make annotations more robust to a number of changes. You may wish to limit
>it to the nth instance of the pattern, rather than have XPointer function
>resolve to more than one location.
>
>-jose
>
>On Sun, Nov 17, 2002 at 11:57:01AM -0600, Doug Daniels wrote:
>>
>> Matthew Wilson and I were thinking about the way we locate strings
>> inside DOM elements.  Currently, both Annozilla and Amaya use the four
>> argument variant of the XPointer string-range function.  For a quick
>> refresher on string-range, see
>> http://www.w3.org/TR/2001/CR-xptr-20010911/#stringrange
>>
>> Both Amaya and Annozilla opt to use a degenerate form of string-range,
>> which might better be described as "string-count".  For the second
>> argument, the string to match, they always provide the empty string, "".
>> This matches everything in the string representation of the DOM Element
>> selected by the first argument.  Then, they provide a start offset and a
>> length, thus uniquely identifying a substring within the DOM Element.
>> However, as Matthew pointed out, this method is quite fragile.  Any
>> changes to the text of the DOM Element before the selected string make
>> the start value in an existing XPointer invalid.  For example, assume we
>> have an XPointer that selects the first 'However' in this paragraph
>> using the empty-string four argument string-range.  Now, if I add
>> another sentence to the beginning of the paragraph, I've completely
>> invalidated the XPointer.  Even worse, it won't simply be orphaned, but
>> will select the wrong text entirely.
>>
>> One way of solving this problem is to use the pattern-matching ability
>> of string-range.  This would be the two-argument format of string-range
>> (omitting start and length), looking like
>> string-range(path to paragraph,"However").   In most cases, this seems
>> more robust against changes.
>>
>> However, there is one problem.  If you change a paragraph by adding a
>> similar phrase, you can confuse the XPointer.  For example, if your old
>> paragraph was:
>>
>>    OLD:
>>     I am a perfectly good paragraph.  Hear me roar.
>>
>> and you defined a string-range XPointer to the word 'roar':
>> string-range(OLD's path, "roar")
>>
>> and then, you change the paragraph to read:
>>
>>    NEW:
>>    I am a perfectly good paragraph.  I'm having a roaring good time
>> writing
>>    this.  Hear me roar.
>>
>> Now, your old XPointer will return 2 locations--the 'roar' in the second
>> sentence *and* the 'roar' in the first sentence.  It's impossible to
>> know which one to choose--using the simple heuristic of choosing the
>> first would be incorrect in this example.
>>
>> Nevertheless, it seems to me that the pattern-matching string-range will
>> perform better in most cases than the simple string-counting Amaya and
>> Annozilla are doing now.  Do other people feel the same, or not?
>>
>> Additionally, it's worth pointing out that Amaya (as of version 6.2)
>> didn't seem to be capable of resolving string-range XPointers that
>> required pattern matching.  I don't think Amaya 6.4 will do it--it seems
>> to crash with an **irrecoverable error** whenever I try, which can't be
>> a good sign.  Then again, a lot of annotations functionality seems to be
>> broken in 6.4...
>

-- 
Charles McCathieNevile  http://www.w3.org/People/Charles  tel: +61 409 134 136
SWAD-E http://www.w3.org/2001/sw/Europe ------------ WAI http://www.w3.org/WAI
 21 Mitchell street, FOOTSCRAY Vic 3011, Australia  fax(fr): +33 4 92 38 78 22
 W3C, 2004 Route des Lucioles, 06902 Sophia Antipolis Cedex, France

Received on Monday, 18 November 2002 11:38:39 UTC