W3C home > Mailing lists > Public > www-annotation@w3.org > January to June 2002

RE: Orphaned annotations

From: David M Bargeron <davemb@microsoft.com>
Date: Tue, 19 Mar 2002 10:23:49 -0800
Message-ID: <0170DDAD0BADFA4CBEC3B55A0748DCCC069DD7FB@red-msg-02.redmond.corp.microsoft.com>
To: "Ka-Ping Yee" <ping@lfw.org>
Cc: "Jim Ley" <jim@jibbering.com>, <www-annotation@w3.org>
Thanks for your comments Ka-Ping. I mostly agree with your first
assertion -- that we would need to standardize all aspects of our
algorithm if it were to be useful as an interoperable standard. However
I also partly disagree -- our empirical observations from subsequent
studies showed that different users choose different anchoring
confidence thresholds below which annotations are automatically orphaned
by our algorithm. That means that some users care less about precise
anchoring than others. So I don't think we could freeze the confidence
thresholds -- they are dependent on user preference, the nature of the
material being annotated, the task, and other factors. A natural result
of this is that I will get different anchoring results than someone else
might.  This is acceptible, though, as long as I like my anchoring
results more than the other person's ; )

I am not sure I understand your second assertion. The algorithm is
deterministic, and relies only on human-level content, so it should
produce the same results given the same human-level content (and same
confidence threshold settings), regardless of format.  What sort of
failures do you envision?  If you mean orphaning, then yes, of course
there will be orphans.  This is not a failure of the algorithm,
though...it is in the nature of dynamic document annotation.  If I
annotate a paragraph in a webpage today, and tomorrow the paragraph
simply disappears from the webpage, then what is the anchoring algorithm
to do?  The final arbeiter is always the user, so *any* anchoring
algorithm, regardless of how robust it tries to be, must be prepared to
ask the user for help when it reaches an impass such as this.

Dave

-----Original Message-----
From: Ka-Ping Yee [mailto:ping@lfw.org] 
Sent: Tuesday, March 19, 2002 5:00 AM
To: David M Bargeron
Cc: Jim Ley; www-annotation@w3.org
Subject: RE: Orphaned annotations

On Mon, 18 Mar 2002, David M Bargeron wrote:
> We have done some work in the area of robust anchoring for annotations
> on web pages. We found that using "human-level" page content as the
> basis for anchoring was more effective than using the internal
structure
> of the page.

Bravo.  The truth of this last statement should be obvious to everyone.
Of course it has to be human-level content; it is humans that these
systems are supposed to serve, after all -- or have we forgotten that?

I saw your presentation at CHI 2001, and thought you guys did some
pretty cool work with your anchoring algorithm.  I was sad that
you didn't cite Crit in your paper, but I now see that at least
you mentioned it in your tech report.

I was impressed by the clever results you got by keeping track of
keywords and context, as you do.  But i have to say i'm not certain
that's necessarily a win.  With an algorithm like that, there are
a few issues:

    1. For deployment as an interoperable standard, you have to
       specify the algorithm and any data it relies on so that
       all implementations produce exactly the same result.
       So you would need to standardize all your tuning parameters,
       scoring weights, thresholds and so on.

    2. It is unreasonable to expect that a user could duplicate all
       of the steps in your algorithm to predict the outcome.
       This makes it possible to have failures that no user could
       prepare for or defend against.

There's still potential for fuzzy matching algorithms as a way of
suggesting possible relevant areas of the target document after an
annotation has been orphaned.  It would be worth comparing the
utility of this to the utility of simply displaying the original
anchor with some context.


-- ?!ng
Received on Tuesday, 19 March 2002 13:24:22 GMT

This archive was generated by hypermail 2.2.0 + w3c-0.30 : Friday, 25 March 2005 11:19:17 GMT