RE: Orphaned annotations from David M Bargeron on 2002-03-19 (www-annotation@w3.org from January to June 2002)

From: David M Bargeron <davemb@microsoft.com>
Date: Tue, 19 Mar 2002 10:08:50 -0800
To: <jose.kahan@w3.org>
Cc: <www-annotation@w3.org>
Message-ID: <0170DDAD0BADFA4CBEC3B55A0748DCCC069DD7FA@red-msg-02.redmond.corp.microsoft.com>
Our currently published work in the area of robust annotation anchoring
only covers anchors for selected text content, that's true. However, the
same basic approach (e.g. feature extraction) applies equally well -- if
not better -- to images, video, audio, etc. 

In general, annotation of any medium by an end user will be more
effective in the face of change if the annotations are anchored with
content-based anchors. In particular, this implies that anchoring for
any specific medium is medium-specific: annotations will be anchored to
images using a different algorithm than if they are anchored to text.  

In the short term, this complicates the standards picture quite a bit.
But if the goal is to create standard ways to effectively serve end
users with respect to annotations on web resources, it is (I believe)
the best way to go. Format-specific structure-based anchoring techniques
will end up frustrating and surprising end users more than it helps
them.

In the long term, the immediate need for a family of robust anchoring
algorithms actually points toward a deeper problem: The standards that
exist today for the display and manipulation of text, images, video,
audio, etc. provide insufficient information about the internal semantic
structure of those media, and that's why we have to create work-arounds
to make up for it. And I don't think RDF is the answer, though it is a
good start. What we need is a MUCH better understanding of the
*internal* semantic structure of all kinds of media.  MPEG 4 and 7 are
examples of attempts to standardize an enhanced understanding of the
internal semantic structure of video, and can be taken as models for
where we need to head with text, images, audio, and other basic media
types. Annotation is not the only field that will benefit from enhanced
standards in this respect: Indexing, search, document organization, and
many other applications also need these improvements.

On the issue of annotating whole sections of a document, you are
absolutely right, human-level structural information such as chapters of
a book or sections of a document, are really important contextual clues
for annotation. We have not yet focused on integrating these clues into
our algorithm, though we recognize they are very important. But though
these clues are structural in nature, they are still "human-level" as
opposed to format-specific: A book chapter is a book chapter is a book
chapter, whether it is represented in pdf or html or .doc or ascii. So I
think my original assertion stands, that and end-user annotation
anchoring scheme must be based on human-level content, not
format-specific structure.

Dave

-----Original Message-----
From: Jose Kahan [mailto:jose.kahan@w3.org] 
Sent: Tuesday, March 19, 2002 1:31 AM
To: David M Bargeron
Cc: Jim Ley; www-annotation@w3.org
Subject: Re: Orphaned annotations

Hello David,

Thanks for your feedback.

I'm aware about your work and have read your papers :)

A limitation about your schema is that it only allows to annotate
text. It's not possible to use it to annotate an SVG image or a MathML
formula, for example. That's where the structure becomes important.
You may also want to annotate a whole chapter or a section of a
document.

-jose

On Mon, Mar 18, 2002 at 08:50:08AM -0800, David M Bargeron wrote:
> We have done some work in the area of robust anchoring for annotations
> on web pages. We found that using "human-level" page content as the
> basis for anchoring was more effective than using the internal
structure
> of the page. That is, when a user annotates a sentence in a paragraph,
> she is thinking about the text that she is annotating, not the fact
that
> it is the 3rd sentence in the 5th paragraph, for instance. By
anchoring
> to the "human-level" content (the actual words that a user sees), we
can
> better meet the user's expectations when the page is modified.
Received on Tuesday, 19 March 2002 13:08:58 UTC