Re: Floating Quotable Citations (FQC) from Paolo Ciccarese on 2013-02-24 (public-openannotation@w3.org from February 2013)

From: Paolo Ciccarese <paolo.ciccarese@gmail.com>
Date: Sun, 24 Feb 2013 11:20:11 -0500
To: David Cuenca <dacuetu@gmail.com>
Cc: Dan Whaley <dwhaley@hypothes.is>, Robert Sanderson <azaroth42@gmail.com>, "<public-openannotation@w3.org>" <public-openannotation@w3.org>
Message-ID: <CAFPX2kBw+u67YyZoJ-tZSUt_8Tzcti1w1cgpFbv0oMK5JipiRw@mail.gmail.com>

David,
in Domeo I do something very similar to what Dan's wiki page outlines.

Domeo deals only with annotation of HTML, but I need to be able to display
the same annotation on the PDF.
We have been using the system for more than 2 years now, and I perform the
following operations (ignoring the HTML markup).

Once the user performs the selection, I calculate the prefix, match and
postfix (i.e. suffix).
I set a maximum number of characters for this step (normally 64 for both
prefix and postfix).
Given the potential complexity of the HTML+CSS combination, I have some
rules of thumb on how to select the prefix and postfix.
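
A minimal Python sketch of this first step, assuming the markup has already
been stripped to plain text. The names `Quote` and `make_quote` are
hypothetical and this is not Domeo's actual code; only the 64-character cap
comes from the description above.

```python
from typing import NamedTuple

MAX_CONTEXT = 64  # per-side cap mentioned in the message


class Quote(NamedTuple):
    prefix: str
    match: str
    suffix: str


def make_quote(text: str, start: int, end: int,
               max_context: int = MAX_CONTEXT) -> Quote:
    """Capture the selected text plus up to max_context chars on each side."""
    if not 0 <= start < end <= len(text):
        raise ValueError("selection out of range")
    prefix = text[max(0, start - max_context):start]
    suffix = text[end:end + max_context]
    return Quote(prefix, text[start:end], suffix)
```

Near the beginning or end of the text the prefix or suffix simply comes out
shorter than the cap.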

Then I calculate a score that basically adapts according to the length of
the match.
If the match is particularly short, I check the combined length of
prefix+suffix. If the two combined are too short (< 64*2),
I normally recalculate one of them (e.g. the suffix) to be longer
(= 64*2 - length of the prefix).
That way I end up with enough text to find the match.
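
The length adjustment above could look roughly like this in Python. It is
only a sketch under assumptions: `context_for` is a hypothetical name, and
the `SHORT_MATCH` threshold of 16 characters is invented for illustration,
since the message does not say what counts as a "particularly short" match.

```python
MAX_CONTEXT = 64   # per-side cap from the message
SHORT_MATCH = 16   # assumed threshold; not specified in the message


def context_for(text, start, end,
                max_context=MAX_CONTEXT, short_match=SHORT_MATCH):
    """Return (prefix, match, suffix), lengthening the suffix when needed."""
    prefix = text[max(0, start - max_context):start]
    suffix = text[end:end + max_context]
    budget = 2 * max_context  # the 64*2 combined target
    # Short match and not enough combined context: recompute the suffix so
    # that prefix + suffix reaches the budget (if the text is long enough).
    if end - start < short_match and len(prefix) + len(suffix) < budget:
        suffix = text[end:end + budget - len(prefix)]
    return prefix, text[start:end], suffix
```

For a short selection close to the start of a document the prefix is
truncated, so the suffix grows to make up the difference.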

I have the option of searching for the text right away and checking whether
what is found is the same as the current selection.
If it is not, you can try making the prefix/match/postfix longer or change
strategy (adding more info).
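
One way to sketch this check in Python: search for prefix+match+postfix
immediately, and if it hits anywhere other than the selection, widen the
context and retry. The function names, the doubling strategy and the
`max_context` limit are assumptions made for illustration, not something
the message specifies.

```python
def find_all(text, pattern):
    """Return the start offsets of all (possibly overlapping) occurrences."""
    hits, i = [], text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits


def unambiguous_context(text, start, end, context=64, max_context=512):
    """Grow the context until the pattern matches only at the selection.

    Returns (prefix, match, suffix), or None if the pattern is still
    ambiguous at max_context, meaning another strategy is needed.
    """
    while True:
        prefix = text[max(0, start - context):start]
        suffix = text[end:end + context]
        pattern = prefix + text[start:end] + suffix
        if find_all(text, pattern) == [start - len(prefix)]:
            return prefix, text[start:end], suffix
        if context >= max_context:
            return None
        context = min(context * 2, max_context)
```

A `None` result corresponds to the "change strategy" case: the text alone
cannot distinguish the selection, so more information has to be stored.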

For instance, you can also store the location, but that can change if the
document changes structure, and the counting does not work very well with
HTML.
If you have a very redundant document, you can keep track of which
occurrence of that prefix/match/postfix was selected. That helps until the
document changes.
When the document changes, you have no guarantee that the selection is
still correct (e.g. a previous occurrence of that pattern may have been
erased).
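
A small Python illustration of the occurrence-counting idea and of how it
fails after an edit. The helper names are hypothetical and the pattern here
stands in for the full prefix+match+postfix string.

```python
def occurrences(text, pattern):
    """Return the start offsets of every occurrence of pattern in text."""
    hits, i = [], text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits


def occurrence_index(text, pattern, start):
    """Which occurrence (0-based) of pattern begins at offset start."""
    return occurrences(text, pattern).index(start)


def resolve(text, pattern, index):
    """Offset of the index-th occurrence, or None if there are too few."""
    hits = occurrences(text, pattern)
    return hits[index] if index < len(hits) else None


original = "see note. see note. see note."
idx = occurrence_index(original, "see note", 10)   # the second occurrence

# Erase the first occurrence: the stored index still resolves, but it now
# points at what used to be the third occurrence.
edited = original[10:]
resolved = resolve(edited, "see note", idx)
```

Nothing in the edited text signals the error, which is why the occurrence
count is only reliable while the document stays unchanged.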

Dan, I am guessing I can share more details on your wiki and we can join
forces on this topic?

Best,
Paolo

On Sat, Feb 23, 2013 at 11:25 PM, David Cuenca <dacuetu@gmail.com> wrote:

> On Fri, Feb 22, 2013 at 1:50 PM, Dan Whaley <dwhaley@hypothes.is> wrote:
>
>> But instead of exact matching on the prefix/postfix contexts, we use a
>> fuzzy match to improve somewhat on the brittleness that hard context
>> anchors have when changes to the document occur within them.
>>
>> One of the design objectives here was to support cross-format annotation
>> (annotations to the PDF can be surfaced on the HTML version, etc).
>>
>
> Dan, that is certainly impressive, it looks like a quite reliable method
> for annotating mutable digital documents.
> The advantage of printed material is that changes between the original
> source and proofread text are close to nil.
> On the other hand, data availability is less than on purely digital
> documents, therefore input text should be kept to a minimum.
>
> I'll elaborate on your mailing list, it might be worthwhile.
>
> David
>

-- 
Dr. Paolo Ciccarese
http://www.paolociccarese.info/
Biomedical Informatics Research & Development
Instructor of Neurology at Harvard Medical School
Assistant in Neuroscience at Mass General Hospital
Member of the MGH Biomedical Informatics Core
+1-857-366-1524 (mobile)   +1-617-768-8744 (office)

Received on Sunday, 24 February 2013 16:20:41 UTC