
Re: named entity annotation use case

From: Tim Thompson <timathom@gmail.com>
Date: Thu, 29 Dec 2016 12:48:05 -0500
Message-ID: <CAPPeUfjvS0q=jED7-ds+uUUcVQ_Xijr0Q8+9J9z1TAeOs466Mw@mail.gmail.com>
To: Uldis Bojars <captsolo@gmail.com>
Cc: public-annotation@w3.org

Uldis,

We have the same use case, in the context of encoding handwritten
annotations using the Web Annotation model/vocab. You can see an example
here, which uses a TextQuoteSelector to tag "Paris" and pair it with its
GeoNames URI (ex:anno3):
https://gist.github.com/timathom/1f4345ad63e25b0fda1a7f80de6fe0b2

We could have added a TextPositionSelector to select the same string, but
the quote selector seemed sufficient for our purposes.
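
For illustration, a minimal JSON-LD sketch of a target that carries both
selectors for the same string. The source URI, the sample text ("Visited
Paris in spring"), and the offsets are hypothetical and are not taken from
the gist; the offsets assume zero-based positions with an exclusive end.

```json
{
  "target": {
    "source": "http://example.org/note1",
    "selector": [
      { "type": "TextQuoteSelector", "exact": "Paris" },
      { "type": "TextPositionSelector", "start": 8, "end": 13 }
    ]
  }
}
```

Listing two selectors this way describes the same segment twice, so a
client can fall back on the quote if the offsets no longer match.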

You can see that the Target of ex:anno3 is actually the Body of ex:anno1.
I've been assured that this pattern does not violate the specifications :)
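
A rough JSON-LD sketch of that pattern, under stated assumptions: the
example.org URIs, the body text, and the page being annotated are
placeholders, the GeoNames URI is illustrative and should be checked
against GeoNames, and the gist itself may be structured differently.

```json
[
  {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "http://example.org/anno1",
    "type": "Annotation",
    "body": {
      "id": "http://example.org/anno1/body",
      "type": "TextualBody",
      "value": "Visited Paris in spring"
    },
    "target": "http://example.org/page1"
  },
  {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "http://example.org/anno3",
    "type": "Annotation",
    "body": "http://sws.geonames.org/2988507/",
    "target": {
      "source": "http://example.org/anno1/body",
      "selector": { "type": "TextQuoteSelector", "exact": "Paris" }
    }
  }
]
```

The key point is that the Body of the first annotation has its own
identifier, so the second annotation can use it as the source of its
Target.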

I'd welcome comments, suggestions, or feedback about the approach we've
taken, or ways it could be improved.

Best regards,
Tim

--
Tim A. Thompson
Metadata Librarian (Spanish/Portuguese Specialty)
Princeton University Library

www.linkedin.com/in/timathompson
tat2@princeton.edu

On Thu, Dec 29, 2016 at 9:43 AM, Uldis Bojars <captsolo@gmail.com> wrote:

> Hi,
>
> Is this the right place for questions on the usage of W3C Web Annotation
> model / vocab?
>
> The use case: named entity annotation where a user needs to annotate a
> text with mentions of named entities.
> How can the Web Annotation model / vocab be used to address this use case?
>
> It seems like a good fit but I have some questions:
>
> 1) how to represent a reference to a named entity (in oa:body)?
>
> One option is to use the named entity URI (e.g. Geonames or DBPedia) as
> the value of oa:body:
>
>> "body": "http://dbpedia.org/resource/Miguel_de_Cervantes"
>>
>
> However, it is a very generic solution and does not specify that this is
> a URI of an entity related to / mentioned in the content segment.
>
> Is there a better way?
>
> 2) how to "cache" the content of the target text segment?
>
> Is there a way to include a copy of the text fragment (= of the target) in
> the annotation [when selectors other than TextQuoteSelector are used]?
>
> Cases when this can be useful:
>   (a) if annotations ever get "separated" from target documents;
>   (b) if annotations need to be processed w/o having to search / look up
> target documents;
>   (c) to verify that the original annotated fragment has not changed.
>
> A workaround would be to also add the TextQuoteSelector (w/o prefix &
> suffix) but that feels like a hack.
>
> Ideally, it would be a generic property of a selector or target:
>
>     "selector": {
>       "type": "TextPositionSelector",
>       "start": 412,
>       "end": 445,
>       "text": "..."
>     }
>
> or
>
>   "target": {
>     "source": "http://example.org/ebook1",
>     "text": "...",
>     "selector": {
>       "type": "TextPositionSelector",
>       "start": 412,
>       "end": 445
>     }
>   }
>
> The property does not need to be restricted to text content (in which case
> a more generic name / label is needed) but it is sufficient for this
> particular use case.
>
> I looked at dcterms: to see if it has properties that could be used here
> but did not find any.
>
> Thanks,
> Uldis
Received on Thursday, 29 December 2016 17:51:20 UTC