named entity annotation use case

From: Uldis Bojars <captsolo@gmail.com>
Date: Thu, 29 Dec 2016 16:43:00 +0200
Message-ID: <CAJjMrENy+kJDoABr9Y5cD5oZgCndzGL-fuV6sx_41dmDVBoavw@mail.gmail.com>
To: public-annotation@w3.org
Hi,

Is this the right place for questions on the usage of the W3C Web Annotation
model / vocabulary?

The use case is named entity annotation: a user annotates a text with the
named entities it mentions.
How can the Web Annotation model / vocabulary be used to address this use case?

It seems like a good fit but I have some questions:

1) How can a reference to a named entity be represented (in oa:body)?

One option is to use the named entity URI (e.g. Geonames or DBPedia) as the
value of oa:body:

    "body": "http://dbpedia.org/resource/Miguel_de_Cervantes"

However, this is a very generic solution: it does not state that the URI
identifies an entity related to / mentioned in the content segment.

Is there a better way?

2) How can the content of the target text segment be "cached"?

Is there a way to include a copy of the text fragment (= of the target) in
the annotation [when selectors other than TextQuoteSelector are used]?

Cases when this can be useful:
  (a) if annotations ever get "separated" from target documents;
  (b) if annotations need to be processed w/o having to search / look up
target documents;
  (c) to verify that the original annotated fragment has not changed.

A workaround would be to also add a TextQuoteSelector (without prefix and
suffix), but that feels like a hack.
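
For illustration, the workaround might look roughly like the sketch below.
It assumes that a target may carry a list of selectors that all select the
same segment, and that "prefix" and "suffix" can be left out of a
TextQuoteSelector; both assumptions should be checked against the model.
The source URL and offsets are the same placeholder values used in the
examples in this message, and the "exact" value is left elided:

    "target": {
      "source": "http://example.org/ebook1",
      "selector": [
        {
          "type": "TextPositionSelector",
          "start": 412,
          "end": 445
        },
        {
          "type": "TextQuoteSelector",
          "exact": "..."
        }
      ]
    }

Here "exact" would act as the cached copy of the text, although it is
formally a second selector rather than a property of the first one.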

Ideally, it would be a generic property of a selector or target:

    "selector": {
      "type": "TextPositionSelector",
      "start": 412,
      "end": 445,
      "text": "..."
    }

or

  "target": {
    "source": "http://example.org/ebook1",
    "text": "...",
    "selector": {
      "type": "TextPositionSelector",
      "start": 412,
      "end": 445
    }
  }

The property does not need to be restricted to text content (in which case
a more generic name / label would be needed), but a text-only property is
sufficient for this particular use case.

I looked through dcterms: for properties that could be used here but did
not find any.

Thanks,
Uldis
Received on Thursday, 29 December 2016 14:43:33 UTC