Re: Web annotated by freebase concepts from Robert Sanderson on 2013-07-18 (public-openannotation@w3.org from July 2013)

From: Robert Sanderson <azaroth42@gmail.com>
Date: Thu, 18 Jul 2013 09:27:47 -0600
To: Tom Morris <tfmorris@gmail.com>
Cc: Dan Whaley <dwhaley@hypothes.is>, "public-openannotation@w3.org" <public-openannotation@w3.org>
Message-ID: <CABevsUHZdjRE8zfGEsBPcHQM81pNoMeN10O9AEbVheW0fSMkhg@mail.gmail.com>

Hi Tom and all,

In Turtle, the PDF example could look like this:

<SomeURI> a oa:Annotation ;
  oa:hasBody <http://www.freebase.com/m/0600q> ;
  oa:hasTarget <target1> ;
  oa:motivatedBy oa:identifying .

<target1> a oa:SpecificResource ;
  oa:hasSelector <selector1> ;
  oa:hasSource <...clueweb09....html> ;
  xxx:someConfidenceProperty 0.99763662 .

<selector1> a oa:TextPositionSelector ;
  oa:start 21089 ;
  oa:end 21092 .

Admittedly, that doesn't include the actual text "PDF" anywhere.
It could be expanded with a second selector:

<target1> a oa:SpecificResource ;
  oa:hasSelector <choice1> ;
  oa:hasSource <...clueweb09....html> ;
  xxx:someConfidenceProperty 0.99763662 .

<choice1> a oa:Choice ;
  oa:default <selector1> ;
  oa:item <selector2> .

<selector1> a oa:TextPositionSelector ;
  oa:start 21089 ;
  oa:end 21092 .

<selector2> a oa:TextQuoteSelector ;
  oa:exact "PDF" .


Or, perhaps (this goes beyond the spec as currently written, but isn't
prohibited by it):

<target1> a oa:SpecificResource, cnt:ContentAsText ;
  oa:hasSelector <selector1> ;
  oa:hasSource <...clueweb09....html> ;
  xxx:someConfidenceProperty 0.99763662 ;
  cnt:chars "PDF" .

Which would say that the content of the SpecificResource is "PDF" ... which
I think is true according to the model.  If you gave an HTTP URI to a
specific resource, then if you dereference it, you expect to get back the
segment of the resource*. Here we just inline it in the same way as all of
the other embedded resources.

* The spec says: "If the Specific Resource has an HTTP URI, then the exact
segment of the Source resource that it identifies, and only the segment,
MUST be returned when the URI is dereferenced."
   http://www.openannotation.org/spec/core/specific.html#Specific
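
As a rough illustration of the first form above, here is a minimal Python
sketch that turns one row of the annotation file into that Turtle. It
assumes the rows are tab-separated (the excerpt doesn't show the
delimiter), and the names Mention, parse_row and to_turtle, the <annoN>
style URIs and the example.org source URI are all hypothetical. The
offsets are copied through unchanged; the file gives byte offsets, so
non-ASCII documents might need a conversion step first.

```python
from dataclasses import dataclass

FREEBASE_BASE = "http://www.freebase.com"  # prefix from Tom's description


@dataclass
class Mention:
    text: str        # entity mention, e.g. "PDF"
    start: int       # begin offset in the input text
    end: int         # end offset in the input text
    confidence: str  # posterior given mention and context, kept as written
    mid: str         # Freebase identifier, e.g. "/m/0600q"


def parse_row(row: str) -> Mention:
    """Split one row, ASSUMED tab-separated: mention, start, end,
    posterior, context-only posterior, Freebase id."""
    text, start, end, posterior, _context_only, mid = row.rstrip("\n").split("\t")
    # The context-only posterior is dropped in this sketch.
    return Mention(text, int(start), int(end), posterior, mid)


def to_turtle(m: Mention, source_uri: str, n: int = 1) -> str:
    """Render one mention as Annotation + SpecificResource + selector."""
    return "\n".join([
        f"<anno{n}> a oa:Annotation ;",
        f"  oa:hasBody <{FREEBASE_BASE}{m.mid}> ;",
        f"  oa:hasTarget <target{n}> ;",
        # oa:motivatedBy is the motivation predicate in the OA core spec.
        "  oa:motivatedBy oa:identifying .",
        "",
        f"<target{n}> a oa:SpecificResource ;",
        f"  oa:hasSelector <selector{n}> ;",
        f"  oa:hasSource <{source_uri}> ;",
        # Placeholder property name, as in the examples above.
        f"  xxx:someConfidenceProperty {m.confidence} .",
        "",
        f"<selector{n}> a oa:TextPositionSelector ;",
        f"  oa:start {m.start} ;",
        f"  oa:end {m.end} .",
    ])


row = "PDF\t21089\t21092\t0.99763662\t6.6723776e-05\t/m/0600q"
# Hypothetical source URI; the real one is elided in the examples above.
turtle = to_turtle(parse_row(row), "http://example.org/clueweb09-en0000-00-04720.html")
print(turtle)
```

The confidence is kept as a string so the literal comes out exactly as it
appears in the file. Prefix declarations for oa: and xxx: are left out,
as they are in the snippets above.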

Rob



On Thu, Jul 18, 2013 at 9:07 AM, Tom Morris <tfmorris@gmail.com> wrote:

> On Thu, Jul 18, 2013 at 9:35 AM, Dan Whaley <dwhaley@hypothes.is> wrote:
>
>>
>> http://googleresearch.blogspot.com/2013/07/11-billion-clues-in-800-million.html
>>
>> If anyone knows them, it might be intriguing to know if they've looked at
>> OA at all.
>>
>
> Is the implication that they would be motivated to reformat their
> published data if they knew about it?  What would this example look like
> expressed in OA and what value would it add?
>
> Here is an excerpt from an annotation file:
>
> clueweb09-en0000-00-04720.html
> PDF  21089  21092  0.99763662  6.6723776e-05  /m/0600q
> FDA  21303  21306  0.9998256  0.00057182228  /m/032mx
> Food and Drug Administration  21312  21340  0.9998256  0.00057182228  /m/032mx
>
> In this example,
>
>    - "clueweb09-en0000-00-04720.html" is the name of the document that
>    was annotated
>    - "PDF" is the entity mention in text
>    - 21089 and 21092 are the beginning and end byte offsets of the entity
>    mention in the input text
>    - 0.99763662 is the posterior of an entity given both the mention and
>    the context (of the mention)
>    - 6.6723776e-05 is the posterior given just the context of the mention
>    (ignoring the mention string itself)
>    - /m/0600q - Freebase identifier for the entity. To look up the entity
>    in Freebase, just prepend the string "http://www.freebase.com" before
>    the identifier, like so: "http://www.freebase.com/m/0600q".
>
>
>

Received on Thursday, 18 July 2013 15:28:15 UTC