- From: Tom Morris <tfmorris@gmail.com>
- Date: Thu, 18 Jul 2013 11:07:44 -0400
- To: Dan Whaley <dwhaley@hypothes.is>
- Cc: "public-openannotation@w3.org" <public-openannotation@w3.org>
- Message-ID: <CAE9vqEHVT-m8_+Sygo08K+KSrW6sgyaB0+Qigt9ec3pGayhdTw@mail.gmail.com>
On Thu, Jul 18, 2013 at 9:35 AM, Dan Whaley <dwhaley@hypothes.is> wrote:
>
> http://googleresearch.blogspot.com/2013/07/11-billion-clues-in-800-million.html
>
> If anyone knows them, it might be intriguing to know if they've looked at
> OA at all.
>

Is the implication that they would be motivated to reformat their published data if they knew about it? What would this example look like expressed in OA and what value would it add?

Here is an excerpt from an annotation file:

clueweb09-en0000-00-04720.html
PDF                           21089  21092  0.99763662  6.6723776e-05  /m/0600q
FDA                           21303  21306  0.9998256   0.00057182228  /m/032mx
Food and Drug Administration  21312  21340  0.9998256   0.00057182228  /m/032mx

In this example,
- "clueweb09-en0000-00-04720.html" is the name of the document that was annotated
- "PDF" is the entity mention in text
- 21089 and 21092 are the beginning and end byte offsets of the entity mention in the input text
- 0.99763662 is the posterior of an entity given both the mention and the context (of the mention)
- 6.6723776e-05 is the posterior given just the context of the mention (ignoring the mention string itself)
- /m/0600q - Freebase identifier for the entity. To look up the entity in Freebase, just prepend the string "http://www.freebase.com" before the identifier, like so: "http://www.freebase.com/m/0600q".
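
To make the field layout concrete, here is a minimal Python sketch that reads one record and builds the Freebase lookup URL as described above. It assumes the six fields are tab-separated and appear in the order listed; the excerpt does not confirm the delimiter, and the names EntityMention and parse_mention are hypothetical, not part of the published data or any library.

```python
from typing import NamedTuple

# Prefix given in the field description above.
FREEBASE_PREFIX = "http://www.freebase.com"


class EntityMention(NamedTuple):
    """One annotation record (hypothetical name; field order as in the excerpt)."""
    mention: str                          # entity mention string, e.g. "PDF"
    start_byte: int                       # beginning byte offset in the input text
    end_byte: int                         # end byte offset in the input text
    posterior_mention_and_context: float  # posterior given mention and context
    posterior_context_only: float         # posterior given the context alone
    freebase_id: str                      # Freebase identifier, e.g. "/m/0600q"

    @property
    def freebase_url(self) -> str:
        # Prepend the Freebase host to the identifier.
        return FREEBASE_PREFIX + self.freebase_id


def parse_mention(line: str, sep: str = "\t") -> EntityMention:
    """Parse one record. The tab separator is an assumption, not a documented fact."""
    fields = line.rstrip("\n").split(sep)
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    mention, start, end, p_full, p_context, freebase_id = fields
    return EntityMention(
        mention, int(start), int(end), float(p_full), float(p_context), freebase_id
    )


record = parse_mention("PDF\t21089\t21092\t0.99763662\t6.6723776e-05\t/m/0600q")
print(record.freebase_url)  # http://www.freebase.com/m/0600q
```

The document name is not part of each record in the excerpt, so a caller would need to track it separately while reading the file.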
Received on Thursday, 18 July 2013 15:08:13 UTC