Re: Web annotated by freebase concepts

On Thu, Jul 18, 2013 at 9:35 AM, Dan Whaley <dwhaley@hypothes.is> wrote:

>
> http://googleresearch.blogspot.com/2013/07/11-billion-clues-in-800-million.html
>
> If anyone knows them, it might be intriguing to know if they've looked at
> OA at all.
>

Is the implication that they would be motivated to reformat their published
data if they knew about OA?  What would this example look like expressed in
OA, and what value would that add?

Here is an excerpt from an annotation file:

clueweb09-en0000-00-04720.html
PDF                           21089  21092  0.99763662  6.6723776e-05  /m/0600q
FDA                           21303  21306  0.9998256   0.00057182228  /m/032mx
Food and Drug Administration  21312  21340  0.9998256   0.00057182228  /m/032mx

In this example,

   - "clueweb09-en0000-00-04720.html" is the name of the document that was
   annotated
   - "PDF" is the entity mention in text
   - 21089 and 21092 are the beginning and end byte offsets of the entity
   mention in the input text
   - 0.99763662 is the posterior of an entity given both the mention and
   the context (of the mention)
   - 6.6723776e-05 is the posterior given just the context of the mention
   (ignoring the mention string itself)
   - "/m/0600q" is the Freebase identifier for the entity. To look up the
   entity in Freebase, prepend the string "http://www.freebase.com" to
   the identifier, like so: "http://www.freebase.com/m/0600q".
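
To make the field layout concrete, here is a rough Python sketch that parses
one mention row and builds the Freebase URL as described above. It assumes the
fields are tab-separated and appear in the order listed in the bullets; the
names Mention, parse_mention and freebase_url are made up for illustration and
are not part of the published data or any tool.

```python
from typing import NamedTuple

# Prefix given in the description above for looking up an entity.
FREEBASE_PREFIX = "http://www.freebase.com"


class Mention(NamedTuple):
    """One annotated entity mention (field names are illustrative)."""
    text: str                      # entity mention as it appears in the text
    start: int                     # beginning byte offset in the input text
    end: int                       # end byte offset in the input text
    posterior_with_mention: float  # posterior given mention and context
    posterior_context_only: float  # posterior given the context alone
    mid: str                       # Freebase identifier, e.g. "/m/0600q"


def parse_mention(row: str) -> Mention:
    # Assumption: exactly six tab-separated fields per mention row.
    text, start, end, p_full, p_ctx, mid = row.rstrip("\n").split("\t")
    return Mention(text, int(start), int(end), float(p_full), float(p_ctx), mid)


def freebase_url(mention: Mention) -> str:
    # Prepend the Freebase host to the identifier.
    return FREEBASE_PREFIX + mention.mid


row = "PDF\t21089\t21092\t0.99763662\t6.6723776e-05\t/m/0600q"
m = parse_mention(row)
print(m.text, m.end - m.start, freebase_url(m))
```

For the "PDF" row this gives a mention length of 3 bytes, which matches the
length of the mention string itself.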

Received on Thursday, 18 July 2013 15:08:13 UTC