
Re: Fwd: Knowledge Graph links to Freebase

From: Paul Houle <ontology2@gmail.com>
Date: Sat, 9 Jun 2012 11:00:41 -0400
Message-ID: <CAE__kdRwwMf205CUWgYZ6Gp6qW0Hvyxigh8Gt4Ym4Cbs-VJzvA@mail.gmail.com>
To: Kingsley Idehen <kidehen@openlinksw.com>
Cc: lotico-list@googlegroups.com, "public-lod@w3.org" <public-lod@w3.org>, "semantic-web@w3.org" <semantic-web@w3.org>
    My guess is that the 300M entities could be hot air for now.
Maybe they've got a "second true graph" with 300M entities in it,  but
it's probably not powering the production system.

    Right now recall is low for the Google Knowledge Graph because
they don't want to take the chance of showing spurious results.  Most
Freebase topics aren't showing up, and they shouldn't.  Freebase is
full of "twisty little objects that all look alike."  For instance,
there are 20 or so objects in Freebase named "Sweet Home Alabama".
Almost all of the probability weight for that name is on the radio
edit, but most of these objects are covers, re-releases on greatest
hits albums, etc.  That's all great data because it corresponds to
real observations of music in the wild, but in the commonsense domain
these get squashed.

     Oddly,  Google loses the classic rock song entirely and turns up
a mediocre but commercially successful movie...


     The real value of the GKG may be in what gets deleted instead of
what gets added.

     Anyhow, some things that ~could~ be in Freebase and aren't are:

(1) Consumer Products,
(2) Local Businesses (think of what's in Foursquare or Factual),  and
(3) Google data about books

     #3 is the real sore spot.  We know Google has great metadata for
books, but Freebase has loaded only a fraction of the books from
Open Library.  When I found that a number of books I was thinking
about weren't there, they suggested that I finish the Open Library load.

     Of course, Google's book project is under a legal cloud and
their lawyers might feel that they aren't free to release the

Received on Saturday, 9 June 2012 15:01:57 UTC
