- From: Paul Houle <ontology2@gmail.com>
- Date: Sat, 9 Jun 2012 11:00:41 -0400
- To: Kingsley Idehen <kidehen@openlinksw.com>
- Cc: lotico-list@googlegroups.com, "public-lod@w3.org" <public-lod@w3.org>, "semantic-web@w3.org" <semantic-web@w3.org>
My guess is that the 300M entities could be hot air for now. Maybe they've got a "second true graph" with 300M entities in it, but it's probably not powering the production system.

Right now recall is low for the Google Knowledge Graph because they don't want to take the chance of showing spurious results. Most Freebase topics aren't showing up, and they shouldn't. Freebase is full of "twisty little objects that all look alike." For instance, there are 20 or so objects in Freebase named "Sweet Home Alabama". Almost all of the probability weight for this name is on the radio edit, but most of the objects are covers, re-releases on greatest hits albums, etc. That's all great data because it corresponds to real observations of music in the wild, but in the commonsense domain these get squashed.

Oddly, Google loses the classic rock song entirely and turns up a mediocre but commercially successful movie...

https://www.google.com/#hl=en&gs_nf=1&tok=V0cZbCtNDVsjrfKATbImzw&cp=7&gs_id=7x&xhr=t&q=sweet+home+alabama&pf=p&output=search&sclient=psy-ab&oq=sweet+h&aq=0&aqi=g4&aql=&gs_l=&pbx=1&bav=on.2,or.r_gc.r_pw.r_qf.,cf.osb&fp=f9be4f0b957a8550&biw=1600&bih=775

The real value of the GKG may be in what gets deleted instead of what gets added.

Anyhow, some things that ~could~ be in Freebase and aren't are (1) Consumer Products, (2) Local Businesses (think of what's in Foursquare or Factual), and (3) Google data about books.

#3 is the real sore spot. We know Google has great metadata for books, but Freebase has loaded only a percentage of the books from Open Library. When I found that a number of books I was thinking about weren't there, they suggested that I finish the Open Library load myself...

Of course, Google's book project is under a legal cloud, and their lawyers might feel that they aren't free to release the metadata.
Received on Saturday, 9 June 2012 15:01:57 UTC