
Re: Linked Data and IRI dereferencing (scale limits?)

From: Giovanni Tummarello <giovanni.tummarello@deri.org>
Date: Fri, 6 Aug 2010 22:02:36 +0200
Message-ID: <AANLkTimyV416TNHUSFZ0FZ_0WgYiSrv6pCXdtuGpAVdf@mail.gmail.com>
To: Paul Houle <ontology2@gmail.com>
Cc: Jörn Hees <j_hees@cs.uni-kl.de>, public-lod <public-lod@w3.org>
Thanks Paul, this sort of feedback is indeed tremendously useful.

I just wish it had gotten a tenth of the replies that the "subjects as
literals" thread did. :-)
Gio

(Obviously we're talking about the business of LOD at large and its true
state, despite the growing number of lines in the LOD cloud diagram. We're not
talking about specific technicalities of dbpedia, which is obviously run as
well as the guys can economically afford.)


On Thu, Aug 5, 2010 at 4:07 PM, Paul Houle <ontology2@gmail.com> wrote:

> If you want to get something done with dbpedia,  you should (i) work from
> the data dumps,  or (ii) give up and use Freebase instead.
>
> I used to spend weeks figuring out how to clean up the mess in dbpedia until
> the day I wised up and realized I could do in 15 minutes w/ Freebase what
> takes 2 weeks to do w/ dbpedia,  because w/ dbpedia you need to do a huge
> amount of data cleaning to get anything that makes sense.
>
> The issue here isn't primarily "RDF vs Freebase" but it's really a matter
> of the business model (or lack thereof) behind dbpedia;  frankly,  nobody
> gets excited when dbpedia doesn't work,  and that's the problem.  For
> instance,  nobody at dbpedia seems to give a damn that dbpedia contains 3000
> "countries",  whereas there are more like 200 actual active countries in the
> world...  Sure,  it's great to have a category for things like
> "Austria-Hungary" and "The Teutonic Knights",  but an awful lot of people
> give up on dbpedia when they see they can't easily get a list of very basic
> things,  like a list of countries.
>
> Now,  I was able to,  more-or-less,  define "active country" as a
> restriction type:  anything that has an ISO country code in freebase is an
> active country,  or is pretty close.  The ISO codes aren't in dbpedia
> (because they're not in wikipedia infoboxes) so this can't be done with
> dbpedia:  I'd probably need to code some complex rules that try to guess at
> this based on category memberships and what facts are available in the
> infobox.
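
The restriction described above (an entity counts as an "active country" if it carries an ISO country code) can be sketched roughly as follows. This is only an illustration on made-up records: the `iso_code` key and the `active_countries` helper are assumed names, not actual Freebase or dbpedia properties or APIs.

```python
# Hypothetical records: every entity typed as "country", with whatever
# properties happen to be known for it. "iso_code" is an assumed key name.
countries = {
    "Austria": {"iso_code": "AT"},
    "Austria-Hungary": {},
    "France": {"iso_code": "FR"},
    "The Teutonic Knights": {},
}


def active_countries(entities, key="iso_code"):
    """Keep only the entities that carry a non-empty value for `key`."""
    return sorted(name for name, props in entities.items() if props.get(key))


print(active_countries(countries))  # ['Austria', 'France']
```

Without such a property in the data, the same filter would have to be approximated with rules over category memberships and infobox facts, as the paragraph above notes.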
>
> I complained on both dbpedia and freebase discussion lists,  and found
> that:  (i) nobody at dbpedia wants to do anything about this,  and (ii) the
> people at freebase have investigated this and they are going to do something
> about it.
>
> --------
>
> In my mind,  anyway,  the semantic web is a set of structured boxes. It's
> not like there's one "T Box" and one "A Box" but there are nested boxes of
> increasing specificity.  In the systems I'm building,  a Freebase-dbpedia
> merge is used as a sort of "T' Box" that helps to structure and interpret
> information that comes from other sources.  With a little thinking about
> data structures,  it's efficient to have a local copy of this data and use
> it as a skeleton that gets fleshed out with other stuff.  Closed-world
> reasoning about this "taxonomic core" is useful in a number of ways,
> particularly in the detection of key integrity problems,  data holes,
> inconsistencies,  junk data,  etc.  I think the "dereference and merge"
> paradigm is useful once you've got the taxocore and you're merging little
> bits of high-quality data,  but w/o control of the taxocore you're just
> doomed.
>
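
One possible reading of the closed-world check against a local "taxonomic core" is sketched below. Everything in it is an assumption made for illustration: the key scheme, the `located_in` predicate, and both helper functions are hypothetical and do not come from the message quoted above.

```python
# Hypothetical local taxonomic core: known entity keys and their types.
taxocore = {
    "country/AT": "country",
    "country/FR": "country",
    "city/vienna": "city",
    "city/graz": "city",
}

# Hypothetical facts dereferenced from another source, as (s, p, o) triples.
incoming = [
    ("city/vienna", "located_in", "country/AT"),
    ("city/paris", "located_in", "country/FR"),   # subject unknown to the core
    ("city/vienna", "located_in", "country/XX"),  # object unknown to the core
]


def check_against_core(triples, core):
    """Closed-world check: a key absent from the core is reported, not trusted."""
    accepted, rejected = [], []
    for s, p, o in triples:
        missing = [k for k in (s, o) if k not in core]
        if missing:
            rejected.append(((s, p, o), missing))
        else:
            accepted.append((s, p, o))
    return accepted, rejected


def data_holes(triples, core, entity_type, predicate):
    """Core entities of `entity_type` that have no `predicate` fact at all."""
    covered = {s for s, p, _ in triples if p == predicate}
    return sorted(k for k, t in core.items() if t == entity_type and k not in covered)


accepted, rejected = check_against_core(incoming, taxocore)
print(accepted)   # triples whose keys all resolve in the core
print(rejected)   # triples with the keys that failed to resolve
print(data_holes(accepted, taxocore, "city", "located_in"))
```

Only the accepted triples would be merged into the skeleton; the rejected ones and the reported holes are the kind of key-integrity and coverage problems the closed-world view makes visible.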
Received on Friday, 6 August 2010 20:03:04 UTC
