Re: Can we lower the LD entry cost please (part 1)?

Georgi,

> Very sorry for being picky here, but that's just such a good example of
> my argument: Why should I invest my time to provide a voiD description for
> DBpedia?

Very valid point, and I'm thankful you're raising this issue. Now, it might
well be that we have done a bad job of communicating what voiD is for, or
that you've not yet had time to dig into it. Or possibly both ;)

So, there are really two questions here on the table: (i) why should you as
DBpedia do it, and (ii) why should any of the publishers care.

(i) There are many answers to this, I guess. Because you are a good linked
data citizen, because you want to state authoritatively how many triples are
in there, etc., rather than waiting until others (claim to) do so. As a matter
of fact, DBpedia is (currently) the centre of the linked data universe, and
this will likely stay so (we wouldn't have based our default categorisation
scheme on it if we weren't betting on it, right? :). Hence, you can lean back
and think to yourself: there is no way around DBpedia, so why should I
care? Well, it may well be that others publish Wikipedia as linked
data too (there are a lot of attempts, none very successful that I know
of) and then you will have to compete. And voiD offers a means to communicate
what your data is about, both effectively and efficiently.
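
To give a feel for how little is involved, here is a minimal sketch of
generating such a description. It is an illustration only, not an actual
DBpedia description: the example.org URIs and the triple count are
placeholders, the helper name void_description is made up, and the property
names (void:sparqlEndpoint, void:vocabulary, void:triples) follow the voiD
vocabulary as later published, so they may differ from the draft under
discussion here.

```python
def void_description(dataset_uri, title, license_uri, sparql_endpoint,
                     vocabularies, triples):
    """Render a minimal voiD description of one dataset as Turtle text.

    Hypothetical helper for illustration; it assumes the title contains
    no double quotes and that all URIs are already well-formed.
    """
    lines = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f'    dcterms:title "{title}" ;',
        f"    dcterms:license <{license_uri}> ;",
        f"    void:sparqlEndpoint <{sparql_endpoint}> ;",
    ]
    # One void:vocabulary statement per vocabulary used in the dataset.
    for vocab in vocabularies:
        lines.append(f"    void:vocabulary <{vocab}> ;")
    # The publisher's own, authoritative statement of the dataset size.
    lines.append(f"    void:triples {int(triples)} .")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values only; none of these describe a real dataset.
    print(void_description(
        dataset_uri="http://example.org/dataset",
        title="Example dataset",
        license_uri="http://example.org/license",
        sparql_endpoint="http://example.org/sparql",
        vocabularies=["http://xmlns.com/foaf/0.1/"],
        triples=1000,
    ))
```

Publishing the resulting Turtle alongside the dataset is what lets a consumer
learn the size, license and vocabularies without crawling the data first.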

(ii) Any other publisher should care, of course. I admit that these days,
with some 50+ datasets, we all more or less know what is available and who
is behind it. But when the WWW was launched there were likewise only a couple
of servers and pages around, so hosts.txt and some manually maintained
directories were more or less sufficient. This has changed, as we know, and
it is certainly fair (and in the interest of the community, I think) to
assume that the same will happen to linked data in the near future. Here,
voiD can be seen as the primary metadata we have at hand that will enable
things like automatic assembly of applications, exploration of datasets,
ranking of datasets in searches, and more.

> Of course, I'd do the community a favor, but I don't see any other
> reason.

You'd be doing yourself a favour :)

> If there was a great application that consumes void and nicely
> displays e.g. Musicbrainz and Geonames data, but doesn't display any
> DBpedia data because of the missing voiD description of DBpedia, I would
> have an incentive to provide it.

Expecting that there would be tons of applications just a week after the
first draft of a spec has been released is ... well, I'm honoured ;)

Anyway, we are already seeing a certain uptake and, yes, this is definitely
on my agenda (and maybe not only mine).

> No offence, I'm just trying to emphasize my point.

No offence taken. You're raising valid issues and did not attack me as a
person. I'm comfortable distinguishing between the work I contribute to
and myself as a person.

> Sure, you'll get me with the community favor approach, but I strongly doubt
> that you will get others...

Is that so?

Cheers,
      Michael

-- 
Dr. Michael Hausenblas
DERI - Digital Enterprise Research Institute
National University of Ireland, Lower Dangan,
Galway, Ireland, Europe
Tel. +353 91 495730
http://sw-app.org/about.html


> From: Georgi Kobilarov <georgi.kobilarov@gmx.de>
> Date: Sun, 8 Feb 2009 16:48:47 +0100
> To: Michael Hausenblas <michael.hausenblas@deri.org>, Andraz Tori
> <andraz@zemanta.com>, Hugh Glaser <hg@ecs.soton.ac.uk>
> Cc: Linked Data community <public-lod@w3.org>
> Subject: RE: Can we lower the LD entry cost please (part 1)?
> 
> Hi Michael,
> 
>> Looking forward to find and use a respective voiD description for
>> DBpedia ;)
> 
> Very sorry for being picky here, but that's just such a good example of
> my argument:
> Why should I invest my time to provide a voiD description for DBpedia?
> 
> Of course, I'd do the community a favor, but I don't see any other
> reason. If there was a great application that consumes void and nicely
> displays e.g. Musicbrainz and Geonames data, but doesn't display any
> DBpedia data because of the missing voiD description of DBpedia, I would
> have an incentive to provide it.
> 
> No offence, I'm just trying to emphasize my point. Sure, you'll get me
> with the community favor approach, but I strongly doubt that you will
> get others...
> 
> Cheers,
> Georgi
> 
> --
> Georgi Kobilarov
> Freie Universität Berlin
> www.georgikobilarov.com
> 
> 
>> -----Original Message-----
>> From: Michael Hausenblas [mailto:michael.hausenblas@deri.org]
>> Sent: Sunday, February 08, 2009 4:29 PM
>> To: Georgi Kobilarov; Andraz Tori; Hugh Glaser
>> Cc: Linked Data community
>> Subject: Re: Can we lower the LD entry cost please (part 1)?
>> 
>> 
>> Georgi, All,
>> 
>>> If we don't reward the Linked Data publishers who provide clean data and
>>> penalize those who don't, there will never be an incentive to do it right.
>> 
>> I couldn't agree more. I have contemplated about that recently (p16 in [1])
>> and, yes, one goal of voiD is helping publishers to concisely express what
>> their dataset is about, under which license it is available, which
>> vocabularies are used or how many triples one can expect [2] and on the
>> other hand how the dataset is linked with other datasets [3].
>> 
>> Looking forward to find and use a respective voiD description for
>> DBpedia ;)
>> 
>> Cheers,
>>       Michael
>> 
>> [1] http://www.talis.com/nodalities/pdf/nodalities_issue4.pdf
>> [2] http://rdfs.org/ns/void-guide#sec_1_Describing_Datasets
>> [3] http://rdfs.org/ns/void-guide#sec_2_Describing_Dataset_Interlink
>> 
>> --
>> Dr. Michael Hausenblas
>> DERI - Digital Enterprise Research Institute
>> National University of Ireland, Lower Dangan,
>> Galway, Ireland, Europe
>> Tel. +353 91 495730
>> http://sw-app.org/about.html
>> 
>> 
>>> From: Georgi Kobilarov <georgi.kobilarov@gmx.de>
>>> Date: Sun, 8 Feb 2009 15:56:23 +0100
>>> To: Andraz Tori <andraz@zemanta.com>, Hugh Glaser
>> <hg@ecs.soton.ac.uk>
>>> Cc: Linked Data community <public-lod@w3.org>
>>> Subject: RE: Can we lower the LD entry cost please (part 1)?
>>> Resent-From: Linked Data community <public-lod@w3.org>
>>> Resent-Date: Sun, 08 Feb 2009 14:57:12 +0000
>>> 
>>> Hi Andraz,
>>> 
>>> I disagree, those two goals are not completely different in a sense that
>>> different groups should address it separately. I had a delighting
>>> conversation with Andreas Harth of SWSE about that a week ago in Berlin.
>>> Search Engines can't clean up other people's mess. It's even harmful if
>>> they try. Data providers need incentives to provide clean data. See the
>>> Google example: Google started indexing the web, and the webpages with
>>> clean markup and site structure showed up in their search. And Google's
>>> search provided real benefit to end-users.
>>> 
>>> Hence web publishers started to do SEO (search engine optimization), so
>>> that their stuff shows up in Google as well (or ranked higher). If we
>>> don't reward the Linked Data publishers who provide clean data and
>>> penalize those who don't, there will never be an incentive to do it
>>> right.
>>> 
>>> Cheers,
>>> Georgi
>>> 
>>> --
>>> Georgi Kobilarov
>>> Freie Universität Berlin
>>> www.georgikobilarov.com
>>> 
>>>> -----Original Message-----
>>>> From: public-lod-request@w3.org [mailto:public-lod-request@w3.org]
>> On
>>>> Behalf Of Andraz Tori
>>>> Sent: Saturday, February 07, 2009 4:02 PM
>>>> To: Hugh Glaser
>>>> Cc: public-lod@w3.org
>>>> Subject: Re: Can we lower the LD entry cost please (part 1)?
>>>> 
>>>> 
>>>> Hi Hugh,
>>>> 
>>>> I think you are mixing two completely different goals.
>>>> 
>>>> Why can't one set of people provide the data while the other set of
>>>> people provide search technologies over that data?
>>>> 
>>>> It takes two completely different technologies, processes, etc.
>>>> 
>>>> BTW: an easy way to search is also to write meaningful sentence  or
>>>> paragraph (using the phrase/entity/concept) and put it into Zemanta
>> or
>>>> Calais. You will usually get properly disambiguated URIs back.
>>>> 
>>>> bye
>>>> andraz
>>>> 
>>>> On Sat, 2009-02-07 at 13:23 +0000, Hugh Glaser wrote:
>>>>> My proposal:
>>>>> *We should not permit any site to be a member of the Linked Data
>>>> cloud if it
>>>>> does not provide a simple way of finding URIs from natural
> language
>>>>> identifiers.*
>>>>> 
>>>>> Rationale:
>>>>> One aspect of our Linking Data (not to mention our Linking Open
>>> Data)
>>>> world
>>>>> is that we want people to link to our data - that is, I have
>>>> published some
>>>>> stuff about something, with a URI, and I want people to be able to
>>>> use that
>>>>> URI.
>>>>> 
>>>>> So my question to you, the publisher, is: "How easy is it for me
> to
>>>> find the
>>>>> URI your users want?"
>>>>> 
>>>>> My experience suggests it is not always very easy.
>>>>> What is required at the minimum, I suggest, is a text search, so
>>> that
>>>> if I
>>>>> have a (boring string version of a) name that refers in my mind to
>>>>> something, I can hope to find an (exciting Linked Data) URI of
> that
>>>> thing.
>>>>> I call this a projection from the Web to the Semantic Web.
>>>>> rdfs:label or equivalent usually provides the other one.
>>>>> 
>>>>> At the risk of being seen as critical of the amazing efforts of
> all
>>>> my
>>>>> colleagues (if not also myself), this is rarely an easy thing to
>> do.
>>>>> 
>>>>> Some recent experiences:
>>>>> OpenCalais: as in my previous message on this list, I tried hard
> to
>>>> find a
>>>>> URI for Tim, but failed.
>>>>> dbtune: Saw a Twine message about dbtune, trundled over there, and
>>>> tried to
>>>>> find a URI for a Telemann, but failed.
>>>>> dbpedia: wanted Tim again. After clicking on a few web pages, none
>>> of
>>>> which
>>>>> seemed to provide a search facility, I resorted to my usual
>> method:-
>>>> look it
>>>>> up in wikipedia and then hack the URI and hope it works in
> dbpedia.
>>>>> (Sorry to name specific sites, guys, but I needed a few examples.
>>>>> And I am only asking for a little more, so that the fruits of your
>>>> amazing
>>>>> labours can be more widely appreciated!)
>>>>> wordnet: [2] below
>>>>> 
>>>>> So I have access to Linked Data sites that I know (or at least
>>>> strongly
>>>>> suspect) have URIs I might want, but I can't find them.
>>>>> How on earth do we expect your average punter to join this world?
>>>>> 
>>>>> What have I missed?
>>>>> Searching, such as Sindice: Well yes, but should I really have to
>> go
>>>> off to
>>>>> a search engine to find a dbpedia URI? And when I look up
> "Telemann
>>>> dbtune"
>>>>> I don't get any results. And I wanted the dbtune link, not some
>>> other
>>>> link.
>>>>> Did I miss some links on web pages? Quite probably, but the basic
>>>> problem
>>>>> still stands.
>>>>> SPARQL: Well, yes. But we cannot seriously expect our users to
>>>> formulate a
>>>>> SPARQL query simply to find out the dbpedia URI for Tim. What is
>> the
>>>> regexp
>>>>> I need to put in? (see below [1])
>>>>> A foaf file: Well Tim's dbpedia URI is probably in his foaf file
>>>> (although
>>>>> possibly there are none of Tim's URIs in his foaf file), if I can
>>>> actually
>>>>> find the file; but for some reason I can't seem to find Telemann's
>>>> foaf
>>>>> file.
>>>>> 
>>>>> If you are still doubting me, try finding a URI for Telemann in
>>>> dbpedia
>>>>> without using an external link, just by following stuff from the
>>> home
>>>> page.
>>>>> I managed to get a Telemann by using SPARQL without a regexp (it
>>>> times out
>>>>> on any regexp), but unfortunately I get the asteroid.
>>>>> 
>>>>> Again, my proposal:
>>>>> *We should not permit any site to be a member of the Linked Data
>>>> cloud if it
>>>>> does not provide a simple way of finding URIs from natural
> language
>>>>> identifiers.*
>>>>> Otherwise we end up in a silo, and the world passes us by.
>>>>> 
>>>>> Very best
>>>>> Hugh
>>>>> 
>>>>> [And since we have to take our own medicine, I have added a "Just
>>>> search"
>>>>> box right at the top level of all the rkbexplorer.com domains,
> such
>>>> as
>>>>> http://wordnet.rkbexplorer.com/ ]
>>>>> 
>>>>> 
>>>>> [1]
>>>>> Dbtune finding of Telemann:
>>>>> SELECT * WHERE {?s ?p ?name .
>>>>> FILTER regex(?name, "Telemann$") }
>>>>> 
>>>>> I tried
>>>>> SELECT * WHERE {?s ?p ?name .
>>>>> FILTER regex(?name, "telemann$", "i") }
>>>>> first, but got no results - not sure why.
>>>>> 
>>>>> [2]
>>>>> <rant>
>>>>> I cannot believe just how frustrating this stuff can be when you
>>>> really try
>>>>> to use it.
>>>>> Because I looked at Sindice for telemann, I know that it is a word
>>> in
>>>>> wordnet ( http://sindice.com/search?q=Telemann reports loads of
>>>>> http://wordnet.rkbexplorer.com/ links).
>>>>> Great, he thinks, I can get a wordnet link from a "proper" wordnet
>>>> publisher
>>>>> (ie not me).
>>>>> Goes to
>>>>> http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
>>>>> to find wordnet.
>>>>> The link there is dead.
>>>>> Strips off the last bit, to get to the home princeton wordnet
> page,
>>>> and
>>>>> clicks on the browser link I find - also dead.
>>>>> Go back and look on the
>>>>> http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets
>>>>> page, and find the link to http://esw.w3.org/topic/WordNet , but that
>>>>> doesn't help.
>>>>> So finally, I do the obvious - google "wordnet rdf".
>>>>> Of course I get lots of pages saying how available it is, and how
>>>> exciting
>>>>> it is that we have it, and how it was produced; and somewhere in
>>>> there I
>>>>> find a link: "Wordnet-RDF/RDDL Browser" at
>>>> www.openhealth.org/RDDL/wnbrowse
>>>>> Almost unable to contain myself with excitement, I click on the
>> link
>>>> to find
>>>>> a text box, and with trembling hands I type "Telemann" and click
>>>> submit.
>>>>> If I show you what I got, you can come some way to imagining my
>>>> devastation:
>>>>> "Using org.apache.xerces.parsers.SAXParser
>>>>> Exception net.sf.saxon.trans.DynamicError:
>>>> org.xml.sax.SAXParseException:
>>>>> White spaces are required between publicId and systemId.
>>>>> org.xml.sax.SAXParseException: White spaces are required between
>>>> publicId
>>>>> and systemId."
>>>>> 
>>>>> Does the emperor have any clothes at all?
>>>>> </rant>
>>>>> 
>>>>> 
>>>> --
>>>> Andraz Tori, CTO
>>>> Zemanta Ltd, London, Ljubljana
>>>> www.zemanta.com
>>>> mail: andraz@zemanta.com
>>>> tel: +386 41 515 767
>>>> twitter: andraz, skype: minmax_test
>>>> 
>>>> 
>>>> 
>>> 

Received on Sunday, 8 February 2009 17:15:31 UTC