RE: Can we lower the LD entry cost please (part 1)? from Andraz Tori on 2009-02-08 (public-lod@w3.org from February 2009)

From: Andraz Tori <andraz@zemanta.com>
Date: Sun, 08 Feb 2009 16:50:20 +0100
To: Georgi Kobilarov <georgi.kobilarov@gmx.de>
Cc: Hugh Glaser <hg@ecs.soton.ac.uk>, public-lod@w3.org
Message-Id: <1234108220.6684.66.camel@minmax-laptop>
On Sun, 2009-02-08 at 15:56 +0100, Georgi Kobilarov wrote:
> Hi Andraz,
> 
> I disagree: those two goals are not so completely different that
> separate groups should address them separately. I had a delightful
> conversation with Andreas Harth of SWSE about that a week ago in Berlin.
> Search engines can't clean up other people's mess. It's even harmful if
> they try. Data providers need incentives to provide clean data. See the
> Google example: Google started indexing the web, and the webpages with
> clean markup and site structure showed up in their search. And Google's
> search provided real benefit to end-users.

Oh, I agree it's good for the web that publishers provide data that is as
clean as possible. What I am saying is that publishers have the data and
might be convinced to provide it, but requiring them to also provide
advanced search technology is counterproductive.

On the other hand, the major problem of the semantic web is the lack of
_incentives_ for publishers to publish data in clean semantic form.
I am working on one of the initiatives to change that, and it will
hopefully see the light of day soon.

> Hence web publishers started to do SEO (search engine optimization), so
> that their stuff shows up in Google as well (or ranked higher). If we
> don't reward the Linked Data publishers who provide clean data and
> penalize those who don't, there will never be an incentive to do it
> right.


Yes, exactly; I agree 100%. If you want publishers to provide good data,
provide incentives for them to do so. Most publishers only care about
traffic or better ad targeting, so make sure they get one or the other.
I am not seeing many initiatives in that direction.
Instead of putting requirements on publishers, we should be working on
creating incentives for them. And demanding that Google reward them
won't work. The Semweb community needs to create its own way of
rewarding them.


bye
andraz

> Cheers,
> Georgi
> 
> --
> Georgi Kobilarov
> Freie Universität Berlin
> www.georgikobilarov.com
> 
> > -----Original Message-----
> > From: public-lod-request@w3.org [mailto:public-lod-request@w3.org] On
> > Behalf Of Andraz Tori
> > Sent: Saturday, February 07, 2009 4:02 PM
> > To: Hugh Glaser
> > Cc: public-lod@w3.org
> > Subject: Re: Can we lower the LD entry cost please (part 1)?
> > 
> > 
> > Hi Hugh,
> > 
> > I think you are mixing two completely different goals.
> > 
> > Why can't one set of people provide the data while the other set of
> > people provide search technologies over that data?
> > 
> > It takes two completely different technologies, processes, etc.
> > 
> > BTW: another easy way to search is to write a meaningful sentence or
> > paragraph (using the phrase/entity/concept) and put it into Zemanta or
> > Calais. You will usually get properly disambiguated URIs back.
> > 
> > bye
> > andraz
> > 
> > On Sat, 2009-02-07 at 13:23 +0000, Hugh Glaser wrote:
> > > My proposal:
> > > *We should not permit any site to be a member of the Linked Data
> > > cloud if it does not provide a simple way of finding URIs from
> > > natural language identifiers.*
> > >
> > > Rationale:
> > > One aspect of our Linking Data (not to mention our Linking Open Data)
> > > world is that we want people to link to our data - that is, I have
> > > published some stuff about something, with a URI, and I want people
> > > to be able to use that URI.
> > >
> > > So my question to you, the publisher, is: "How easy is it for me to
> > > find the URI your users want?"
> > >
> > > My experience suggests it is not always very easy.
> > > What is required at the minimum, I suggest, is a text search, so that
> > > if I have a (boring string version of a) name that refers in my mind
> > > to something, I can hope to find an (exciting Linked Data) URI of
> > > that thing.
> > > I call this a projection from the Web to the Semantic Web.
> > > rdfs:label or equivalent usually provides the projection in the other
> > > direction.
> > >
> > > At the risk of being seen as critical of the amazing efforts of all
> > > my colleagues (if not also myself), this is rarely an easy thing to
> > > do.
> > >
> > > Some recent experiences:
> > > OpenCalais: as in my previous message on this list, I tried hard to
> > > find a URI for Tim, but failed.
> > > dbtune: Saw a Twine message about dbtune, trundled over there, and
> > > tried to find a URI for a Telemann, but failed.
> > > dbpedia: wanted Tim again. After clicking on a few web pages, none of
> > > which seemed to provide a search facility, I resorted to my usual
> > > method:- look it up in wikipedia and then hack the URI and hope it
> > > works in dbpedia.
> > > (Sorry to name specific sites, guys, but I needed a few examples.
> > > And I am only asking for a little more, so that the fruits of your
> > > amazing labours can be more widely appreciated!)
> > > wordnet: [2] below
> > >
> > > So I have access to Linked Data sites that I know (or at least
> > > strongly suspect) have URIs I might want, but I can't find them.
> > > How on earth do we expect your average punter to join this world?
> > >
> > > What have I missed?
> > > Searching, such as Sindice: Well yes, but should I really have to go
> > > off to a search engine to find a dbpedia URI? And when I look up
> > > "Telemann dbtune" I don't get any results. And I wanted the dbtune
> > > link, not some other link.
> > > Did I miss some links on web pages? Quite probably, but the basic
> > > problem still stands.
> > > SPARQL: Well, yes. But we cannot seriously expect our users to
> > > formulate a SPARQL query simply to find out the dbpedia URI for Tim.
> > > What is the regexp I need to put in? (see below [1])
> > > A foaf file: Well, Tim's dbpedia URI is probably in his foaf file
> > > (although possibly there are none of Tim's URIs in his foaf file), if
> > > I can actually find the file; but for some reason I can't seem to
> > > find Telemann's foaf file.
> > >
> > > If you are still doubting me, try finding a URI for Telemann in
> > > dbpedia without using an external link, just by following stuff from
> > > the home page.
> > > I managed to get a Telemann by using SPARQL without a regexp (it
> > > times out on any regexp), but unfortunately I got the asteroid.
> > >
> > > Again, my proposal:
> > > *We should not permit any site to be a member of the Linked Data
> > > cloud if it does not provide a simple way of finding URIs from
> > > natural language identifiers.*
> > > Otherwise we end up in a silo, and the world passes us by.
> > >
> > > Very best
> > > Hugh
> > >
> > > [And since we have to take our own medicine, I have added a "Just
> > > search" box right at the top level of all the rkbexplorer.com
> > > domains, such as http://wordnet.rkbexplorer.com/ ]
> > >
> > >
> > > [1]
> > > Dbtune finding of Telemann:
> > > SELECT * WHERE {?s ?p ?name .
> > > FILTER regex(?name, "Telemann$") }
> > >
> > > I tried
> > > SELECT * WHERE {?s ?p ?name .
> > > FILTER regex(?name, "telemann$", "i") }
> > > first, but got no results - not sure why.
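For comparison, a case-insensitive variant of the footnote [1] query might look like the sketch below. It is an assumption, not a tested fix and not something from the thread: it assumes the dataset stores names as rdfs:label values (a dataset may use another property, such as foaf:name, instead), tests only literal values, and wraps the value in str() so that regex() receives a plain string even when the label carries a language tag. Whether it avoids the timeouts described above depends entirely on the endpoint.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Assumption: names are stored as rdfs:label values.
# isLiteral() skips IRIs; str() drops any language tag so regex()
# gets a plain string; the "i" flag makes the match case-insensitive.
SELECT DISTINCT ?s ?label
WHERE {
  ?s rdfs:label ?label .
  FILTER ( isLiteral(?label) && regex(str(?label), "telemann$", "i") )
}
LIMIT 50
```

Restricting the pattern to a single predicate keeps the regex from being evaluated against every object in the store, which is what the unrestricted `?s ?p ?name` pattern asks for.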
> > >
> > > [2]
> > > <rant>
> > > I cannot believe just how frustrating this stuff can be when you
> > > really try to use it.
> > > Because I looked at Sindice for telemann, I know that it is a word in
> > > wordnet ( http://sindice.com/search?q=Telemann reports loads of
> > > http://wordnet.rkbexplorer.com/ links).
> > > Great, he thinks, I can get a wordnet link from a "proper" wordnet
> > > publisher (ie not me).
> > > Goes to
> > > http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
> > > to find wordnet.
> > > The link there is dead.
> > > Strips off the last bit, to get to the home Princeton wordnet page,
> > > and clicks on the browser link I find - also dead.
> > > Go back and look on the
> > > http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets
> > > page, and find the link to http://esw.w3.org/topic/WordNet , but that
> > > doesn't help.
> > > So finally, I do the obvious - google "wordnet rdf".
> > > Of course I get lots of pages saying how available it is, and how
> > > exciting it is that we have it, and how it was produced; and
> > > somewhere in there I find a link: "Wordnet-RDF/RDDL Browser" at
> > > www.openhealth.org/RDDL/wnbrowse
> > > Almost unable to contain myself with excitement, I click on the link
> > > to find a text box, and with trembling hands I type "Telemann" and
> > > click submit.
> > > If I show you what I got, you can come some way to imagining my
> > > devastation:
> > > "Using org.apache.xerces.parsers.SAXParser
> > > Exception net.sf.saxon.trans.DynamicError:
> > > org.xml.sax.SAXParseException: White spaces are required between
> > > publicId and systemId.
> > > org.xml.sax.SAXParseException: White spaces are required between
> > > publicId and systemId."
> > >
> > > Does the emperor have any clothes at all?
> > > </rant>
> > >
> > >
> > --
> > Andraz Tori, CTO
> > Zemanta Ltd, London, Ljubljana
> > www.zemanta.com
> > mail: andraz@zemanta.com
> > tel: +386 41 515 767
> > twitter: andraz, skype: minmax_test
> > 
> > 
> > 
> 
Received on Sunday, 8 February 2009 15:51:01 UTC