Re: Can we lower the LD entry cost please (part 1)? from Giovanni Tummarello on 2009-02-07 (public-lod@w3.org from February 2009)

From: Giovanni Tummarello <g.tummarello@gmail.com>
Date: Sat, 7 Feb 2009 21:04:48 +0000
To: Yves Raimond <yves.raimond@gmail.com>
Cc: Hugh Glaser <hg@ecs.soton.ac.uk>, "public-lod@w3.org" <public-lod@w3.org>
Message-ID: <210271540902071304p2076ad61u455397ba08cafa68@mail.gmail.com>
Yves,
just as an aside: yes, there is not much dbtune in Sindice, only a few results:
http://sindice.com/search?q=dbtune&qt=term

If you have an RDF dump of the site, or of part of it, and you describe
it in a semantic sitemap, you would be fully indexed in a very short time.
Otherwise, we should have the new crawler entering service in a few days,
and that should make a notable difference.
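
For illustration only, here is a minimal sketch of generating such a
sitemap with Python. The sc: namespace and element names below are my
recollection of the Semantic Sitemap extension and should be checked
against its documentation; the function name, dataset label and dump URL
are hypothetical placeholders, not real dbtune locations.

```python
import xml.etree.ElementTree as ET

# Standard sitemap namespace, plus the Semantic Sitemap extension namespace
# (the latter is quoted from memory: treat it as an assumption).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
SC_NS = "http://sw.deri.org/2007/07/sitemapextension/scschema.xsd"


def build_semantic_sitemap(label, dump_url, changefreq="weekly"):
    """Return a sitemap document announcing one RDF dump (hypothetical helper)."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("sc", SC_NS)

    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    dataset = ET.SubElement(urlset, f"{{{SC_NS}}}dataset")
    # Human-readable name of the dataset.
    ET.SubElement(dataset, f"{{{SC_NS}}}datasetLabel").text = label
    # Where a crawler can fetch the dump instead of crawling page by page.
    ET.SubElement(dataset, f"{{{SC_NS}}}dataDumpLocation").text = dump_url
    # How often the dump is expected to change.
    ET.SubElement(dataset, f"{{{SITEMAP_NS}}}changefreq").text = changefreq
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Placeholder values; substitute the real label and dump location.
    print(build_semantic_sitemap("Example dataset",
                                 "http://example.org/dumps/all.rdf.gz"))
```

The resulting file would be served from the site and referenced from
robots.txt in the usual sitemap fashion.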

thanks
Giovanni

On Sat, Feb 7, 2009 at 1:39 PM, Yves Raimond <yves.raimond@gmail.com> wrote:
>
> Hello!
>
> On Sat, Feb 7, 2009 at 1:23 PM, Hugh Glaser <hg@ecs.soton.ac.uk> wrote:
>>
>> My proposal:
>> *We should not permit any site to be a member of the Linked Data cloud if it
>> does not provide a simple way of finding URIs from natural language
>> identifiers.*
>>
>> Rationale:
>> One aspect of our Linking Data (not to mention our Linking Open Data) world
>> is that we want people to link to our data - that is, I have published some
>> stuff about something, with a URI, and I want people to be able to use that
>> URI.
>>
>> So my question to you, the publisher, is: "How easy is it for me to find the
>> URI your users want?"
>>
>> My experience suggests it is not always very easy.
>> What is required at the minimum, I suggest, is a text search, so that if I
>> have a (boring string version of a) name that refers in my mind to
>> something, I can hope to find an (exciting Linked Data) URI of that thing.
>> I call this a projection from the Web to the Semantic Web.
>> rdfs:label or equivalent usually provides the other one.
>>
>> At the risk of being seen as critical of the amazing efforts of all my
>> colleagues (if not also myself), this is rarely an easy thing to do.
>>
>> Some recent experiences:
>> OpenCalais: as in my previous message on this list, I tried hard to find a
>> URI for Tim, but failed.
>> dbtune: Saw a Twine message about dbtune, trundled over there, and tried to
>> find a URI for a Telemann, but failed.
>> dbpedia: wanted Tim again. After clicking on a few web pages, none of which
>> seemed to provide a search facility, I resorted to my usual method:- look it
>> up in wikipedia and then hack the URI and hope it works in dbpedia.
>> (Sorry to name specific sites, guys, but I needed a few examples.
>> And I am only asking for a little more, so that the fruits of your amazing
>> labours can be more widely appreciated!)
>> wordnet: [2] below
>>
>> So I have access to Linked Data sites that I know (or at least strongly
>> suspect) have URIs I might want, but I can't find them.
>> How on earth do we expect your average punter to join this world?
>>
>> What have I missed?
>> Searching, such as Sindice: Well yes, but should I really have to go off to
>> a search engine to find a dbpedia URI? And when I look up "Telemann dbtune"
>> I don't get any results. And I wanted the dbtune link, not some other link.
>> Did I miss some links on web pages? Quite probably, but the basic problem
>> still stands.
>> SPARQL: Well, yes. But we cannot seriously expect our users to formulate a
>> SPARQL query simply to find out the dbpedia URI for Tim. What is the regexp
>> I need to put in? (see below [1])
>> A foaf file: Well Tim's dbpedia URI is probably in his foaf file (although
>> possibly there are none of Tim's URIs in his foaf file), if I can actually
>> find the file; but for some reason I can't seem to find Telemann's foaf
>> file.
>>
>> If you are still doubting me, try finding a URI for Telemann in dbpedia
>> without using an external link, just by following stuff from the home page.
>> I managed to get a Telemann by using SPARQL without a regexp (it times out
>> on any regexp), but unfortunately I get the asteroid.
>>
>> Again, my proposal:
>> *We should not permit any site to be a member of the Linked Data cloud if it
>> does not provide a simple way of finding URIs from natural language
>> identifiers.*
>> Otherwise we end up in a silo, and the world passes us by.
>>
>
>
> I think this is a really dangerous idea. Most "web-scale" identifiers,
> eg Musicbrainz GUIDs and BBC PIDs are not human readable (for a lot of
> reasons, and mainly because human-readable identifiers are not unique
> enough!!), but both provide really easy-to-use lookup services.
> Such lookups, for other sites, can be provided by semantic web search
> engines. It is exactly as in the document web: web identifiers are
> mostly opaque, but search engines are here to provide the help needed.
>
> So my proposal is: let's not confuse everything. Some people's job is
> to make datasets available out there and as linked as possible to
> others. Some other people make lookup services (eg Sindice), and I
> think this separation of concerns works quite well.
>
> Best,
> y
>
>
>> Very best
>> Hugh
>>
>> [And since we have to take our own medicine, I have added a "Just search"
>> box right at the top level of all the rkbexplorer.com domains, such as
>> http://wordnet.rkbexplorer.com/ ]
>>
>>
>> [1]
>> Dbtune finding of Telemann:
>> SELECT * WHERE {?s ?p ?name .
>> FILTER regex(?name, "Telemann$") }
>>
>> I tried
>> SELECT * WHERE {?s ?p ?name .
>> FILTER regex(?name, "telemann$", "i") }
>> first, but got no results - not sure why.
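
A minimal sketch of how a site could build the query in [1] from a plain
name, so that the user never has to write the regexp by hand. The
function names are hypothetical. Wrapping the value in str() and adding
a LIMIT are my own assumptions, not part of the original query, and
nothing here is claimed to explain the empty result reported above.

```python
# Characters that have a special meaning in a SPARQL (XPath-style) regex.
REGEX_META = set("\\.^$*+?()[]{}|")


def escape_regex(text):
    """Backslash-escape regex metacharacters so the name matches literally."""
    return "".join("\\" + ch if ch in REGEX_META else ch for ch in text)


def escape_literal(text):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_name_query(name, ignore_case=True, limit=20):
    """Build a query for literals ending in `name` (hypothetical helper)."""
    pattern = escape_literal(escape_regex(name) + "$")
    flags = ', "i"' if ignore_case else ""
    return (
        "SELECT ?s ?p ?name WHERE {\n"
        "  ?s ?p ?name .\n"
        f'  FILTER regex(str(?name), "{pattern}"{flags})\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    print(build_name_query("Telemann"))
```

The string it returns can be pasted into any SPARQL endpoint form or
sent as the query parameter of an HTTP request to the endpoint.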
>>
>> [2]
>> <rant>
>> I cannot believe just how frustrating this stuff can be when you really try
>> to use it.
>> Because I looked at Sindice for telemann, I know that it is a word in
>> wordnet ( http://sindice.com/search?q=Telemann reports loads of
>> http://wordnet.rkbexplorer.com/ links).
>> Great, he thinks, I can get a wordnet link from a "proper" wordnet publisher
>> (ie not me).
>> Goes to
>> http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
>> to find wordnet.
>> The link there is dead.
>> Strips off the last bit, to get to the home princeton wordnet page, and
>> clicks on the browser link I find - also dead.
>> Go back and look on the
>> http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets
>> page, and find the link to http://esw.w3.org/topic/WordNet , but that
>> doesn't help.
>> So finally, I do the obvious - google "wordnet rdf".
>> Of course I get lots of pages saying how available it is, and how exciting
>> it is that we have it, and how it was produced; and somewhere in there I
>> find a link: "Wordnet-RDF/RDDL Browser" at  www.openhealth.org/RDDL/wnbrowse
>> Almost unable to contain myself with excitement, I click on the link to find
>> a text box, and with trembling hands I type "Telemann" and click submit.
>> If I show you what I got, you can come some way to imagining my devastation:
>> "Using org.apache.xerces.parsers.SAXParser
>> Exception net.sf.saxon.trans.DynamicError: org.xml.sax.SAXParseException:
>> White spaces are required between publicId and systemId.
>> org.xml.sax.SAXParseException: White spaces are required between publicId
>> and systemId."
>>
>> Does the emperor have any clothes at all?
>> </rant>
>>
>>
>>
>
>
Received on Saturday, 7 February 2009 21:05:31 UTC