- From: Andraz Tori <andraz@zemanta.com>
- Date: Sat, 07 Feb 2009 16:02:16 +0100
- To: Hugh Glaser <hg@ecs.soton.ac.uk>
- Cc: "public-lod@w3.org" <public-lod@w3.org>
Hi Hugh,

I think you are mixing two completely different goals.

Why can't one set of people provide the data while the other set of people
provide search technologies over that data? It takes two completely
different technologies, processes, etc.

BTW: an easy way to search is also to write a meaningful sentence or
paragraph (using the phrase/entity/concept) and put it into Zemanta or
Calais. You will usually get properly disambiguated URIs back.

bye
andraz

On Sat, 2009-02-07 at 13:23 +0000, Hugh Glaser wrote:
> My proposal:
> *We should not permit any site to be a member of the Linked Data cloud if it
> does not provide a simple way of finding URIs from natural language
> identifiers.*
>
> Rationale:
> One aspect of our Linking Data (not to mention our Linking Open Data) world
> is that we want people to link to our data - that is, I have published some
> stuff about something, with a URI, and I want people to be able to use that
> URI.
>
> So my question to you, the publisher, is: "How easy is it for me to find the
> URI your users want?"
>
> My experience suggests it is not always very easy.
> What is required at the minimum, I suggest, is a text search, so that if I
> have a (boring string version of a) name that refers in my mind to
> something, I can hope to find an (exciting Linked Data) URI of that thing.
> I call this a projection from the Web to the Semantic Web.
> rdfs:label or equivalent usually provides the other one.
>
> At the risk of being seen as critical of the amazing efforts of all my
> colleagues (if not also myself), this is rarely an easy thing to do.
>
> Some recent experiences:
> OpenCalais: as in my previous message on this list, I tried hard to find a
> URI for Tim, but failed.
> dbtune: Saw a Twine message about dbtune, trundled over there, and tried to
> find a URI for a Telemann, but failed.
> dbpedia: wanted Tim again. After clicking on a few web pages, none of which
> seemed to provide a search facility, I resorted to my usual method:- look it
> up in wikipedia and then hack the URI and hope it works in dbpedia.
> (Sorry to name specific sites, guys, but I needed a few examples.
> And I am only asking for a little more, so that the fruits of your amazing
> labours can be more widely appreciated!)
> wordnet: [2] below
>
> So I have access to Linked Data sites that I know (or at least strongly
> suspect) have URIs I might want, but I can't find them.
> How on earth do we expect your average punter to join this world?
>
> What have I missed?
> Searching, such as Sindice: Well yes, but should I really have to go off to
> a search engine to find a dbpedia URI? And when I look up "Telemann dbtune"
> I don't get any results. And I wanted the dbtune link, not some other link.
> Did I miss some links on web pages? Quite probably, but the basic problem
> still stands.
> SPARQL: Well, yes. But we cannot seriously expect our users to formulate a
> SPARQL query simply to find out the dbpedia URI for Tim. What is the regexp
> I need to put in? (see below [1])
> A foaf file: Well Tim's dbpedia URI is probably in his foaf file (although
> possibly there are none of Tim's URIs in his foaf file), if I can actually
> find the file; but for some reason I can't seem to find Telemann's foaf
> file.
>
> If you are still doubting me, try finding a URI for Telemann in dbpedia
> without using an external link, just by following stuff from the home page.
> I managed to get a Telemann by using SPARQL without a regexp (it times out
> on any regexp), but unfortunately I get the asteroid.
>
> Again, my proposal:
> *We should not permit any site to be a member of the Linked Data cloud if it
> does not provide a simple way of finding URIs from natural language
> identifiers.*
> Otherwise we end up in a silo, and the world passes us by.
>
> Very best
> Hugh
>
> [And since we have to take our own medicine, I have added a "Just search"
> box right at the top level of all the rkbexplorer.com domains, such as
> http://wordnet.rkbexplorer.com/ ]
>
>
> [1]
> Dbtune finding of Telemann:
> SELECT * WHERE {?s ?p ?name .
> FILTER regex(?name, "Telemann$") }
>
> I tried
> SELECT * WHERE {?s ?p ?name .
> FILTER regex(?name, "telemann$", "i") }
> first, but got no results - not sure why.
>
> [2]
> <rant>
> I cannot believe just how frustrating this stuff can be when you really try
> to use it.
> Because I looked at Sindice for telemann, I know that it is a word in
> wordnet ( http://sindice.com/search?q=Telemann reports loads of
> http://wordnet.rkbexplorer.com/ links).
> Great, he thinks, I can get a wordnet link from a "proper" wordnet publisher
> (ie not me).
> Goes to
> http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
> to find wordnet.
> The link there is dead.
> Strips off the last bit, to get to the home princeton wordnet page, and
> clicks on the browser link I find - also dead.
> Go back and look on the
> http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets
> page, and find the link to http://esw.w3.org/topic/WordNet , but that
> doesn't help.
> So finally, I do the obvious - google "wordnet rdf".
> Of course I get lots of pages saying how available it is, and how exciting
> it is that we have it, and how it was produced; and somewhere in there I
> find a link: "Wordnet-RDF/RDDL Browser" at www.openhealth.org/RDDL/wnbrowse
> Almost unable to contain myself with excitement, I click on the link to find
> a text box, and with trembling hands I type "Telemann" and click submit.
> If I show you what I got, you can come some way to imagining my devastation:
> "Using org.apache.xerces.parsers.SAXParser
> Exception net.sf.saxon.trans.DynamicError: org.xml.sax.SAXParseException:
> White spaces are required between publicId and systemId.
> org.xml.sax.SAXParseException: White spaces are required between publicId
> and systemId."
>
> Does the emperor have any clothes at all?
> </rant>
>
>
--
Andraz Tori, CTO
Zemanta Ltd, London, Ljubljana
www.zemanta.com
mail: andraz@zemanta.com
tel: +386 41 515 767
twitter: andraz, skype: minmax_test
Received on Saturday, 7 February 2009 15:02:54 UTC