Re: Can we lower the LD entry cost please (part 1)? from Yves Raimond on 2009-02-07 (public-lod@w3.org from February 2009)

From: Yves Raimond <yves.raimond@gmail.com>
Date: Sat, 7 Feb 2009 13:39:08 +0000
To: Hugh Glaser <hg@ecs.soton.ac.uk>
Cc: "public-lod@w3.org" <public-lod@w3.org>
Message-ID: <82593ac00902070539u7d109096x3d15de256306ce37@mail.gmail.com>

Hello!

On Sat, Feb 7, 2009 at 1:23 PM, Hugh Glaser <hg@ecs.soton.ac.uk> wrote:
>
> My proposal:
> *We should not permit any site to be a member of the Linked Data cloud if it
> does not provide a simple way of finding URIs from natural language
> identifiers.*
>
> Rationale:
> One aspect of our Linking Data (not to mention our Linking Open Data) world
> is that we want people to link to our data - that is, I have published some
> stuff about something, with a URI, and I want people to be able to use that
> URI.
>
> So my question to you, the publisher, is: "How easy is it for me to find the
> URI your users want?"
>
> My experience suggests it is not always very easy.
> What is required at the minimum, I suggest, is a text search, so that if I
> have a (boring string version of a) name that refers in my mind to
> something, I can hope to find an (exciting Linked Data) URI of that thing.
> I call this a projection from the Web to the Semantic Web.
> rdfs:label or equivalent usually provides the projection in the other
> direction.
>
> At the risk of being seen as critical of the amazing efforts of all my
> colleagues (if not also myself), this is rarely an easy thing to do.
>
> Some recent experiences:
> OpenCalais: as in my previous message on this list, I tried hard to find a
> URI for Tim, but failed.
> dbtune: Saw a Twine message about dbtune, trundled over there, and tried to
> find a URI for Telemann, but failed.
> dbpedia: wanted Tim again. After clicking on a few web pages, none of which
> seemed to provide a search facility, I resorted to my usual method:- look it
> up in wikipedia and then hack the URI and hope it works in dbpedia.
> (Sorry to name specific sites, guys, but I needed a few examples.
> And I am only asking for a little more, so that the fruits of your amazing
> labours can be more widely appreciated!)
> wordnet: [2] below
>
> So I have access to Linked Data sites that I know (or at least strongly
> suspect) have URIs I might want, but I can't find them.
> How on earth do we expect your average punter to join this world?
>
> What have I missed?
> Searching, such as Sindice: Well yes, but should I really have to go off to
> a search engine to find a dbpedia URI? And when I look up "Telemann dbtune"
> I don't get any results. And I wanted the dbtune link, not some other link.
> Did I miss some links on web pages? Quite probably, but the basic problem
> still stands.
> SPARQL: Well, yes. But we cannot seriously expect our users to formulate a
> SPARQL query simply to find out the dbpedia URI for Tim. What is the regexp
> I need to put in? (see below [1])
> A foaf file: Well Tim's dbpedia URI is probably in his foaf file (although
> possibly there are none of Tim's URIs in his foaf file), if I can actually
> find the file; but for some reason I can't seem to find Telemann's foaf
> file.
>
> If you are still doubting me, try finding a URI for Telemann in dbpedia
> without using an external link, just by following stuff from the home page.
> I managed to get a Telemann by using SPARQL without a regexp (it times out
> on any regexp), but unfortunately I get the asteroid.
>
> Again, my proposal:
> *We should not permit any site to be a member of the Linked Data cloud if it
> does not provide a simple way of finding URIs from natural language
> identifiers.*
> Otherwise we end up in a silo, and the world passes us by.
>


I think this is a really dangerous idea. Most "web-scale" identifiers,
e.g. Musicbrainz GUIDs and BBC PIDs, are not human-readable (for a lot of
reasons, but mainly because human-readable identifiers are not unique
enough!), but both provide really easy-to-use lookup services.
For other sites, such lookups can be provided by semantic web search
engines. It is exactly as on the document web: web identifiers are
mostly opaque, but search engines are there to provide the help needed.
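
As a rough illustration of the lookup idea, here is a minimal Python sketch of a label index sitting in front of opaque identifiers. Everything in it is assumed for the example: the example.org URIs are placeholders rather than real Musicbrainz, BBC or dbpedia identifiers, and `build_index` and `lookup` are hypothetical helper names, not any service's API.

```python
from collections import defaultdict

# Hypothetical (label, URI) pairs; the example.org URIs are placeholders,
# not real Linked Data identifiers.
LABELLED_RESOURCES = [
    ("Georg Philipp Telemann", "http://example.org/artist/0a1b2c"),
    ("Telemann", "http://example.org/asteroid/9f8e7d"),
    ("Tim Berners-Lee", "http://example.org/person/4d5e6f"),
]


def build_index(pairs):
    """Map each lower-cased word of a label to the set of URIs using it."""
    index = defaultdict(set)
    for label, uri in pairs:
        for word in label.lower().split():
            index[word].add(uri)
    return index


def lookup(index, query):
    """Return the sorted URIs whose label contains every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    # index.get() avoids creating empty entries for unknown words.
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


INDEX = build_index(LABELLED_RESOURCES)
```

In this toy data, `lookup(INDEX, "telemann")` returns both the composer and the asteroid placeholder, which is the uniqueness problem with human-readable names; the opaque URIs stay unambiguous, and the index can live in the publisher's site or in a separate search engine.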

So my proposal is: let's not confuse everything. Some people's job is
to make datasets available out there and as linked as possible to
others. Some other people make lookup services (e.g. Sindice), and I
think this separation of concerns works quite well.

Best,
y


> Very best
> Hugh
>
> [And since we have to take our own medicine, I have added a "Just search"
> box right at the top level of all the rkbexplorer.com domains, such as
> http://wordnet.rkbexplorer.com/ ]
>
>
> [1]
> Dbtune finding of Telemann:
> SELECT * WHERE {?s ?p ?name .
> FILTER regex(?name, "Telemann$") }
>
> I tried
> SELECT * WHERE {?s ?p ?name .
> FILTER regex(?name, "telemann$", "i") }
> first, but got no results - not sure why.
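
A small Python sketch of what the two filters in [1] would be expected to do under ordinary regular-expression semantics. It is only an approximation: Python's `re` module stands in for SPARQL's regex(), the names are invented sample values rather than dbtune data, and `regex_filter` is a hypothetical helper. On those assumptions the case-insensitive pattern matches a superset of what the case-sensitive one matches, so the empty result may be specific to the endpoint rather than to the query.

```python
import re

# Made-up literal values standing in for the ?name bindings of a dataset.
NAMES = [
    "Georg Philipp Telemann",
    "telemann",
    "Telemann (asteroid)",
    "Telemann Trio",
]


def regex_filter(values, pattern, flags=""):
    """Keep values the pattern matches anywhere, similar to FILTER regex()."""
    re_flags = re.IGNORECASE if "i" in flags else 0
    compiled = re.compile(pattern, re_flags)
    return [v for v in values if compiled.search(v)]


# FILTER regex(?name, "Telemann$")
case_sensitive = regex_filter(NAMES, "Telemann$")

# FILTER regex(?name, "telemann$", "i")
case_insensitive = regex_filter(NAMES, "telemann$", "i")
```

Every value kept by the first filter is also kept by the second, since the "i" flag only relaxes the comparison.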
>
> [2]
> <rant>
> I cannot believe just how frustrating this stuff can be when you really try
> to use it.
> Because I looked at Sindice for telemann, I know that it is a word in
> wordnet ( http://sindice.com/search?q=Telemann reports loads of
> http://wordnet.rkbexplorer.com/ links).
> Great, he thinks, I can get a wordnet link from a "proper" wordnet publisher
> (ie not me).
> Goes to
> http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
> to find wordnet.
> The link there is dead.
> Strips off the last bit, to get to the home princeton wordnet page, and
> clicks on the browser link I find - also dead.
> Go back and look on the
> http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets
> page, and find the link to http://esw.w3.org/topic/WordNet , but that
> doesn't help.
> So finally, I do the obvious - google "wordnet rdf".
> Of course I get lots of pages saying how available it is, and how exciting
> it is that we have it, and how it was produced; and somewhere in there I
> find a link: "Wordnet-RDF/RDDL Browser" at  www.openhealth.org/RDDL/wnbrowse
> Almost unable to contain myself with excitement, I click on the link to find
> a text box, and with trembling hands I type "Telemann" and click submit.
> If I show you what I got, you can go some way towards imagining my
> devastation:
> "Using org.apache.xerces.parsers.SAXParser
> Exception net.sf.saxon.trans.DynamicError: org.xml.sax.SAXParseException:
> White spaces are required between publicId and systemId.
> org.xml.sax.SAXParseException: White spaces are required between publicId
> and systemId."
>
> Does the emperor have any clothes at all?
> </rant>
>
>
>
Received on Saturday, 7 February 2009 13:39:49 UTC