Can we lower the LD entry cost please (part 1)?

My proposal:
*We should not permit any site to be a member of the Linked Data cloud if it
does not provide a simple way of finding URIs from natural language
identifiers.*

Rationale:
One aspect of our Linking Data (not to mention our Linking Open Data) world
is that we want people to link to our data - that is, I have published some
stuff about something, with a URI, and I want people to be able to use that
URI.

So my question to you, the publisher, is: "How easy is it for me to find the
URI your users want?"

My experience suggests it is not always very easy.
What is required at the minimum, I suggest, is a text search, so that if I
have a (boring string version of a) name that refers in my mind to
something, I can hope to find an (exciting Linked Data) URI of that thing.
I call this a projection from the Web to the Semantic Web.
rdfs:label or equivalent usually provides the reverse projection, from a URI
back to a name.
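
To make the two projections concrete, here is a minimal Python sketch of the
kind of lookup being asked for. It assumes a site can list its labels as
(URI, label) pairs; the example.org URIs, the labels and the function names
are all made up for illustration and do not describe any real site.

```python
from collections import defaultdict

# Hypothetical (URI, label) pairs, standing in for a site's rdfs:label triples.
LABELS = [
    ("http://example.org/id/person-1", "Georg Philipp Telemann"),
    ("http://example.org/id/asteroid-1", "Telemann (asteroid)"),
    ("http://example.org/id/person-2", "Tim Berners-Lee"),
]


def build_index(pairs):
    """Map each lower-cased word of a label to the set of URIs that carry it."""
    index = defaultdict(set)
    for uri, label in pairs:
        for word in label.lower().split():
            index[word].add(uri)
    return index


def find_uris(index, text):
    """String -> URIs: return URIs whose label contains every word of the text."""
    words = text.lower().split()
    if not words:
        return set()
    hits = [index.get(word, set()) for word in words]
    return set.intersection(*hits)


def label_of(pairs, uri):
    """URI -> string: the other direction, as rdfs:label would provide."""
    for candidate, label in pairs:
        if candidate == uri:
            return label
    return None


INDEX = build_index(LABELS)
```

Searching for just "Telemann" returns both the composer and the asteroid, so
even a simple search box should show the labels alongside the URIs and let
the user pick.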

At the risk of being seen as critical of the amazing efforts of all my
colleagues (if not also myself), this is rarely an easy thing to do.

Some recent experiences:
OpenCalais: as in my previous message on this list, I tried hard to find a
URI for Tim, but failed.
dbtune: Saw a Twine message about dbtune, trundled over there, and tried to
find a URI for a Telemann, but failed.
dbpedia: wanted Tim again. After clicking on a few web pages, none of which
seemed to provide a search facility, I resorted to my usual method:- look it
up in wikipedia and then hack the URI and hope it works in dbpedia.
(Sorry to name specific sites, guys, but I needed a few examples.
And I am only asking for a little more, so that the fruits of your amazing
labours can be more widely appreciated!)
wordnet: [2] below
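
The dbpedia workaround above (look it up in wikipedia, hack the URI, hope)
can be sketched in a few lines of Python. It relies on dbpedia resource URIs
being built from English Wikipedia page titles; the function name is
hypothetical, and the result is only a guess, since nothing here checks that
the resource actually exists.

```python
from urllib.parse import urlsplit

WIKIPEDIA_PREFIX = "/wiki/"
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def guess_dbpedia_uri(wikipedia_url):
    """Guess a dbpedia URI from an English Wikipedia page URL, or return None."""
    parts = urlsplit(wikipedia_url)
    # Only English Wikipedia article URLs are handled in this sketch.
    if parts.netloc != "en.wikipedia.org":
        return None
    if not parts.path.startswith(WIKIPEDIA_PREFIX):
        return None
    title = parts.path[len(WIKIPEDIA_PREFIX):]
    if not title:
        return None
    # Keep the page title exactly as Wikipedia spells it in the URL.
    return DBPEDIA_RESOURCE + title
```

That this hack is needed at all is rather the point: the user has to know the
convention before they can find the URI.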

So I have access to Linked Data sites that I know (or at least strongly
suspect) have URIs I might want, but I can't find them.
How on earth do we expect your average punter to join this world?

What have I missed?
Searching, such as Sindice: Well yes, but should I really have to go off to
a search engine to find a dbpedia URI? And when I look up "Telemann dbtune"
I don't get any results. And I wanted the dbtune link, not some other link.
Did I miss some links on web pages? Quite probably, but the basic problem
still stands.
SPARQL: Well, yes. But we cannot seriously expect our users to formulate a
SPARQL query simply to find out the dbpedia URI for Tim. What is the regexp
I need to put in? (see below [1])
A foaf file: Well Tim's dbpedia URI is probably in his foaf file (although
possibly there are none of Tim's URIs in his foaf file), if I can actually
find the file; but for some reason I can't seem to find Telemann's foaf
file.

If you are still doubting me, try finding a URI for Telemann in dbpedia
without using an external link, just by following stuff from the home page.
I managed to get a Telemann by using SPARQL without a regexp (it times out
on any regexp), but unfortunately I got the asteroid.

Again, my proposal:
*We should not permit any site to be a member of the Linked Data cloud if it
does not provide a simple way of finding URIs from natural language
identifiers.*
Otherwise we end up in a silo, and the world passes us by.

Very best
Hugh

[And since we have to take our own medicine, I have added a "Just search"
box right at the top level of all the rkbexplorer.com domains, such as
http://wordnet.rkbexplorer.com/ ]


[1]
Dbtune finding of Telemann:
SELECT * WHERE {?s ?p ?name .
FILTER regex(?name, "Telemann$") }

I tried
SELECT * WHERE {?s ?p ?name .
FILTER regex(?name, "telemann$", "i") }
first, but got no results - not sure why.
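
For comparison, here is a sketch of how a site might build such a query from
a plain name, so that the user never has to write the regexp. It assumes the
names live in rdfs:label (a given dataset may use an equivalent property
instead), and it wraps the value in str() before matching. The function names
are hypothetical, and this is not an explanation of why the case-insensitive
query above returned nothing.

```python
# Characters with special meaning in a regular expression.
_REGEX_META = set("\\|.?*+(){}[]^$")


def escape_regex(text):
    """Backslash-escape regex metacharacters so text is matched literally."""
    return "".join("\\" + ch if ch in _REGEX_META else ch for ch in text)


def label_search_query(name, limit=20):
    """Build a SPARQL query for subjects whose rdfs:label contains name."""
    pattern = escape_regex(name)
    # Escape backslashes and double quotes for the SPARQL string literal.
    literal = pattern.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?s ?label WHERE {\n"
        "  ?s rdfs:label ?label .\n"
        f'  FILTER regex(str(?label), "{literal}", "i")\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    print(label_search_query("Telemann"))
```

Restricting the pattern to one property and adding a LIMIT keeps the query
narrower than matching every object in the store, though whether that avoids
a timeout will depend on the endpoint.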

[2]
<rant>
I cannot believe just how frustrating this stuff can be when you really try
to use it.
Because I looked at Sindice for telemann, I know that it is a word in
wordnet ( http://sindice.com/search?q=Telemann reports loads of
http://wordnet.rkbexplorer.com/ links).
Great, he thinks, I can get a wordnet link from a "proper" wordnet publisher
(ie not me).
Goes to
http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
to find wordnet.
The link there is dead.
Strips off the last bit, to get to the home princeton wordnet page, and
clicks on the browser link I find - also dead.
Go back and look on the
http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets
page, and find the link to http://esw.w3.org/topic/WordNet , but that
doesn't help.
So finally, I do the obvious - google "wordnet rdf".
Of course I get lots of pages saying how available it is, and how exciting
it is that we have it, and how it was produced; and somewhere in there I
find a link: "Wordnet-RDF/RDDL Browser" at  www.openhealth.org/RDDL/wnbrowse
Almost unable to contain myself with excitement, I click on the link to find
a text box, and with trembling hands I type "Telemann" and click submit.
If I show you what I got, you can go some way to imagining my devastation:
"Using org.apache.xerces.parsers.SAXParser
Exception net.sf.saxon.trans.DynamicError: org.xml.sax.SAXParseException:
White spaces are required between publicId and systemId.
org.xml.sax.SAXParseException: White spaces are required between publicId
and systemId."

Does the emperor have any clothes at all?
</rant>

Received on Saturday, 7 February 2009 13:24:38 UTC