Re: Can we lower the LD entry cost please (part 1)? from Hugh Glaser on 2009-02-09 (public-lod@w3.org from February 2009)

From: Hugh Glaser <hg@ecs.soton.ac.uk>
Date: Mon, 9 Feb 2009 11:33:34 +0000
To: Richard Cyganiak <richard@cyganiak.de>
CC: "public-lod@w3.org" <public-lod@w3.org>
Message-ID: <EMEW,l18BXmc6824b010a394f293fff8f324f37e31e,hg%ecs.soton.ac.uk,C5B5C50E.2946C%hg@ecs.soton.ac.uk>
Thanks Richard.
Just a few comments.
Firstly, fixing the problem: I hope that for most datasets, as you suggest,
it is not hard to produce something useful, and there are many that do.
And I now know about more, which is great (thanks Georgi and Kingsley)!
Secondly, documenting: good idea. As you can tell, I have been trying to do
linking from our datasets to others, and found it pretty frustrating.
[Yesterday's effort was on geonames, and even that was not straightforward,
although I got there in the end.]
One problem is that the esw wiki main page usually links to the home page
of the main site, which often does not even mention LD or SW.
Of course, we don't want to pollute our beautiful LD-enabled sites by
exposing the LD directly - for me that is the whole point: the user of the
site should be blissfully unaware of the technology delivering it.
So new arrivals (and clearly me!) head off to the data sources we proudly
list, and vanish.
There is actually a page that has many of the links - the datasets page.
Most of the links there take you much closer to finding that elusive URI.
Of course there is a disconnect between these two pages, which is that not
all bubbles are listed, but that is sort of another issue.
But even from these links it is not always straightforward - I have seen a
few messages saying "just go and look it up at x", and I invite everyone to
try doing that, and then get a friend to try doing it, before saying that.

With reference to your smirk - very fair :-)
I remember the message, and did try to act on it, and still have more to do.
But please do as I say, not as I do?
And I could feel and empathise with your frustration coming down to me
through the years :-)
But I can't resist some response, and actually it is germane to the
discussion.
Did you really expect "the", "a" and "2003" to give you something sensible?
I haven't tried "the" or "a" on dblp, but "2003" finally came back with 4884
results.
And yes, 3store still times out on your SPARQL for the larger stores (it
works for the smaller ones), but I have no control over that.

So this brings us to another point - will we simply be disappointing the
searchers because the facility is not good enough?
Possibly.
Some guidance on searching is probably good.
And in any case, it is hopefully better than what we have.

Best
Hugh

On 09/02/2009 01:59, "Richard Cyganiak" <richard@cyganiak.de> wrote:

> Hugh,
>
> An important and interesting issue, thanks for raising it, and thanks
> also to everyone else who contributed to this thread.
>
> I tend to agree: A search function that allows looking for resources
> by name greatly increases the usefulness of any dataset, and providing
> such a function is always a good idea.
>
> Let me ask you something, Hugh: Now that you've raised awareness of
> the issue, can you propose some concrete steps that we could take to
> improve the situation? Shall we review the datasets out there and flag
> those without search? Shall we write up a blog post or wiki page?
> Something else?
>
> I want to point out that creating such a site search can be very
> simple for the dataset publisher. For example, at the old Berlin DBLP
> dataset [1], you will find a name search on the homepage. This was a
> last-minute hack, implemented in an hour using a pageful of
> JavaScript. It works by sending a SPARQL query to the dataset's SPARQL
> endpoint via AJAX, and redirecting to the best result. Certainly not
> the best search function you've ever seen, but really simple... If
> your dataset wraps a triple store or a relational database or a web
> API, then you almost certainly can use the search functions provided
> by the store/DB/API to implement this, and I would be surprised if it
> takes more than half a day.
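
The name search described above can be approximated in a few lines. What follows is a minimal sketch in Python rather than the original JavaScript, and it is not the code behind the Berlin DBLP page: the endpoint URL, the ?s and ?label variable names, the sample data, and the "exact match first, then shortest label" ranking are all assumptions made for illustration. It builds a SPARQL protocol request URL and picks a redirect target from a SPARQL JSON results document.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the dataset's real SPARQL endpoint.
ENDPOINT = "http://example.org/sparql"


def request_url(endpoint: str, query: str) -> str:
    """Build a SPARQL protocol GET URL: the query goes in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


def best_match(results_json: str, name: str):
    """Return the URI to redirect to, or None if the result set is empty.

    Expects a SPARQL JSON results document whose rows bind ?s (a URI)
    and ?label. Ranking is an assumed heuristic: an exact,
    case-insensitive label match wins, otherwise the shortest label.
    """
    rows = json.loads(results_json)["results"]["bindings"]
    candidates = [
        (row["label"]["value"], row["s"]["value"])
        for row in rows
        if row["s"]["type"] == "uri"
    ]
    if not candidates:
        return None
    wanted = name.casefold()
    _, uri = min(candidates, key=lambda c: (c[0].casefold() != wanted, len(c[0])))
    return uri


# Invented sample response, standing in for what an endpoint might return.
SAMPLE = json.dumps({
    "head": {"vars": ["s", "label"]},
    "results": {"bindings": [
        {"s": {"type": "uri", "value": "http://example.org/id/telemann-asteroid"},
         "label": {"type": "literal", "value": "Telemann (asteroid)"}},
        {"s": {"type": "uri", "value": "http://example.org/id/telemann"},
         "label": {"type": "literal", "value": "Telemann"}},
    ]},
})

print(best_match(SAMPLE, "telemann"))  # http://example.org/id/telemann
```

A page script would fetch request_url(...) and then redirect the browser to whatever best_match returns. Ranking on the client keeps the query itself trivial, which matters when the endpoint is slow.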
>
> Another example to which I've contributed, and which I quite like,
> is the search of the RDF book mashup [2], which works by
> wrapping the appropriate method of the Amazon web service API. The
> search results are also available as RDF (find it via autodiscovery
> links).
>
> Bradley's mention of RDFa is worth highlighting: In an RDFa-enabled
> website, the local site search, which is probably already available,
> automatically doubles as a search for URIs. This is one of the many
> reasons why I'm becoming an RDFa fanboy -- it makes us create good
> linked data sites simply by following dusty old good practices for
> website design and deployment, such as providing site search!
>
> Finally, allow me to be a bit smirky and quote below from an email I
> sent to this list 14 months ago. In it, I recount similar frustrations
> in finding entry points into a recently announced dataset -- RKB
> Explorer. It's good to see that this site has improved a lot since,
> but it's maybe a bit discouraging that we still face the same general
> problems more than a year later... Anyways, enjoy! ;-)
>
> Next, let's talk about concrete steps that we can take to improve the
> situation.
>
> Best,
> Richard
>
> [1] http://www4.wiwiss.fu-berlin.de/dblp/
> [2] http://www4.wiwiss.fu-berlin.de/bizer/bookmashup/#search
>
>
> On 7 Nov 2007, at 21:46, Richard Cyganiak wrote:
>> Hugh,
>>
>> This looks like it could be an awesome resource. Unfortunately I
>> didn't have much luck getting any kind of data back from the services.
>>
>> The "browse" function doesn't do anything useful for me. I searched
>> for a wide variety of terms, including "the", "a" and "2003" in the
>> first ten or so datasets, including the ones called Citeseer and
>> DBLP. No results. What am I supposed to put into the search box?
>>
>> I also tried to explore the datasets using SPARQL queries. I started
>> with queries such as
>>
>>   SELECT DISTINCT ?class WHERE { ?x a ?class }
>>
>> to learn about the vocabulary used in the dataset. These queries
>> return some results on some of the datasets (they time out on
>> others), but clicking any of the results consistently showed a page
>> with zero results. Same for opening in an RDF browser.
>>
>> So in fact, despite honestly trying, the only way I could get any
>> real data back from the services was by using the four example URIs
>> provided at www.rkbexplorer.com .
>>
>> Obviously a lot of work went into this. It's a shame that it's so
>> hard to make any use of it because the last 5% are missing.
>>
>> What are those last 5%?
>>
>> 1. A brief description of what each dataset actually is, and what
>> sort of data it contains. The currently available information (who
>> provided the data and some triple counts) is not enough.
>>
>> 2. A bunch of representative example URIs for each dataset.
>>
>> 3. A bunch of representative and interesting SPARQL queries against
>> each dataset.
>>
>> 4. If possible, a note on what vocabulary (classes and properties)
>> are used in each dataset. This would greatly simplify SPARQLing the
>> datasets.
>>
>> 5. You should think really hard about "natural" navigation entry
>> points into the datasets. Is there any natural "root" from which
>> everything can be accessed? Is there a category system or class
>> hierarchy that one can navigate along to find interesting stuff?
>>
>> 6. You should consider adding a few domain-specific search
>> functions, such as the simple "Find Yourself" function provided at
>> http://dblp.l3s.de/d2r/ .
>>
>> I'm a bit frustrated because this looks like an amazingly great
>> resource, but I can't actually get any clear feeling for its scope
>> or quality or contents. This feels like exploring a pitch black room
>> while wearing boxing gloves.
>>
>> I'm very hopeful that you can greatly improve this experience with
>> little effort.
>>
>> Thanks a lot,
>> Richard
>
> On 7 Feb 2009, at 13:23, Hugh Glaser wrote:
>
>>
>> My proposal:
>> *We should not permit any site to be a member of the Linked Data cloud
>> if it does not provide a simple way of finding URIs from natural
>> language identifiers.*
>>
>> Rationale:
>> One aspect of our Linking Data (not to mention our Linking Open Data)
>> world is that we want people to link to our data - that is, I have
>> published some stuff about something, with a URI, and I want people to
>> be able to use that URI.
>>
>> So my question to you, the publisher, is: "How easy is it for me to
>> find the URI your users want?"
>>
>> My experience suggests it is not always very easy.
>> What is required at the minimum, I suggest, is a text search, so that
>> if I have a (boring string version of a) name that refers in my mind to
>> something, I can hope to find an (exciting Linked Data) URI of that
>> thing.
>> I call this a projection from the Web to the Semantic Web.
>> rdfs:label or equivalent usually provides the other one.
>>
>> At the risk of being seen as critical of the amazing efforts of all my
>> colleagues (if not also myself), this is rarely an easy thing to do.
>>
>> Some recent experiences:
>> OpenCalais: as in my previous message on this list, I tried hard to
>> find a URI for Tim, but failed.
>> dbtune: Saw a Twine message about dbtune, trundled over there, and
>> tried to find a URI for a Telemann, but failed.
>> dbpedia: wanted Tim again. After clicking on a few web pages, none of
>> which seemed to provide a search facility, I resorted to my usual
>> method:- look it up in wikipedia and then hack the URI and hope it
>> works in dbpedia.
>> (Sorry to name specific sites, guys, but I needed a few examples.
>> And I am only asking for a little more, so that the fruits of your
>> amazing labours can be more widely appreciated!)
>> wordnet: [2] below
>>
>> So I have access to Linked Data sites that I know (or at least strongly
>> suspect) have URIs I might want, but I can't find them.
>> How on earth do we expect your average punter to join this world?
>>
>> What have I missed?
>> Searching, such as Sindice: Well yes, but should I really have to go
>> off to a search engine to find a dbpedia URI? And when I look up
>> "Telemann dbtune" I don't get any results. And I wanted the dbtune
>> link, not some other link.
>> Did I miss some links on web pages? Quite probably, but the basic
>> problem still stands.
>> SPARQL: Well, yes. But we cannot seriously expect our users to
>> formulate a SPARQL query simply to find out the dbpedia URI for Tim.
>> What is the regexp I need to put in? (see below [1])
>> A foaf file: Well Tim's dbpedia URI is probably in his foaf file
>> (although possibly there are none of Tim's URIs in his foaf file), if I
>> can actually find the file; but for some reason I can't seem to find
>> Telemann's foaf file.
>>
>> If you are still doubting me, try finding a URI for Telemann in dbpedia
>> without using an external link, just by following stuff from the home
>> page.
>> I managed to get a Telemann by using SPARQL without a regexp (it times
>> out on any regexp), but unfortunately I get the asteroid.
>>
>> Again, my proposal:
>> *We should not permit any site to be a member of the Linked Data cloud
>> if it does not provide a simple way of finding URIs from natural
>> language identifiers.*
>> Otherwise we end up in a silo, and the world passes us by.
>>
>> Very best
>> Hugh
>>
>> [And since we have to take our own medicine, I have added a "Just
>> search" box right at the top level of all the rkbexplorer.com domains,
>> such as http://wordnet.rkbexplorer.com/ ]
>>
>>
>> [1]
>> Dbtune finding of Telemann:
>> SELECT * WHERE {?s ?p ?name .
>> FILTER regex(?name, "Telemann$") }
>>
>> I tried
>> SELECT * WHERE {?s ?p ?name .
>> FILTER regex(?name, "telemann$", "i") }
>> first, but got no results - not sure why.
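
Queries like the two in [1] are easy to get subtly wrong by hand, so a site search would normally generate them from the user's input. Below is a minimal sketch in Python, with hypothetical function names, that escapes the name and emits the same shape of query. It also wraps ?name in str(), on the understanding that SPARQL 1.0 specifies regex() over plain literals without a language tag and that str() is the usual way to apply it to other values. Whether that would have changed the empty case-insensitive result on dbtune is not known.

```python
import re


def escape_regex(text: str) -> str:
    """Backslash-escape regex metacharacters so the name is matched literally."""
    return re.sub(r'([\\.^$*+?()\[\]{}|])', r'\\\1', text)


def sparql_string(text: str) -> str:
    """Quote text as a SPARQL string literal (escape backslashes and quotes)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def name_search_query(name: str, ignore_case: bool = True, limit: int = 20) -> str:
    """Build a query for values ending in `name`, as in footnote [1].

    The LIMIT clause is an addition, meant to keep result sets small.
    """
    pattern = sparql_string(escape_regex(name) + "$")
    flags = ', "i"' if ignore_case else ""
    return (
        "SELECT DISTINCT ?s ?name WHERE {\n"
        "  ?s ?p ?name .\n"
        f"  FILTER regex(str(?name), {pattern}{flags})\n"
        f"}} LIMIT {limit}"
    )


print(name_search_query("Telemann"))
```

Matching against every predicate, as here and in [1], is the expensive part; restricting ?p to rdfs:label or a similar naming property would be the obvious next step on a large store.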
>>
>> [2]
>> <rant>
>> I cannot believe just how frustrating this stuff can be when you really
>> try to use it.
>> Because I looked at Sindice for telemann, I know that it is a word in
>> wordnet ( http://sindice.com/search?q=Telemann reports loads of
>> http://wordnet.rkbexplorer.com/ links).
>> Great, he thinks, I can get a wordnet link from a "proper" wordnet
>> publisher (ie not me).
>> Goes to
>> http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
>> to find wordnet.
>> The link there is dead.
>> Strips off the last bit, to get to the home princeton wordnet page, and
>> clicks on the browser link I find - also dead.
>> Go back and look on the
>> http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets
>> page, and find the link to http://esw.w3.org/topic/WordNet , but that
>> doesn't help.
>> So finally, I do the obvious - google "wordnet rdf".
>> Of course I get lots of pages saying how available it is, and how
>> exciting it is that we have it, and how it was produced; and somewhere
>> in there I find a link: "Wordnet-RDF/RDDL Browser" at
>> www.openhealth.org/RDDL/wnbrowse
>> Almost unable to contain myself with excitement, I click on the link to
>> find a text box, and with trembling hands I type "Telemann" and click
>> submit.
>> If I show you what I got, you can come some way to imagining my
>> devastation:
>> "Using org.apache.xerces.parsers.SAXParser
>> Exception net.sf.saxon.trans.DynamicError:
>> org.xml.sax.SAXParseException:
>> White spaces are required between publicId and systemId.
>> org.xml.sax.SAXParseException: White spaces are required between
>> publicId and systemId."
>>
>> Does the emperor have any clothes at all?
>> </rant>
>>
>>
>
>
Received on Monday, 9 February 2009 11:34:31 UTC