Re: Can we lower the LD entry cost please (part 1)? from Richard Cyganiak on 2009-02-09 (public-lod@w3.org from February 2009)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Mon, 9 Feb 2009 01:59:23 +0000
To: Hugh Glaser <hg@ecs.soton.ac.uk>
Cc: "public-lod@w3.org" <public-lod@w3.org>
Message-Id: <0191F4F6-F934-49E5-9519-FF3327A6A887@cyganiak.de>
Hugh,

An important and interesting issue, thanks for raising it, and thanks  
also to everyone else who contributed to this thread.

I tend to agree: A search function that allows looking for resources  
by name greatly increases the usefulness of any dataset, and providing  
such a function is always a good idea.

Let me ask you something, Hugh: Now that you've raised awareness of  
the issue, can you propose some concrete steps that we could take to  
improve the situation? Shall we review the datasets out there and flag  
those without search? Shall we write up a blog post or wiki page?  
Something else?

I want to point out that creating such a site search can be very  
simple for the dataset publisher. For example, at the old Berlin DBLP  
dataset [1], you will find a name search on the homepage. This was a  
last-minute hack, implemented in an hour using a pageful of  
JavaScript. It works by sending a SPARQL query to the dataset's SPARQL
endpoint via AJAX and redirecting to the best result. Certainly not
the best search function you've ever seen, but really simple... If  
your dataset wraps a triple store or a relational database or a web  
API, then you almost certainly can use the search functions provided  
by the store/DB/API to implement this, and I would be surprised if it  
takes more than half a day.
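
As a rough illustration of the approach just described (this is not the
actual DBLP code, which was JavaScript), the same idea can be sketched in
Python. It assumes that names are stored as rdfs:label values and that the
endpoint can return SPARQL JSON results. The helper names and the "best
result" heuristic (exact match first, then shortest label) are made up for
this sketch.

```python
import json
import urllib.parse
import urllib.request

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# Characters with special meaning in the regex dialect used by SPARQL.
REGEX_SPECIALS = set("\\.?*+(){}[]^$|")


def regex_escape(text):
    """Backslash-escape regex metacharacters so the name matches literally."""
    return "".join("\\" + ch if ch in REGEX_SPECIALS else ch for ch in text)


def sparql_string(text):
    """Quote text as a SPARQL string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_name_query(name, limit=10):
    """Build a case-insensitive label search (rdfs:label is an assumption)."""
    pattern = sparql_string(regex_escape(name))
    return (
        "SELECT DISTINCT ?s ?label WHERE {\n"
        f"  ?s <{RDFS_LABEL}> ?label .\n"
        f'  FILTER regex(str(?label), {pattern}, "i")\n'
        "}\n"
        f"LIMIT {limit}"
    )


def pick_best(results, name):
    """Pick one URI from SPARQL JSON results: exact match, else shortest label."""
    bindings = results.get("results", {}).get("bindings", [])
    candidates = [
        (b["label"]["value"], b["s"]["value"])
        for b in bindings
        if b["s"]["type"] == "uri"
    ]
    if not candidates:
        return None
    wanted = name.casefold()

    def rank(item):
        label, _uri = item
        return (label.casefold() != wanted, len(label))

    return min(candidates, key=rank)[1]


def find_uri(endpoint, name):
    """Send the query to a SPARQL endpoint and return the best URI, or None."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": build_name_query(name)})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return pick_best(json.load(response), name)
```

A page handler would call something like
find_uri("http://example.org/sparql", "Telemann") (a placeholder endpoint)
and redirect the browser to the returned URI. An unanchored regex filter
can be slow on large stores, so a store's own full-text index, where
available, is probably the better choice.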

Another example to which I've contributed, and which I like quite  
much, is the search of the RDF book mashup [2], which works by  
wrapping the appropriate method of the Amazon web service API. The  
search results are also available as RDF (find it via autodiscovery  
links).
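
For readers unfamiliar with autodiscovery: an HTML page can point to its
RDF version with a link element of the form
<link rel="alternate" type="application/rdf+xml" href="...">. A minimal
Python sketch of how a client might pick up such links from a search
results page follows. The class and function names are invented for this
example, and it only looks for the RDF/XML media type.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class RDFLinkFinder(HTMLParser):
    """Collect hrefs of <link rel="alternate" type="application/rdf+xml">."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.rdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        media_type = (attr.get("type") or "").lower()
        href = attr.get("href")
        if "alternate" in rels and media_type == "application/rdf+xml" and href:
            # Resolve relative hrefs against the page's own URL.
            self.rdf_links.append(urljoin(self.base_url, href))


def find_rdf_links(html, base_url):
    """Return absolute URLs of the RDF alternates advertised by an HTML page."""
    finder = RDFLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.rdf_links
```

Fetching the search page, passing its HTML and URL to find_rdf_links, and
then requesting the first returned URL would yield the RDF version of the
results, provided the page advertises one.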

Bradley's mention of RDFa is worth highlighting: In an RDFa-enabled  
website, the local site search, which is probably already available,  
automatically doubles as a search for URIs. This is one of the many  
reasons why I'm becoming an RDFa fanboy -- it makes us create good  
linked data sites simply by following dusty old good practices for  
website design and deployment, such as providing site search!
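
To make the point concrete, here is a small Python sketch of how a client
could pull candidate resource URIs out of an RDFa-annotated search results
page. It is deliberately naive: it only reads plain URI values of the
"about" attribute, skips bracketed CURIEs such as [ex:thing], and ignores
the rest of RDFa processing (resource, xml:base and so on). All names in
it are invented for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AboutCollector(HTMLParser):
    """Collect subject URIs from RDFa 'about' attributes, in document order."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.uris = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name != "about" or value is None:
                continue
            if value.startswith("["):
                # Bracketed CURIE: would need prefix mappings, skipped here.
                continue
            uri = urljoin(self.base_url, value)
            if uri not in self.uris:
                self.uris.append(uri)


def uris_from_search_page(html, page_url):
    """Return the distinct resource URIs mentioned on an RDFa results page."""
    collector = AboutCollector(page_url)
    collector.feed(html)
    collector.close()
    return collector.uris
```

Running uris_from_search_page over the HTML returned by a site's ordinary
search form would then give a list of URIs for the matching resources,
assuming each result is marked up with an "about" attribute. A real client
would be better served by a full RDFa parser.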

Finally, allow me to be a bit smirky and quote below from an email I  
sent to this list 14 months ago. In it, I recount similar frustrations  
in finding entry points into a recently announced dataset -- RKB  
Explorer. It's good to see that this site has improved a lot since,  
but it's maybe a bit discouraging that we still face the same general  
problems more than a year later... Anyways, enjoy! ;-)

Next, let's talk about concrete steps that we can take to improve the  
situation.

Best,
Richard

[1] http://www4.wiwiss.fu-berlin.de/dblp/
[2] http://www4.wiwiss.fu-berlin.de/bizer/bookmashup/#search


On 7 Nov 2007, at 21:46, Richard Cyganiak wrote:
> Hugh,
>
> This looks like it could be an awesome resource. Unfortunately I  
> didn't have much luck getting any kind of data back from the services.
>
> The "browse" function doesn't do anything useful for me. I searched  
> for a wide variety of terms, including "the", "a" and "2003" in the  
> first ten or so datasets, including the one called Citeseer and  
> DBLP. No results. What am I supposed to put into the search box?
>
> I also tried to explore the datasets using SPARQL queries. I started  
> with queries such as
>
>   SELECT DISTINCT ?class WHERE { ?x a ?class }
>
> to learn about the vocabulary used in the dataset. These queries  
> return some results on some of the datasets (they time out on  
> others), but clicking any of the results consistently showed a page  
> with zero results. Same for opening in an RDF browser.
>
> So in fact, despite honestly trying, the only way I could get any  
> real data back from the services was by using the four example URIs  
> provided at www.rkbexplorer.com .
>
> Obviously a lot of work went into this. It's a shame that it's so  
> hard to make any use of it because the last 5% are missing.
>
> What are those last 5%?
>
> 1. A brief description of what each dataset actually is, and what  
> sort of data it contains. The currently available information (who
> provided the data and some triple counts) is not enough.
>
> 2. A bunch of representative example URIs for each dataset.
>
> 3. A bunch of representative and interesting SPARQL queries against  
> each dataset.
>
> 4. If possible, a note on which vocabulary (classes and properties)
> is used in each dataset. This would greatly simplify SPARQLing the
> datasets.
>
> 5. You should think really hard about “natural” navigation entry  
> points into the datasets. Is there any natural “root” from which  
> everything can be accessed? Is there a category system or class  
> hierarchy that one can navigate along to find interesting stuff?
>
> 6. You should consider adding a few domain-specific search
> functions, such as the simple “Find Yourself” function provided at
> http://dblp.l3s.de/d2r/ .
>
> I'm a bit frustrated because this looks like an amazingly great  
> resource, but I can't actually get any clear feeling for its scope  
> or quality or contents. This feels like exploring a pitch black room  
> while wearing boxing gloves.
>
> I'm very hopeful that you can greatly improve this experience with  
> little effort.
>
> Thanks a lot,
> Richard






On 7 Feb 2009, at 13:23, Hugh Glaser wrote:

>
> My proposal:
> *We should not permit any site to be a member of the Linked Data cloud
> if it does not provide a simple way of finding URIs from natural
> language identifiers.*
>
> Rationale:
> One aspect of our Linking Data (not to mention our Linking Open Data)
> world is that we want people to link to our data - that is, I have
> published some stuff about something, with a URI, and I want people to
> be able to use that URI.
>
> So my question to you, the publisher, is: "How easy is it for me to
> find the URI your users want?"
>
> My experience suggests it is not always very easy.
> What is required at the minimum, I suggest, is a text search, so that
> if I have a (boring string version of a) name that refers in my mind
> to something, I can hope to find an (exciting Linked Data) URI of that
> thing.
> I call this a projection from the Web to the Semantic Web.
> rdfs:label or equivalent usually provides the other one.
>
> At the risk of being seen as critical of the amazing efforts of all my
> colleagues (if not also myself), this is rarely an easy thing to do.
>
> Some recent experiences:
> OpenCalais: as in my previous message on this list, I tried hard to
> find a URI for Tim, but failed.
> dbtune: Saw a Twine message about dbtune, trundled over there, and
> tried to find a URI for a Telemann, but failed.
> dbpedia: wanted Tim again. After clicking on a few web pages, none of
> which seemed to provide a search facility, I resorted to my usual
> method:- look it up in wikipedia and then hack the URI and hope it
> works in dbpedia.
> (Sorry to name specific sites, guys, but I needed a few examples.
> And I am only asking for a little more, so that the fruits of your
> amazing labours can be more widely appreciated!)
> wordnet: [2] below
>
> So I have access to Linked Data sites that I know (or at least
> strongly suspect) have URIs I might want, but I can't find them.
> How on earth do we expect your average punter to join this world?
>
> What have I missed?
> Searching, such as Sindice: Well yes, but should I really have to go
> off to a search engine to find a dbpedia URI? And when I look up
> "Telemann dbtune" I don't get any results. And I wanted the dbtune
> link, not some other link.
> Did I miss some links on web pages? Quite probably, but the basic
> problem still stands.
> SPARQL: Well, yes. But we cannot seriously expect our users to
> formulate a SPARQL query simply to find out the dbpedia URI for Tim.
> What is the regexp I need to put in? (see below [1])
> A foaf file: Well Tim's dbpedia URI is probably in his foaf file
> (although possibly there are none of Tim's URIs in his foaf file), if
> I can actually find the file; but for some reason I can't seem to find
> Telemann's foaf file.
>
> If you are still doubting me, try finding a URI for Telemann in
> dbpedia without using an external link, just by following stuff from
> the home page.
> I managed to get a Telemann by using SPARQL without a regexp (it times
> out on any regexp), but unfortunately I get the asteroid.
>
> Again, my proposal:
> *We should not permit any site to be a member of the Linked Data cloud
> if it does not provide a simple way of finding URIs from natural
> language identifiers.*
> Otherwise we end up in a silo, and the world passes us by.
>
> Very best
> Hugh
>
> [And since we have to take our own medicine, I have added a "Just
> search" box right at the top level of all the rkbexplorer.com domains,
> such as http://wordnet.rkbexplorer.com/ ]
>
>
> [1]
> Dbtune finding of Telemann:
> SELECT * WHERE {?s ?p ?name .
> FILTER regex(?name, "Telemann$") }
>
> I tried
> SELECT * WHERE {?s ?p ?name .
> FILTER regex(?name, "telemann$", "i") }
> first, but got no results - not sure why.
>
> [2]
> <rant>
> I cannot believe just how frustrating this stuff can be when you
> really try to use it.
> Because I looked at Sindice for telemann, I know that it is a word in
> wordnet ( http://sindice.com/search?q=Telemann reports loads of
> http://wordnet.rkbexplorer.com/ links).
> Great, he thinks, I can get a wordnet link from a "proper" wordnet
> publisher (ie not me).
> Goes to
> http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
> to find wordnet.
> The link there is dead.
> Strips off the last bit, to get to the home princeton wordnet page,
> and clicks on the browser link I find - also dead.
> Go back and look on the
> http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets
> page, and find the link to http://esw.w3.org/topic/WordNet , but that
> doesn't help.
> So finally, I do the obvious - google "wordnet rdf".
> Of course I get lots of pages saying how available it is, and how
> exciting it is that we have it, and how it was produced; and somewhere
> in there I find a link: "Wordnet-RDF/RDDL Browser" at
> www.openhealth.org/RDDL/wnbrowse
> Almost unable to contain myself with excitement, I click on the link
> to find a text box, and with trembling hands I type "Telemann" and
> click submit.
> If I show you what I got, you can come some way to imagining my
> devastation:
> "Using org.apache.xerces.parsers.SAXParser
> Exception net.sf.saxon.trans.DynamicError:
> org.xml.sax.SAXParseException: White spaces are required between
> publicId and systemId.
> org.xml.sax.SAXParseException: White spaces are required between
> publicId and systemId."
>
> Does the emperor have any clothes at all?
> </rant>
>
>
Received on Monday, 9 February 2009 02:00:05 UTC