Re: How to find the data I need in LD? algorithm and questions.

From: Hugh Glaser <hg@ecs.soton.ac.uk>
Date: Fri, 16 Mar 2012 20:15:16 +0000
To: Yury Katkov <katkov.juriy@gmail.com>
CC: Semantic Web <semantic-web@w3.org>, public-lod <public-lod@w3.org>
Message-ID: <EMEW3|078f94ddc8e3bcfa5add99842d7e7b6co2FKFe02hg|ecs.soton.ac.uk|33FB0942-AE10-4CF6-B3B2-4EFDF5D1E5EC@ecs.soton.ac.uk>
Hi Yury
Well I am sorry to see you have had no response, but it is not so surprising, really.
You will find that essentially there are very few people doing what you are trying to do.
The Semantic Web and Linked Data world is made up of people who publish, and rarely consume.
It is almost unheard of for someone to consume someone else's data, unless they know the publisher.
Everyone is shouting, but not many listening.
OK, I might not be in a great mood today, but I'm not far wrong.

To your problem.
Your steps seem reasonable.
I would, however, add the use of VoiD (http://www.w3.org/TR/void/, http://semanticweb.org/wiki/VoiD).
VoiD is designed to deliver what you want, I think (if it doesn't, then it should be made to).
Some sites do publish VoiD descriptions, and these can often be located automatically by looking in the sitemap, which can in turn be discovered by looking in robots.txt.
Keith Alexander has a store of collected VoiD descriptions (http://kwijibo.talis.com/voiD/), as do we (http://void.rkbexplorer.com).
I would also suggest that my own site, http://sameas.org, might lead from interesting URIs to other related URIs, and hence to interesting stores.
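
The robots.txt to sitemap to VoiD chain mentioned above can be sketched roughly as follows. This is only an illustration under some assumptions of mine: the helper names are made up, and picking out sitemap <loc> entries whose URL contains "void" is a crude heuristic, not something the VoiD or sitemap specifications prescribe. The network helper is defined but not called, so the parsing steps can be tried on strings you already have.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urljoin
from urllib.request import urlopen


def sitemap_urls_from_robots(robots_txt):
    """Return the URLs given on 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Commented-out lines start with '#', so they never match here.
        match = re.match(r"\s*sitemap\s*:\s*(\S+)", line, re.IGNORECASE)
        if match:
            urls.append(match.group(1))
    return urls


def void_candidates_from_sitemap(sitemap_xml):
    """Return <loc> URLs whose text mentions 'void' (a heuristic, see above)."""
    root = ET.fromstring(sitemap_xml)
    candidates = []
    for element in root.iter():
        # Compare local names so the sitemap XML namespace does not matter.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "loc" and element.text:
            url = element.text.strip()
            if "void" in url.lower():
                candidates.append(url)
    return candidates


def discover_void(site_url):
    """Hypothetical driver: fetch robots.txt, then each sitemap it names."""
    with urlopen(urljoin(site_url, "/robots.txt"), timeout=10) as response:
        robots_txt = response.read().decode("utf-8", errors="replace")
    found = []
    for sitemap_url in sitemap_urls_from_robots(robots_txt):
        with urlopen(sitemap_url, timeout=10) as response:
            # Passing bytes lets the XML parser honour the declared encoding.
            found.extend(void_candidates_from_sitemap(response.read()))
    return found
```

Calling discover_void("http://example.org/") would return whatever candidate URLs turn up; you would still have to fetch each one and check that it really is a VoiD description.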

Hope that helps.

On 16 Mar 2012, at 04:58, Yury Katkov wrote:

> Hi!
> What do you usually do when you want to find a dataset for your needs?
> I'm preparing a tiny tutorial on this topic for the students and ask
> you to share your experience.
> My typical algorithm is the following:
> 0) Define the topic. I have to know precisely what kind of data I need.
> 1) Look at the Linked Data cloud and other visualizations to ensure that
> the needed data is present somewhere. If for example I want to
> improve Mendeley or Zotero I look at these visualizations and search
> for publication data.
> 2) Search the needed properties and classes with Sindice, Sig.ma and Swoogle.
> 3) Look at the CKAN description of the dataset, its XML sitemap and VoiD metadata.
> 4) Explore the datasets that were found in the previous step with some
> simple SPARQL queries like these:
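> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> PREFIX owl: <http://www.w3.org/2002/07/owl#>
> PREFIX dc: <http://purl.org/dc/elements/1.1/>
> SELECT DISTINCT ?p WHERE {
>   ?s ?p ?o
> }
> SELECT DISTINCT ?class WHERE {
>   { ?class a rdfs:Class . }
>   UNION
>   { ?class a owl:Class . }
> }
> SELECT DISTINCT ?a ?label WHERE {
>   { ?a rdfs:label ?label }
>   UNION
>   { ?a dc:title ?label }
>   # and possibly some more patterns to search for foaf:name and so on
> }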
> I can also use COUNTing and GROUPing BY to get some quick statistics
> about the datasets.
> 5) When I find some interesting URIs I use semantic web browsers
> Marbles and Sig.ma to navigate through the dataset.
> 6) Ask these smart guys in the Semantic Web mailing list and the Public LOD
> mailing list. Probably go to semanticoverflow and ask for help there
> as well.
> ======================
> Here are my questions:
> 1) What else do you typically do to find a dataset?
> 2) Is there a resource where I can find a brief description of the
> dataset in terms of the properties and classes that are mentioned there? And
> these cool arrows in Richard Cyganiak's diagram: is there a resource
> where I can find the information about relationship between the given
> dataset and the rest of the world?
> 3) I have a similar algorithm for searching vocabularies. Can resources
> like Schemapedia help me in searching the dataset?
> 4) Do you know any other SPARQL queries that can be handy when
> I search for something in the dataset?
> Sincerely yours,
> -----
> Yury Katkov

Hugh Glaser,  
             Web and Internet Science
             Electronics and Computer Science,
             University of Southampton,
             Southampton SO17 1BJ
Work: +44 23 8059 3670, Fax: +44 23 8059 3045
Mobile: +44 75 9533 4155 , Home: +44 23 8061 5652
Received on Friday, 16 March 2012 20:16:14 UTC