Re: How to find the data I need in LD? algorithm and questions. from Bernadette Hyland on 2012-03-16 (public-lod@w3.org from March 2012)

From: Bernadette Hyland <bhyland@3roundstones.com>
Date: Fri, 16 Mar 2012 17:11:27 -0400
To: Hugh Glaser <hg@ecs.soton.ac.uk>, Yury Katkov <katkov.juriy@gmail.com>
Cc: Semantic Web <semantic-web@w3.org>, public-lod <public-lod@w3.org>
Message-Id: <ECA29576-0D3A-4F39-9F72-54BE8C1FBFDF@3roundstones.com>
Hi,
Hugh - I responded earlier today to Yury, off-list.  So I would offer a different perspective, perhaps because the sun is out here today and it is Friday afternoon and the plum blossoms are blooming...

We've moved from:
* shouting (circa 2003-2006) to
* the meme of Linked Data by TimBL (2007) [1] to
* proof-of-concepts (2008-2010) to
* a couple of academic books, conference talks & keynotes on real world deployments involving LD/LOD (2010, 2011) to
* developer books, W3C Recommendations, published use cases/CXO guides (2012)

FWIW, I offered to fold in some of Yury's guidance to the draft Linked Data Cookbook[2] and suggested the cookbook as a possible resource for his students.

If you are open to a different viewpoint, here is what I see on the ground in 2012.  There are publishers, in both the private & public sectors, who are beginning to publish data as Linked Data.  It is of course a new approach to data publishing and consumption and there are some really entrenched players, so it isn't going to happen within one or two years.  Furthermore, everyone has a "day job" and learning yet another way to publish your data doesn't sound like a career-building activity at face value ...

I contend that it will take some public successes, plus a couple of pragmatic Linked Data books for developers, some cookbooks or how-to's, and some well-formed W3C Recommendations for Linked Open Data to be pervasive ... all of which is in progress.

It will probably take 10 years before LD/LOD publishing is 'mainstream' but make no mistake, it will happen.  A Linked Data approach to publishing data (on the Web of data) is as disruptive as the Web of documents was circa 1995.

It will save organizations millions and governments billions of dollars (or their currency equivalents) in enterprise information integration.  Do I have documented ROIs in a glossy printed consulting report to back that up?  No, not yet.  I believe we (as in the Linked Data ecosystem) will have this soon.  The numbers & case studies will come from big international organizations involved in issue tracking & customer care, business publishing, healthcare, logistics and defense (the non-secret-squirrel-part of defense).

Regardless of whether orgs are doing LD behind the firewall or in front of it, publishing Linked Data makes good economic sense but we're in the early days.  Don't lose heart.

I see university students are learning about LD now in undergrad CS classes.  About 20 of us from the UK, Netherlands, Spain, US, India, Australia in government / academe / private sector meet weekly on the W3C Gov't Linked Data Working Group to nut out vocabs, best practices & a cookbook for gov't publication & consumption.

FYR, data.gov recently featured a blogpost [3] by a uni student who did a mashup where he didn't know the publisher of US Gov't content, although he did work under the supervision of someone who knows a bit about RDF.

Kind regards,

Bernadette Hyland

[1] http://www.w3.org/DesignIssues/LinkedData.html
[2] http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
[3] http://www.data.gov/communities/node/116/blogs/6170


On Mar 16, 2012, at 4:15 PM, Hugh Glaser wrote:

> Hi Yury
> Well I am sorry to see you have had no response, but it is not so surprising, really.
> You will find that essentially there are very few people doing what you are trying to do.
> The Semantic Web and Linked Data world is made up of people who publish, and rarely consume.
> It is almost unheard of for someone to consume someone else's data, unless they know the publisher.
> Everyone is shouting, but not many listening.
> OK, I might not be in a great mood today, but I'm not far wrong.
> 
> To your problem.
> Your steps seem reasonable.
> I would, however, add the use of VoiD (http://www.w3.org/TR/void/, http://semanticweb.org/wiki/VoiD).
> VoiD is designed to deliver what you want, I think (if it doesn't, then it should be made to).
> Some sites do publish VoiD descriptions, and these can often be located automatically by looking in the sitemap, which can in turn be discovered by looking in robots.txt.
> Keith Alexander has a store of collected VoiD descriptions (http://kwijibo.talis.com/voiD/), as do we (http://void.rkbexplorer.com).
> I would also suggest that my own site, http://sameas.org might lead from interesting URIs to other related URIs, and hence interesting stores.
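
The discovery chain Hugh describes (robots.txt, then sitemap, then VoiD) starts with reading the Sitemap lines out of robots.txt. Here is a minimal Python sketch of that first step only. The function name and the example.org content are made up for illustration, and fetching the sitemap and finding the VoiD description inside it are not shown.

```python
def sitemap_urls(robots_txt):
    """Return the URLs named by 'Sitemap:' lines in a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop anything after '#' (robots.txt comments) and trim whitespace.
        line = line.split("#", 1)[0].strip()
        # Split on the first ':' only, so the URL's own colons are kept.
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


# Hypothetical robots.txt content, for illustration only.
example = """\
User-agent: *
Disallow: /private/
Sitemap: http://example.org/sitemap.xml
"""
print(sitemap_urls(example))
```

Each URL returned would then be fetched and inspected for a pointer to the dataset's VoiD description.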
> 
> Hope that helps.
> Best
> Hugh
> 
> On 16 Mar 2012, at 04:58, Yury Katkov wrote:
> 
>> Hi!
>> 
>> What do you usually do when you want to find a dataset for your needs?
>> I'm preparing a tiny tutorial on this topic for the students and ask
>> you to share your experience.
>> My typical algorithm is the following:
>> 0) Define the topic. I have to know precisely what kind of data I need.
>> 1) Look at the Linked Data cloud and other visualizations to ensure that
>> the needed data is present somewhere. If for example I want to
>> improve Mendeley or Zotero I look at these visualizations and search
>> for publication data.
>> 2) Search for the needed properties and classes with Sindice, Sig.ma and Swoogle.
>> 3) Look at the CKAN description of the dataset, its XML sitemap and VoiD metadata.
>> 4) Explore the datasets that were found in the previous step with some
>> simple SPARQL queries like these:
>> 
>> SELECT DISTINCT ?p WHERE {
>> ?s ?p ?o
>> }
>> 
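>> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
>> PREFIX owl: <http://www.w3.org/2002/07/owl#>
>> SELECT DISTINCT ?class WHERE {
>> { ?class a rdfs:Class . }
>> UNION
>> { ?class a owl:Class . }
>> }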
>> 
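>> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
>> PREFIX dc: <http://purl.org/dc/elements/1.1/>
>> SELECT DISTINCT ?label WHERE {
>> { ?a rdfs:label ?label }
>> UNION
>> { ?a dc:title ?label }
>> # and possibly some more patterns to search for foaf:name and so on
>> }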
>> 
>> I can also use COUNTing and GROUPing BY to get some quick statistics
>> about the datasets.
>> 5) When I find some interesting URIs I use semantic web browsers
>> Marbles and Sig.ma to navigate through the dataset.
>> 6) Ask these smart guys on the Semantic Web mailing list and the Public LOD
>> mailing list. Probably go to semanticoverflow and ask for help there
>> as well.
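
The COUNT and GROUP BY statistics mentioned in step 4 could look something like the sketch below. It assumes the endpoint supports SPARQL 1.1 aggregates and accepts queries over HTTP GET in a `query` parameter, as the SPARQL Protocol describes. The endpoint URL and the names `PREDICATE_COUNTS` and `query_url` are hypothetical.

```python
from urllib.parse import urlencode

# Count how often each predicate is used, most frequent first.
# Relies on SPARQL 1.1 aggregates; not every endpoint supports them.
PREDICATE_COUNTS = """\
SELECT ?p (COUNT(*) AS ?uses) WHERE {
  ?s ?p ?o
}
GROUP BY ?p
ORDER BY DESC(?uses)
LIMIT 50
"""


def query_url(endpoint, query):
    """Build a GET URL that carries the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


# Hypothetical endpoint URL, for illustration only.
print(query_url("http://example.org/sparql", PREDICATE_COUNTS))
```

The LIMIT keeps the result small on large stores; swapping `?p` for a class variable bound with `?s a ?class` would give instance counts per class in the same way.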
>> ======================
>> Here are my questions:
>> 
>> 1) What else do you typically do to find the dataset?
>> 2) Is there a resource where I can find a brief description of the
>> dataset in terms of the properties and classes that are mentioned there? And
>> these cool arrows in Richard Cyganiak's diagram: is there a resource
>> where I can find the information about the relationship between the given
>> dataset and the rest of the world?
>> 3) I have a similar algorithm for searching vocabularies. Can resources
>> like Schemapedia help me in searching the dataset?
>> 4) Do you know any other SPARQL queries that can be handy when
>> I search for something in the dataset?
>> 
>> Sincerely yours,
>> -----
>> Yury Katkov
>> 
> 
> -- 
> Hugh Glaser,  
>             Web and Internet Science
>             Electronics and Computer Science,
>             University of Southampton,
>             Southampton SO17 1BJ
> Work: +44 23 8059 3670, Fax: +44 23 8059 3045
> Mobile: +44 75 9533 4155 , Home: +44 23 8061 5652
> http://www.ecs.soton.ac.uk/~hg/
> 
>
Received on Friday, 16 March 2012 21:11:57 UTC