Re: Looking for pedagogically useful data sets

From: Sarven Capadisli <info@csarven.ca>
Date: Thu, 12 Mar 2015 08:54:18 +0100
Message-ID: <5501462A.3080903@csarven.ca>
To: Paul Houle <ontology2@gmail.com>, "semantic-web@w3.org" <semantic-web@w3.org>, Linked Data community <public-lod@w3.org>
On 2015-03-12 00:13, Paul Houle wrote:
> Hello all,
>
>        I am looking for some RDF data sets to use in a short presentation on
> RDF and SPARQL.  I want to do a short demo,  and since RDF and SPARQL will
> be new to this audience,  I was hoping for something where the predicates
> would be easy to understand.
>
>       I was hoping that the LOGD data from RPI/TWC would be suitable,  but
> once I found the old web site (the new one is down) and manually fixed the
> broken download link I found the predicates were like
>
> <http://data-gov.tw.rpi.edu/vocab/p/1525/v96>
>
> and the only documentation I could find for them (maybe I wasn't looking in
> the right place) was that this predicate has an rdf:label of "V96".)
>
> Note that an alpha+numeric code is good enough for Wikidata and it is
> certainly concise,  but I don't want :v96 to be the first things that these
> people see.
>
> Something I like about this particular data set is that it is about 1
> million triples which is big enough to be interesting but also small enough
> that I can load it in a few seconds,  so that performance issues are not a
> distraction.
>
> The vocabulary in DBpedia is closer to what I want (and if I write the
> queries most of the distracting things about vocab are a non-issue) but
> then data quality issues are the distraction.
>
> So what I am looking for is something around 1 m triples in size (in terms
> of order-of-magnitude) and where there are no distractions due to obtuse
> vocabulary or data quality issues.  It would be exceptionally cool if there
> were two data sets that fit the bill and I could load them into the triple
> store together to demonstrate "mashability"
>
> Any suggestions?
>


Re: "predicates would be easy to understand": whether the label is V96 
or the name of some molecule, understanding it takes some level of 
familiarity with the data.

Perhaps something familiar to most people is Social Web data. I suggest 
looking at data that uses vocabularies such as vCard, FOAF, and SIOC. The 
large portion of the LOD Cloud made up of the StatusNet nodes (in cyan) 
uses FOAF and SIOC (IIRC, unless GNU social is up to something else these 
days).
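As a rough illustration of why these vocabularies read well in a demo, here is a minimal sketch in plain Python (no RDF library assumed). The `ex:` resources are made up; `foaf:name`, `foaf:knows`, `sioc:Post`, `sioc:has_creator`, and `sioc:content` are real FOAF/SIOC terms. The hypothetical `match` helper stands in for a single triple pattern, and the nested loop is roughly the join that `SELECT ?post ?name WHERE { ?post sioc:has_creator ?c . ?c foaf:name ?name }` performs. Merging the two graphs with a set union is also the simplest way to show "mashability".

```python
# Triples as (subject, predicate, object) tuples; prefixed names are plain strings.
# The ex: resources are hypothetical; the foaf:/sioc: terms are real vocabulary terms.
people = {
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", "Bob"),
}
posts = {
    ("ex:post1", "rdf:type", "sioc:Post"),
    ("ex:post1", "sioc:has_creator", "ex:bob"),
    ("ex:post1", "sioc:content", "Hello, Linked Data!"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern, sorted; None acts as a variable."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# "Mashing up" two RDF graphs is just the set union of their triples.
merged = people | posts

# Who created each post, and what is their name? (a two-pattern join)
for post, _, creator in match(merged, p="sioc:has_creator"):
    for _, _, name in match(merged, s=creator, p="foaf:name"):
        print(post, "was written by", name)
```

Running it prints `ex:post1 was written by Bob`.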


If statistical Linked Data is of interest, check out what is under 
http://270a.info/ (follow the VoID descriptions to the respective 
dataspaces). You can reach close to 10k datasets there, of varying sizes. 
I think the best bet for something small enough is to pick one from the 
http://worldbank.270a.info/ dataspace, e.g., GDP, mortality, or education.

Or take an observation from somewhere, e.g.:

http://ecb.270a.info/dataset/EXR/Q/ARS/EUR/SP00/A/2000-Q2

and follow your nose.
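To "follow your nose" programmatically, you dereference the URI and ask for RDF via content negotiation. A minimal standard-library sketch, assuming the server returns Turtle when sent `Accept: text/turtle` (I have not checked this against 270a.info). The helper names are hypothetical, and `fetch` needs network access, so it is only defined here, not called:

```python
import urllib.request


def linked_data_request(uri: str, media_type: str = "text/turtle") -> urllib.request.Request:
    """Build a GET request asking the server for an RDF serialization.

    Assumption: the server honours content negotiation via the Accept header.
    """
    return urllib.request.Request(uri, headers={"Accept": media_type})


def fetch(uri: str) -> str:
    """Dereference the URI (needs network access) and return the body as text."""
    with urllib.request.urlopen(linked_data_request(uri), timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


req = linked_data_request("http://ecb.270a.info/dataset/EXR/Q/ARS/EUR/SP00/A/2000-Q2")
print(req.get_method(), req.full_url, req.get_header("Accept"))
# Following your nose: call fetch(req.full_url), pick an object URI from the
# returned triples, and repeat.
```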

You can also approach it from a graph exploration point of view, e.g.:

http://en.lodlive.it/?http://worldbank.270a.info/classification/country/CA

or from a visualization, e.g., Sparkline (along the lines of the 
sparklines proposed by Edward Tufte):

http://stats.270a.info/sparkline

(JavaScript inside an SVG that builds the graphic itself by querying the 
SPARQL endpoint)
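For a feel of what "querying the SPARQL endpoint" involves, a query can be sent with the SPARQL 1.1 Protocol as a URL-encoded `query` parameter on a GET request. A minimal sketch: the endpoint URL below is an assumption (check the dataspace's VoID description for the actual `void:sparqlEndpoint`), and the query uses the RDF Data Cube terms `qb:Observation` and `qb:dataSet`:

```python
from urllib.parse import urlencode

# Hypothetical endpoint URL; look up the real one in the dataspace's VoID description.
ENDPOINT = "http://worldbank.270a.info/sparql"

QUERY = """PREFIX qb: <http://purl.org/linked-data/cube#>
SELECT ?obs ?dataset WHERE {
  ?obs a qb:Observation ;
       qb:dataSet ?dataset .
}
LIMIT 10"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL 1.1 Protocol 'query via GET' URL with the query URL-encoded."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)
# Send it with e.g. "Accept: application/sparql-results+json" to get JSON
# results back, assuming the endpoint supports that format.
```

Keeping the query as a plain string makes it easy to paste the same text into the endpoint's web form during a live demo.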

If you want to demonstrate what other kinds of things you can do with 
this data, consider something like:

http://stats.270a.info/analysis/worldbank:SP.DYN.IMRT.IN/transparency:CPI2011/year:2011

See also "Oh Yeah?" and so on.


Anyway, as a starting point, social data and vocabularies may be easier 
to get across, but (IMHO) you always have to show some applications or 
visualizations of the data to drive the ideas home.

-Sarven
http://csarven.ca/#i
Received on Thursday, 12 March 2015 07:54:47 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 19:49:36 UTC