RE: Looking for pedagogically useful data sets from Margaret Warren on 2015-03-13 (semantic-web@w3.org from March 2015)

From: Margaret Warren <mm@zeroexp.com>
Date: Fri, 13 Mar 2015 09:19:03 -0500
To: "'Paul Houle'" <ontology2@gmail.com>, <semantic-web@w3.org>, "'Linked Data community'" <public-lod@w3.org>
Message-ID: <029e01d05d98$a8ec8770$fac59650$@zeroexp.com>
Hi Paul,

I am not sure whether this would interest you at all, and it may not even be appropriate to suggest, but I will throw it out there.

The ImageSnippets dataset (http://www.imagesnippets.com) is now published on datahub.io. We have about 16k images, and the triples attached to those images currently reference (as of today) 4985 DBpedia entities, 2072 YAGO entities and 760 Art & Architecture Thesaurus entities. The number is always growing, even if slowly :) We just started publishing it a few weeks ago.

So while this dataset may not be big enough by itself for your purposes, perhaps you could mash it up with something else interesting? The main property is depicts, for the main item in an image, but there are also properties such as hasInBackground, and you can get interesting combinations with SPARQL.

Something like: thisImage depicts a coffee cup and hasInBackground a fireplace.
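To make that combination concrete, here is a rough sketch in Python rather than anything taken from the actual dataset. The image and entity identifiers, the `lio:` and `ex:` prefixes, and both helper functions are placeholders invented for illustration; only the property names depicts and hasInBackground come from the description above. The embedded SPARQL text is an approximation of what such a query might look like, under the same assumptions.

```python
# Toy triples standing in for the ImageSnippets data. Every identifier
# here (img:1, ex:CoffeeCup, the lio:/ex: prefixes) is made up for
# illustration; only the property names come from the message.
TRIPLES = {
    ("img:1", "lio:depicts", "ex:CoffeeCup"),
    ("img:1", "lio:hasInBackground", "ex:Fireplace"),
    ("img:2", "lio:depicts", "ex:CoffeeCup"),
    ("img:2", "lio:hasInBackground", "ex:Window"),
    ("img:3", "lio:depicts", "ex:Fireplace"),
}

# Roughly the same question as a SPARQL basic graph pattern.
# The namespace URIs are placeholders, not the published ones.
SPARQL_SKETCH = """
PREFIX lio: <http://example.org/lio#>
PREFIX ex:  <http://example.org/thing/>
SELECT ?image WHERE {
  ?image lio:depicts ex:CoffeeCup ;
         lio:hasInBackground ex:Fireplace .
}
"""


def subjects(triples, predicate, obj):
    """Return the set of subjects that have the given predicate and object."""
    return {s for (s, p, o) in triples if p == predicate and o == obj}


def images_depicting_with_background(triples, depicted, background):
    """Images that depict one thing and have another in the background."""
    return subjects(triples, "lio:depicts", depicted) & subjects(
        triples, "lio:hasInBackground", background
    )


matches = images_depicting_with_background(TRIPLES, "ex:CoffeeCup", "ex:Fireplace")
print(sorted(matches))  # ['img:1']
```

The set intersection plays the role of the shared ?image variable in the query: both patterns have to match the same subject.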

 

The images (or a visual part or region of an image) are the subjects of the triples, and most use a relation from LIO, a lightweight image ontology. The ontology is published on LOV and uses 11 terms that are purposefully ambiguous but definitely easy for a complete RDF newbie to understand. These properties are explained at http://www.imagesnippets.com/ArtSpeak/help/properties.html and also in the help files in the application itself.

Some of the images are protected by copyright, but the usage terms are recorded in the data; most are CC licensed and almost all would be fine for academic or research use.

I’d be interested in seeing whether someone could do something interesting with the data. We’d be happy to answer more questions if anyone is interested.

Thanks,

Margaret

From: Paul Houle [mailto:ontology2@gmail.com] 
Sent: Wednesday, March 11, 2015 6:14 PM
To: semantic-web@w3.org; Linked Data community
Subject: Looking for pedagogically useful data sets

Hello all,

I am looking for some RDF data sets to use in a short presentation on RDF and SPARQL. I want to do a short demo, and since RDF and SPARQL will be new to this audience, I was hoping for something where the predicates would be easy to understand.

I was hoping that the LOGD data from RPI/TWC would be suitable, but once I found the old web site (the new one is down) and manually fixed the broken download link, I found the predicates looked like

<http://data-gov.tw.rpi.edu/vocab/p/1525/v96>

 

and the only documentation I could find for them (maybe I wasn't looking in the right place) was that this predicate has an rdfs:label of "V96".

Note that an alphanumeric code is good enough for Wikidata and it is certainly concise, but I don't want :v96 to be the first thing that these people see.

Something I like about this particular data set is that it is about 1 million triples, which is big enough to be interesting but also small enough that I can load it in a few seconds, so that performance issues are not a distraction.

The vocabulary in DBpedia is closer to what I want (and if I write the queries, most of the distracting things about the vocabulary are a non-issue), but then data quality issues are the distraction.

So what I am looking for is something around 1 million triples in size (order of magnitude) with no distractions due to obscure vocabulary or data quality issues. It would be exceptionally cool if there were two data sets that fit the bill and I could load them into the triple store together to demonstrate "mashability".

Any suggestions?

-- 

Paul Houle
(607) 539 6254    paul.houle on Skype   ontology2@gmail.com

http://legalentityidentifier.info/lei/lookup
Received on Friday, 13 March 2015 14:19:37 UTC