Re: any standard ways to detect VoID dataset descriptions?

Yes, I’m afraid that trying to set up automagic crawling for sameAs.org would always have been more challenging than searching by hand (especially if you actually want to be discriminating about which RDF you accept!).
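
For reference, the one standardized automated entry point I know of is the well-known path /.well-known/void that the VoID specification describes for publishing a host's dataset description. A minimal sketch of deriving that candidate location from any resource IRI is below. The function names are hypothetical, and the sketch only builds URLs: a crawler would still have to dereference each one and check that usable RDF actually comes back, which is exactly where the trouble starts.

```python
from urllib.parse import urlsplit, urlunsplit

# Well-known path under which a host may publish its VoID description.
WELL_KNOWN_VOID_PATH = "/.well-known/void"


def well_known_void_url(resource_iri: str) -> str:
    """Hypothetical helper: the well-known VoID URL on the host serving resource_iri."""
    parts = urlsplit(resource_iri)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an HTTP(S) IRI: {resource_iri!r}")
    # Keep scheme and authority (host and port); drop path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, WELL_KNOWN_VOID_PATH, "", ""))


def candidate_void_urls(iris):
    """Hypothetical helper: one well-known VoID URL per distinct host, in first-seen order."""
    candidates = []
    for iri in iris:
        try:
            url = well_known_void_url(iri)
        except ValueError:
            continue  # skip non-HTTP(S) IRIs such as mailto: or urn:
        if url not in candidates:
            candidates.append(url)
    return candidates


print(well_known_void_url("http://dbpedia.org/resource/Amsterdam"))
```

Whether anything is actually published at the resulting address is a separate question that this sketch does not answer.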

By the way, http://void.rkbexplorer.com is still running; it has had no maintenance in perhaps a decade, and many of the external links from the page now return 404, but it may still be of use to some people.
(I see the DBpedia VoID URI is now different.)

Good luck.

> On 26 Apr 2018, at 12:50, Wouter Beek <w.g.j.beek@vu.nl> wrote:
> 
> Hi Axel, others,
> 
> Three years ago, I did a crawl based on Datahub metadata records and
> VoID files from VoID store.  The results were pretty good at the time:
> I encountered many errors, but also lots of data, resulting in the LOD
> Laundromat dataset of 38B triples (http://lodlaundromat.org).
> 
> Unfortunately, when I tried to do the same scrape again one month ago,
> I encountered _much_ less data in the LOD Cloud collection.  I was
> disappointed, because the LOD Cloud picture has become _bigger_ in the
> last two years.  But then again, the LOD Cloud picture is based on
> human-entered metadata, the data itself is not always there... (or it
> is there, but it cannot be found by automated means).
> 
> I now believe that the best way forward is to manually create a list
> of URLs from which data can be downloaded.  This may seem extreme, but
> it is the last option I see after trying CKAN APIs, VoID, DCAT,
> dereferencing IRIs, etc.  E.g., this is how I am able to find the
> download locations of the BIO2RDF datasets:
> https://github.com/wouterbeek/LOD-Index/blob/master/data/bio2rdf.ttl
> 
> Finally, when I tried to represent these download locations in VoID
> and DCAT, I noticed that there are very common configurations that
> cannot be described by these two vocabularies, e.g., it is not
> possible to describe a distribution that consists of multiple files in
> DCAT, nor is it possible to describe the RDF serialization format of
> individual files in VoID.  These are pretty basic configurations,
> e.g., DBpedia has distributions that consist of very many files, some
> of which are in different serialization formats.  (To be clear: I
> think it is great that people have invested time in creating these
> vocabularies, and having them today is better than having nothing at
> all, but they need several more iterations/revisions before they can
> be used to model real-world data download locations.)
> 
> ---
> Cheers!,
> Wouter.
> 
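
The configuration described in the quoted message (one distribution spread over many files, each possibly in its own serialization) is straightforward to model in plain code, even though, per the message, VoID and DCAT cannot express it. A minimal sketch follows; all names are hypothetical, the URLs are placeholders rather than real DBpedia download locations, and guessing the serialization from the file suffix is an assumption (suffixes are not authoritative; the served Content-Type may disagree).

```python
from dataclasses import dataclass, field
from typing import List, Optional
from urllib.parse import urlsplit

# Assumed suffix-to-media-type table for common RDF serializations.
SUFFIX_TO_MEDIA_TYPE = {
    ".nt": "application/n-triples",
    ".nq": "application/n-quads",
    ".ttl": "text/turtle",
    ".trig": "application/trig",
    ".rdf": "application/rdf+xml",
    ".owl": "application/rdf+xml",
    ".jsonld": "application/ld+json",
}
# Compression suffixes stripped before looking at the serialization suffix.
COMPRESSION_SUFFIXES = (".gz", ".bz2", ".xz")


def guess_media_type(url: str) -> Optional[str]:
    """Guess a file's RDF serialization from its URL path, or None if unknown."""
    path = urlsplit(url).path.lower()
    for suffix in COMPRESSION_SUFFIXES:
        if path.endswith(suffix):
            path = path[: -len(suffix)]
            break
    for suffix, media_type in SUFFIX_TO_MEDIA_TYPE.items():
        if path.endswith(suffix):
            return media_type
    return None


@dataclass
class DownloadFile:
    url: str
    media_type: Optional[str]  # per-file serialization, which may differ between files


@dataclass
class Distribution:
    dataset: str
    files: List[DownloadFile] = field(default_factory=list)  # one distribution, many files

    def add(self, url: str) -> None:
        self.files.append(DownloadFile(url, guess_media_type(url)))

    def media_types(self) -> List[str]:
        """Distinct known serializations used across this distribution's files."""
        return sorted({f.media_type for f in self.files if f.media_type})


dist = Distribution("http://example.org/dataset")
dist.add("http://example.org/dump/labels_en.ttl.bz2")
dist.add("http://example.org/dump/links.nt.gz")
print(dist.media_types())
```

Keeping the media type on each file rather than on the distribution is the point of the design: it is the per-file detail the message says the vocabularies cannot record.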

Received on Thursday, 26 April 2018 20:56:01 UTC