- From: Richard Cyganiak <richard@cyganiak.de>
- Date: Tue, 8 Mar 2011 15:38:07 +0000
- To: William Waites <ww@styx.org>
- Cc: Semantic Web <semantic-web@w3.org>, Jan Demter <jan@demter.de>, Aidan Hogan <aidan.hogan@deri.org>
On 8 Mar 2011, at 12:45, William Waites wrote:

> So you have some RDF data. What is it like? What
> vocabularies are used? Which classes or predicates
> are used and in what proportion? Do literals tend
> to have languages or datatypes? If so which ones?
>
> These types of questions are explored to an extent
> by the analysis done for the LOD cloud, but what
> tools exist for answering these questions? Is it
> invariably roll your own or is there a program
> that you can point at a SPARQL endpoint or RDF
> file and get a nice summary describing the nature
> of the data?

I have a quick and dirty tool here:

https://github.com/cygri/make-void

It computes VoID statistics from a local RDF file. It doesn't really scale beyond a million triples or so, as everything is done in memory.

A version that computes the same statistics with MapReduce over N-Triples and N-Quads dumps would be immensely useful to me.

Another idea would be to run this kind of statistics directly against public SPARQL endpoints, but that generally doesn't work, as most SPARQL endpoints refuse to answer COUNT queries that touch the entire dataset.

Best,
Richard

> Cheers,
> -w
> --
> William Waites <mailto:ww@styx.org>
> http://river.styx.org/ww/ <sip:ww@styx.org>
> F4B3 39BF E775 CF42 0BAB 3DF0 BE40 A6DF B06F FD45
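[Editor's note: as an illustration of the kind of in-memory counting a tool like make-void does, here is a minimal Python sketch that derives a few VoID-style statistics (triple count, distinct subjects/objects, property and class usage) from an N-Triples dump. The function name `void_stats` and the dictionary layout are invented for this example; it is not the actual make-void code.]

```python
from collections import Counter

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

def void_stats(ntriples):
    """Compute a few VoID-style statistics from N-Triples text.

    Subject and predicate terms in N-Triples never contain whitespace,
    so splitting each line on its first two whitespace runs is enough
    to recover the three terms; the trailing ' .' is then stripped
    from the object.
    """
    triples = 0
    subjects, objects = set(), set()
    predicates = Counter()   # predicate IRI -> usage count
    classes = Counter()      # class IRI -> number of rdf:type triples
    for line in ntriples.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        s, p, rest = line.split(None, 2)
        o = rest.rstrip().rstrip(".").rstrip()
        triples += 1
        subjects.add(s)
        objects.add(o)
        predicates[p] += 1
        if p == RDF_TYPE:
            classes[o] += 1
    return {
        "triples": triples,
        "distinctSubjects": len(subjects),
        "distinctObjects": len(objects),
        "properties": predicates,
        "classes": classes,
    }
```

Like make-void, this keeps everything in memory (the sets of distinct subjects and objects grow with the data), which is exactly why it would not scale past a few million triples; a MapReduce version would emit (term, 1) pairs per statistic and aggregate them in the reduce step instead.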
Received on Tuesday, 8 March 2011 15:38:42 UTC