Re: LOD Cloud Cache Stats from William Waites on 2011-04-05 (public-lod@w3.org from April 2011)

From: William Waites <ww@styx.org>
Date: Tue, 5 Apr 2011 21:42:08 +0200
To: Kingsley Idehen <kidehen@openlinksw.com>
Cc: "public-lod@w3.org" <public-lod@w3.org>, Virtuoso Users <virtuoso-users@lists.sourceforge.net>, "semantic-web@w3.org" <semantic-web@w3.org>, lotico-list@googlegroups.com
Message-ID: <20110405194208.GR21404@styx.org>

So I don't have answers to your questions, but do have some
observations about the results, particularly the counts of
distinct predicates.

The top one is rdf:type, which makes sense. Below that we
have the ones used in reification. Who knew there was actually
that much reified data out there? I wonder where it comes
from, and what happened to the consensus that reification is
not a good idea and should be deprecated?

SELECT ?graph (COUNT(?s) AS ?count) WHERE {
    GRAPH ?graph { ?s a <http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement> }
}
GROUP BY ?graph
ORDER BY DESC(?count) LIMIT 50

This query times out, but it would be interesting to know
the answer: who is the source of all of these reifications?
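
If the endpoint keeps timing out, the same per-graph tally could
in principle be computed offline from an N-Quads dump. What
follows is only a minimal sketch under that assumption: it
counts subjects typed rdf:Statement per named graph, the function
name reification_counts is made up for illustration, and quads
in the default graph are deliberately ignored, as GRAPH ?graph
would ignore them.

```python
import re
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDF_STATEMENT = "http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement"

# Matches only quads of the form:  s rdf:type rdf:Statement graph .
# The graph term must be an IRI or a blank node; lines without a
# graph term (default graph) do not match.
QUAD = re.compile(
    r"^\s*(\S+)\s+<" + re.escape(RDF_TYPE) + r">\s+<"
    + re.escape(RDF_STATEMENT) + r">\s+(<[^>]*>|_:\S+)\s*\.\s*$"
)


def reification_counts(lines, limit=50):
    """Return up to `limit` (graph, count) pairs, largest count first.

    Hypothetical helper: `lines` is any iterable of N-Quads lines,
    for example an open dump file.
    """
    counts = Counter()
    for line in lines:
        match = QUAD.match(line)
        if match:
            counts[match.group(2)] += 1
    return counts.most_common(limit)
```

Passing an open file object streams the dump line by line, so
memory use grows with the number of graphs rather than with the
number of quads.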

Next is rdfs:label, ok, fine. After that, a sizeable chunk
of the data has to do with rows and columns in CSV tables that
come from data.gov. How is a mechanical transliteration
from CSV to RDF without any modelling useful? It just makes
the data a couple of orders of magnitude bigger and a few
orders of magnitude more cumbersome to deal with. I
mean, being able to refer to a specific spreadsheet cell is
useful, but how does actually materialising all of them do
anything but take up disk space and slow down queries?
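
To make the size concern concrete, here is a rough sketch of
what a purely mechanical conversion might look like: one
resource per row, one triple per cell, plus a row-number triple.
This is an assumed scheme for illustration only, not the
converter actually used for the data.gov datasets; the base IRI,
the vocabulary IRIs and the function names are all hypothetical.

```python
import csv
import io

BASE = "http://example.org/dataset/"  # hypothetical base IRI


def transliterate(csv_text):
    """Naively turn CSV text into (subject, predicate, object) triples.

    Assumes the first row is a header whose names are safe to use
    inside an IRI. Literal values are not escaped; this is a sketch.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    triples = []
    for number, row in enumerate(body, start=1):
        row_iri = f"<{BASE}row/{number}>"
        # Bookkeeping triple that carries no domain meaning at all.
        triples.append((row_iri, f"<{BASE}vocab/rowNumber>", f'"{number}"'))
        for name, value in zip(header, row):
            # One triple per spreadsheet cell.
            triples.append((row_iri, f"<{BASE}vocab/{name}>", f'"{value}"'))
    return triples


def to_ntriples(triples):
    """Serialise the triples as N-Triples text, one statement per line."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)
```

Every row costs one triple per column plus the bookkeeping, and
each triple repeats the full row and predicate IRIs, so the
serialised output is far larger than the CSV it came from even
before any indexing happens in the store.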

Cheers,
-w
-- 
William Waites                <mailto:ww@styx.org>
http://river.styx.org/ww/        <sip:ww@styx.org>
F4B3 39BF E775 CF42 0BAB  3DF0 BE40 A6DF B06F FD45

Received on Tuesday, 5 April 2011 19:42:33 UTC