Re: skos in billion-triple-challenge data

From: Antoine Isaac <aisaac@few.vu.nl>
Date: Sat, 11 Sep 2010 11:30:32 +0100
Message-ID: <4C8B5A48.9080404@few.vu.nl>
To: Ed Summers <ehs@pobox.com>
CC: public-esw-thes@w3.org
On 9/11/10 3:45 AM, Ed Summers wrote:
> On a Friday whim (prompted by Dan Brickley) I downloaded the 2010
> Billion Triple Challenge dataset to look and see how many SKOS
> assertions there are in it, and from what domains. If you are
> interested the results can be found at:
>    http://gist.github.com/574700
> //Ed

Hi Ed,

That's really cool indeed! Yet it's quite puzzling: I don't know what kind of bias there is in this BTC dataset, but some strange selection seems to have been made. To take a graph we both know quite well: it's just impossible that the full id.loc.gov contained as few as 27,392 SKOS triples. Or did they capture a state in which id.loc.gov did *not* contain LCSH?
Do you have any idea?


Received on Saturday, 11 September 2010 10:31:08 UTC
