Re: Semantic Web Search engines/Billion Triple Challenge or other data-sets? from Melvin Carvalho on 2016-06-22 (semantic-web@w3.org from June 2016)

From: Melvin Carvalho <melvincarvalho@gmail.com>
Date: Wed, 22 Jun 2016 17:28:11 +0200
To: Harry Halpin <hhalpin@ibiblio.org>
Cc: Semantic Web <semantic-web@w3.org>
Message-ID: <CAKaEYhK_7o=_5TKQQyqQ50+iqYqduGJdrG39=zpoyx1v8ZBJSA@mail.gmail.com>

On 22 June 2016 at 16:55, Harry Halpin <hhalpin@ibiblio.org> wrote:

> Are there any data-sets available that are realistic snapshots of the
> state of linked data in 2016?
>
> I used to search using Sindice, but it's been down a while.
>
> I see Swoogle is still up (http://swoogle.umbc.edu) but not sure if it's
> updated its index.
>
> Also, a bunch of raw triples would be fine, à la the BTC (but a
> recent data-set, at least 2014 or later).
>

I'm not sure it's going to be possible to see everything that's out there,
because, from what I can see, linked data is starting to shift from 100% open
and public to a combination of public, shared and private.

Personally, I'm adding 100k triples to my knowledge base every day.  I
expect that to reach 1 million a day next year, and to keep increasing after that.

I was having coffee at my coworking space recently, and someone mentioned to me
that they are working with a private data store of over 1 trillion triples.

I see this trend continuing.

IMHO the next frontier of semantic search is somewhat more P2P, shared and
access-controlled, as more people start to build up caches of the web
(perhaps for speed of access and offline use) and index that content.  It ought
to be possible to share that kind of data socially, using technologies like
Solid.

>
>   yours,
>    harry

Received on Wednesday, 22 June 2016 15:28:45 UTC