Re: A proposal for handling bulk data requests from Giovanni Tummarello on 2011-07-11 (public-lod@w3.org from July 2011)

From: Giovanni Tummarello <giovanni.tummarello@deri.org>
Date: Mon, 11 Jul 2011 13:54:59 +0200
To: Michael Hausenblas <michael.hausenblas@deri.org>
Cc: Kingsley Idehen <kidehen@openlinksw.com>, Linked Data community <public-lod@w3.org>, Jürgen Umbrich <juergen.umbrich@deri.org>, Sindice developers list <sindice-dev@lists.deri.org>, "giulio.cesare@gmail.com" <giulio.solaroli@deri.org>
Message-ID: <CAHHRs7hEBqcUW3J7Zu+SzSd80yKKRhOZETinGki41hDgCcSVFA@mail.gmail.com>

> An idea that arose out of a recent discussion with Juergen (in CC): how
> about providing a sort of 'bulk data request' facility for your SPARQL
> endpoints [1] [2] (as they are, I gather, the more popular ones on the WoD
> ;)?
>


Hi, with regard to Sindice: it is important that the data we index
publicly reflects a public website and, in general, is the same as what
a user would normally find on the hosting website. People trust and
understand the content of websites much better than datasets.

The goal, however, is indeed to expose a dataset and to have people
query it (e.g. on our SPARQL endpoint).

For smaller websites, content management systems with RDFa/schema
markup etc., honestly sitemap.xml does its job. (Also, the overhead,
conceptual and otherwise, of creating a dump for these sites is too
much.)

For big websites, LOD datasets etc., we are indeed working specifically
on supporting dumps.

We used to support Semantic Sitemaps, which provide pointers to dumps,
but the reality is that dumps come in many different forms: some split
the same dataset across multiple files, some provide different versions
(e.g. datest), some provide different formats.

Then, how do we index these in a way that mimics the way they are
exposed on the web (with descriptions handed out when an authoritative
URI is resolved)? We used to have some constructs in the proposed
Semantic Sitemap protocol, but very few got it right, and incentives
for getting it right or fixing issues are seriously lacking.
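
To make the dump-pointer idea concrete, here is a minimal sketch of how
a consumer might read dump locations out of a Semantic Sitemap. The
element names (sc:dataset, sc:datasetLabel, sc:linkedDataPrefix,
sc:dataDumpLocation) and the sc namespace URI are written as I recall
them from the proposal and should be treated as assumptions; the
example.org URLs and the list_datasets helper are purely hypothetical.
It is not how Sindice itself processes sitemaps.

```python
import xml.etree.ElementTree as ET

# Namespace URIs: the sitemaps.org one is standard; the "sc" one is
# an assumption based on the Semantic Sitemap proposal.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "sc": "http://sw.deri.org/2007/07/sitemapextension/scschema.xsd",
}

# Hypothetical sitemap describing one dataset split across two dump files.
EXAMPLE = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:sc="http://sw.deri.org/2007/07/sitemapextension/scschema.xsd">
  <sc:dataset>
    <sc:datasetLabel>Example dataset</sc:datasetLabel>
    <sc:linkedDataPrefix>http://example.org/resource/</sc:linkedDataPrefix>
    <sc:dataDumpLocation>http://example.org/dumps/part1.nt.gz</sc:dataDumpLocation>
    <sc:dataDumpLocation>http://example.org/dumps/part2.nt.gz</sc:dataDumpLocation>
  </sc:dataset>
</urlset>
"""


def list_datasets(xml_text):
    """Return one dict per sc:dataset with its label, URI prefix and dump URLs."""
    root = ET.fromstring(xml_text.strip())
    datasets = []
    for ds in root.findall("sc:dataset", NS):
        datasets.append({
            "label": ds.findtext("sc:datasetLabel", default="", namespaces=NS),
            "prefix": ds.findtext("sc:linkedDataPrefix", default="", namespaces=NS),
            # A dataset may list several dump files; keep them all, in order.
            "dumps": [d.text.strip()
                      for d in ds.findall("sc:dataDumpLocation", NS)
                      if d.text],
        })
    return datasets


if __name__ == "__main__":
    for ds in list_datasets(EXAMPLE):
        print(ds["label"], ds["prefix"], ds["dumps"])
```

Even in this simple case the consumer only learns where the files are;
nothing says whether the two files are parts of one dataset, different
versions, or the same data in different formats, which is the ambiguity
described above.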

Bottom line: we'll provide a form (very soon) to submit a link and a
description of your dataset, as well as an email address. We'll then
process submissions manually until we're on reasonably solid ground to
propose a viable, more mechanized solution.

Gio

Received on Monday, 11 July 2011 11:55:54 UTC