Re: LinkedMDB dump? from Martynas Jusevičius on 2016-03-11 (semantic-web@w3.org from March 2016)

From: Martynas Jusevičius <martynas@graphity.org>
Date: Fri, 11 Mar 2016 07:51:01 +0000
To: Wouter Beek <w.g.j.beek@vu.nl>, Jean-Claude Moissinac <jean-claude.moissinac@telecom-paristech.fr>
Cc: Luca Matteis <lmatteis@gmail.com>, Pierre-Antoine Champin <pierre-antoine.champin@liris.cnrs.fr>, Semantic Web <semantic-web@w3.org>, Paul Groth <p.groth@elsevier.com>
Message-ID: <CAE35VmxHtE-a+UVwwXKm6Fb_2opS-LiN2oK75zuO8Cd1tRtW+g@mail.gmail.com>

Hey Wouter,

These are complementary data access methods that serve different use
cases.

SPARQL gives you selectivity and expressiveness. A data dump gives you the
full data. Linked Data Fragments (LDF) sit somewhere in between.
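To make the "in between" concrete: a Triple Pattern Fragments server (the most common kind of LDF interface) answers only single triple patterns, and it does so page by page. The Python sketch below is a simplified in-memory model of that idea, not the actual LDF server software. The names `fragment` and `crawl_all` and the default page size of 100 are hypothetical. Binding a predicate shows the limited selectivity, and crawling the all-wildcard pattern shows how the full data can still be recovered, one request per page.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
# None in a pattern position acts as a wildcard (an unbound variable).
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


def matches(triple: Triple, pattern: Pattern) -> bool:
    """True if every bound position of the pattern equals the triple's term."""
    return all(p is None or p == t for t, p in zip(triple, pattern))


def fragment(data: List[Triple], pattern: Pattern, page: int,
             page_size: int = 100) -> Tuple[List[Triple], bool]:
    """Hypothetical server side: return one page of triples matching a single
    triple pattern, plus a flag saying whether a next page exists.
    (A real server would use an index instead of scanning all triples.)"""
    selected = [t for t in data if matches(t, pattern)]
    start = page * page_size
    chunk = selected[start:start + page_size]
    has_next = start + page_size < len(selected)
    return chunk, has_next


def crawl_all(data: List[Triple], page_size: int = 100) -> Iterator[Triple]:
    """Hypothetical client side: recover the whole dataset by requesting the
    all-wildcard pattern page by page; each iteration stands for one HTTP
    request."""
    page = 0
    while True:
        chunk, has_next = fragment(data, (None, None, None), page, page_size)
        yield from chunk
        if not has_next:
            break
        page += 1
```

For example, a 251-triple dataset with a page size of 100 takes three requests to crawl. The request count grows linearly with the dataset size, which is the cost of obtaining all the data through LDF.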

Providing only one of them might not be enough, but does that really mean a
query technology breaks the Open Data agenda?
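As a sketch of how a client might still obtain the full data from a SPARQL endpoint, assuming the endpoint's cap is a per-response row limit (not a total quota or a timeout), it can page through all triples with the SPARQL 1.1 `LIMIT`/`OFFSET` modifiers. The function names, the injected `run_query` callable (standing in for whatever HTTP client is used) and the default page size are hypothetical. Whether this works against a particular endpoint such as LinkedMDB's is untested here, and deep `OFFSET`s combined with `ORDER BY` can themselves be slow or time out on large datasets.

```python
from typing import Callable, Iterator, List, Tuple

Row = Tuple[str, str, str]


def page_query(limit: int, offset: int) -> str:
    """SPARQL 1.1 query for one page of all triples. ORDER BY keeps the
    paging stable; without it, OFFSET windows are not guaranteed to be
    consistent between requests."""
    return ("SELECT ?s ?p ?o WHERE { ?s ?p ?o } "
            f"ORDER BY ?s ?p ?o LIMIT {limit} OFFSET {offset}")


def dump_via_sparql(run_query: Callable[[str], List[Row]],
                    page_size: int = 1000) -> Iterator[Row]:
    """Page through an endpoint until a short page signals the end.
    page_size must not exceed the endpoint's own per-response cap,
    otherwise a capped page is mistaken for the last page."""
    offset = 0
    while True:
        rows = run_query(page_query(page_size, offset))
        yield from rows
        if len(rows) < page_size:
            break
        offset += page_size
```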


Martynas
graphityhq.com
On Fri, 11 Mar 2016 at 08:29, Wouter Beek <w.g.j.beek@vu.nl> wrote:

> Hi,
>
> Let me summarize this conversation:
>
>   - Someone asks how to obtain a Linked Open Dataset.
>   - Solution 1: SPARQL "appears to have some kind of 'throttling'
> preventing to get exhaustive results".
>   - Solution 2: LDF "I can't seem to extract the dump, but you can crawl
> it all you want".
>   - Solution 3: Data dump "Just click this link".
>
> My take on this:
>
>   - SPARQL is currently not a viable dissemination strategy for Open
> Data, since it introduces arbitrary limits on result set size.  The first
> requirement on any Open Data dissemination strategy should be that it is at
> least possible to obtain the full data.
>   - LDF is a viable dissemination strategy for Open Data since it allows
> low-level queries to be asked without sacrificing openness the way SPARQL
> does.  However, downloading all the data potentially requires many HTTP
> requests since data is segmented in relatively small pages (a very common
> approach in Web APIs).
>   - Data dumps are inferior to LDF (no triple pattern queries) but superior
> to SPARQL endpoints (all data can be retrieved).  They are also superior to
> LDF for the single use case of obtaining all the data.
>
> My questions for the community:
>
>   - Should we still promote SPARQL when we know that it so fundamentally
> breaks the Open Data agenda?
>   - Could LDF be improved to handle the "just give me all the data" use
> case better?  I'm thinking of being able to retrieve LDF results as one
> continuous gzipped stream instead of separate pages.
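Wouter's second question could look roughly like this: instead of separate pages, the server sends every matching triple as N-Triples lines in one gzip-compressed response body, and the client decompresses it incrementally. This is a hypothetical sketch using Python's standard `zlib` module (`wbits=31` selects the gzip container); it is not an existing LDF feature, and `gzip_stream` and `read_stream` are illustrative names.

```python
import zlib
from typing import Iterable, Iterator


def gzip_stream(lines: Iterable[str]) -> Iterator[bytes]:
    """Hypothetical server side: yield gzip-compressed chunks of an N-Triples
    line stream, so the whole result can go out as one response body."""
    comp = zlib.compressobj(wbits=31)  # 16 + 15: gzip header and trailer
    for line in lines:
        chunk = comp.compress((line + "\n").encode("utf-8"))
        if chunk:  # the compressor buffers internally, so chunks may be empty
            yield chunk
    yield comp.flush()


def read_stream(chunks: Iterable[bytes]) -> Iterator[str]:
    """Hypothetical client side: decompress incrementally and yield each
    complete line as soon as it is available."""
    decomp = zlib.decompressobj(wbits=31)
    buffer = b""
    for chunk in chunks:
        buffer += decomp.decompress(chunk)
        # Keep the trailing partial line in the buffer for the next chunk.
        *complete, buffer = buffer.split(b"\n")
        for line in complete:
            yield line.decode("utf-8")
    buffer += decomp.flush()
    if buffer:
        yield buffer.decode("utf-8")
```

Because the client handles lines as they arrive, it never needs the whole dump in memory, while the server pays the per-request overhead only once.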
>
> ---
> Best,
> Wouter Beek.
>
>
> On Thu, Mar 10, 2016 at 10:39 PM, Jean-Claude Moissinac
> <jean-claude.moissinac@telecom-paristech.fr> wrote:
>
>> There is also an archive of the triples from 2009 at
>> http://www.cs.toronto.edu/~oktie/linkedmdb/
>>
>>
>> --
>> Jean-Claude Moissinac
>>
>>
>> 2016-03-10 22:22 GMT+01:00 Luca Matteis <lmatteis@gmail.com>:
>>
>>> I have a Linked Data Fragments version running here:
>>> http://hdt-gae.appspot.com/
>>> I can't seem to extract the dump, but you can crawl it all you want
>>> since LDFs have no throttling ;)
>>>
>>> On Thu, Mar 10, 2016 at 8:48 PM, Pierre-Antoine Champin
>>> <pierre-antoine.champin@liris.cnrs.fr> wrote:
>>> > Hi SemWeb people,
>>> >
>>> > I'd like to run a local version of LinkedMDB [1], but it seems that
>>> > only the SPARQL endpoint is still running; the wiki and the dump [2]
>>> > are down.
>>> >
>>> > Would anyone have a backup of the dump?
>>> >
>>> > NB: before anyone suggests that: it can't be easily extracted from the
>>> > SPARQL endpoint, which appears to have some kind of "throttling"
>>> > preventing to get exhaustive results on very "open" queries...
>>> >
>>> > [1] http://data.linkedmdb.org/
>>> > [2] https://datahub.io/dataset/linkedmdb/resource/dd7619f9-cc39-47eb-a72b-5f34cffe1d16
>>>
>>>

Received on Friday, 11 March 2016 07:51:41 UTC