
Re: DBpedia hosting burden

From: Daniel Koller <dakoller@googlemail.com>
Date: Wed, 14 Apr 2010 23:50:31 +0200
Message-ID: <k2p2bca8c351004141450k3ffaa17cxd4fe51d3da9be651@mail.gmail.com>
To: Dan Brickley <danbri@danbri.org>
Cc: public-lod <public-lod@w3.org>, dbpedia-discussion <dbpedia-discussion@lists.sourceforge.net>
Dan,

I just set up some torrent files containing the current English and German
DBpedia content, as a test/proof of concept: I was curious to see how fast a
network effect builds up via p2p networks.

To try, go to http://dakoller.net/dbpedia_torrents/dbpedia_torrents.html.

I presume that to get it working you just need the first few people to download
the files and keep spreading them around with their torrent clients, as long as
the *.torrent files stay consistent. (The layout of the link page is courtesy of
the DBpedia people.)

Kind regards,

Daniel

On Wed, Apr 14, 2010 at 9:04 PM, Dan Brickley <danbri@danbri.org> wrote:

> On Wed, Apr 14, 2010 at 8:11 PM, Kingsley Idehen <kidehen@openlinksw.com>
> wrote:
>
>
> > Some have cleaned up their act for sure.
> >
> > Problem is, there are others doing the same thing, who then complain about
> > the instance in very generic fashion.
>
> They're lucky it exists at all. I'd refer them to this Louis CK sketch:
> http://videosift.com/video/Louie-CK-on-Conan-Oct-1st-2008?fromdupe=We-live-in-an-amazing-amazing-world-and-we-complain
> (if it stays online...).
>
> >> While it is a
> >> shame to say 'no' to people trying to use linked data, this would be
> >> more saying 'yes, but not like that...'.
> >>
> >
> > I think we have an outstanding blog post / technical note about the DBpedia
> > instance that hasn't been published (possibly due to the 3.5 and
> > DBpedia-Live work we are doing); said note will cover how to work with the
> > instance etc.
> [..]
> > We do have a solution in mind: basically, we are going to have a different
> > place for the descriptor resources and redirect crawlers there via 303s
> > etc.
> [...]
> > We'll get the guide out.
>
>
> That sounds useful
>
> >> As you mention, DBpedia is an important and central resource, thanks
> >> both to the work of the Wikipedia community, and those in the DBpedia
> >> project who enrich and make available all that information. It's
> >> therefore important that the SemWeb / Linked Data community takes care
> >> to remember that these things don't come for free, that bills need
> >> paying and that de-referencing is a privilege not a right.
> >
> > "Bills" the major operative word in a world where the "Bill Payer" and
> > "Database Maintainer" is a footnote (at best) re. perception of what
> > constitutes the DBpedia Project.
>
> Yes, I'm sure some are thoughtless and take it for granted; but also
> that others are well aware of the burdens.
>
> (For that matter, I'm not myself so sure how Wikipedia cover their
> costs or what their longer-term plan is...).
>
>
> > For us, the most important thing is perspective. DBpedia is another space on
> > a public network, thus it can't magically rewrite the underlying physics of
> > wide area networking where access is open to the world. Thus, we can make a
> > note about proper behavior and explain how we protect the instance such that
> > everyone has a chance of using it (rather than a select few resource
> > guzzlers).
>
> This I think is something others can help with, when presenting LOD
> and related concepts: to encourage good habits that spread the cost of
> keeping this great dataset globally available. So all those making
> slides, tutorials, blog posts or software tools have a role to play
> here.
>
> >> Are there any scenarios around eg. BitTorrent that could be explored?
> >> What if each of the static files in http://dbpedia.org/sitemap.xml
> >> were available as torrents (or magnet: URIs)?
> >
> > When we set up the Descriptor Resource host, these would certainly be
> > considered.
>
> Ok, let's take care to explore that then; it would probably help
> others too. There must be dozens of companies and research
> organizations who could put some bandwidth resources into this, if
> only there was a short guide to setting up a GUI-less bittorrent tool
> and configuring it appropriately. Are there any bittorrent experts on
> these mailing lists who could suggest next practical steps here (not
> necessarily dbpedia-specific)?
>
> (ah I see a reply from Ivan; copying it in here...)
>
> > If I were The Emperor of LOD I'd ask all grand dukes of datasources to
> > put fresh dumps on some torrent with control of UL/DL ratio :) For
> > reasons I can't understand, this idea is proposed a few times per year but
> > never tried.
>
> I suspect BitTorrent is in some ways a 'taboo' technology, since
> it is most famous for being used to distribute materials that
> copyright-owners often don't want distributed. I have no detailed idea
> how torrent files are made, how trackers work, etc. I started poking
> around magnet: a bit recently but haven't got a sense for how solid
> that work is yet. Could a simple Wiki page be used for sharing
> torrents? (plus published hash of files elsewhere for integrity
> checks). What would it take to get started?
>
> Perhaps if http://wiki.dbpedia.org/Downloads35 had the SHA-1 for each
> download published (RDFa?), then others could experiment with torrents
> and downloaders could cross-check against an authoritative description
> of the file from DBpedia?
>
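
The cross-check described above could look roughly like this on the
downloader's side. This is only a sketch: it assumes the SHA-1 is published
somewhere as a plain hex string, and the function names are made up for
illustration, not part of any DBpedia tooling.

```python
import hashlib


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-1 digest of a file, reading it in chunks
    so that multi-gigabyte dumps do not have to fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_sha1):
    """Compare a local file against a published hex digest.
    Whitespace and letter case in the published value are ignored."""
    return sha1_of_file(path) == published_sha1.strip().lower()
```

A downloader would call matches_published() with the path of the file fetched
via torrent and the digest copied from the authoritative page, and discard the
file if the result is False.
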
> >>  I realise that would
> >> only address part of the problem/cost, but it's a widely used
> >> technology for distributing large files; can we bend it to our needs?
> >>
> >
> > Also, we encourage use of gzip over HTTP  :-)
>
> Are there any RDF toolkits in need of a patch to their default setup
> in this regard? Tutorials that need fixing, etc?
>
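
For clients that do not ask for compression by default, requesting gzip over
HTTP takes only a few lines. A minimal sketch using the Python standard
library; the helper names are hypothetical, and whether a given server
actually answers with gzip is an assumption to verify per host.

```python
import gzip
import urllib.request


def build_request(url):
    """Build a GET request that tells the server gzip is acceptable."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})


def decode_body(body, content_encoding):
    """Decompress the body only if the server says it used gzip."""
    if (content_encoding or "").lower() == "gzip":
        return gzip.decompress(body)
    return body


def fetch(url):
    """Fetch a URL, transparently handling a gzip-compressed response."""
    with urllib.request.urlopen(build_request(url)) as response:
        encoding = response.headers.get("Content-Encoding")
        return decode_body(response.read(), encoding)
```

urllib does not decompress responses on its own, which is why the
Content-Encoding check is done by hand here; servers that ignore the header
simply return the uncompressed body and the code falls through unchanged.
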
> cheers,
>
> Dan
>
>
> PS. Re big datasets: the Library of Congress is apparently going to have
> the complete Twitter archive - see
> http://twitter.com/librarycongress/status/12172217971  ->
> http://blogs.loc.gov/loc/2010/04/how-tweet-it-is-library-acquires-entire-twitter-archive/
>
>


-- 
---
Daniel Koller
Jahnstrasse 20
80469 München * dakoller@googlemail.com
Received on Monday, 19 April 2010 05:29:10 UTC
