- From: Daniel Koller <dakoller@googlemail.com>
- Date: Wed, 14 Apr 2010 23:50:31 +0200
- To: Dan Brickley <danbri@danbri.org>
- Cc: public-lod <public-lod@w3.org>, dbpedia-discussion <dbpedia-discussion@lists.sourceforge.net>
- Message-ID: <k2p2bca8c351004141450k3ffaa17cxd4fe51d3da9be651@mail.gmail.com>
Dan,

I just set up some torrent files containing the current English and German DBpedia content (as a test/proof of concept; I was just curious to see how fast a network effect builds up via p2p networks).

To try, go to http://dakoller.net/dbpedia_torrents/dbpedia_torrents.html.

I presume that to get it working you just need the first people to download it (and keep spreading it around with their torrent clients), as long as the *.torrent files are consistent.

(Layout of the link page courtesy of the DBpedia people.)

Kind regards,
Daniel

On Wed, Apr 14, 2010 at 9:04 PM, Dan Brickley <danbri@danbri.org> wrote:

> On Wed, Apr 14, 2010 at 8:11 PM, Kingsley Idehen <kidehen@openlinksw.com> wrote:
> >
> > Some have cleaned up their act for sure.
> >
> > Problem is, there are others doing the same thing, who then complain about
> > the instance in very generic fashion.
>
> They're lucky it exists at all. I'd refer them to this Louis CK sketch -
> http://videosift.com/video/Louie-CK-on-Conan-Oct-1st-2008?fromdupe=We-live-in-an-amazing-amazing-world-and-we-complain
> (if it stays online...).
>
> >> While it is a
> >> shame to say 'no' to people trying to use linked data, this would be
> >> more saying 'yes, but not like that...'.
> >>
> >
> > I think we have an outstanding blog post / technical note about the DBpedia
> > instance that hasn't been published (possibly due to the 3.5 and
> > DBpedia-Live work we are doing), said note will cover how to work with the
> > instance etc..
> [..]
> > We do have a solution in mind, basically, we are going to have a different
> > place for the descriptor resources and redirect crawlers there via 303's
> > etc..
> [...]
> > We'll get the guide out.
>
>
> That sounds useful
>
> >> As you mention, DBpedia is an important and central resource, thanks
> >> both to the work of the Wikipedia community, and those in the DBpedia
> >> project who enrich and make available all that information. It's
> >> therefore important that the SemWeb / Linked Data community takes care
> >> to remember that these things don't come for free, that bills need
> >> paying and that de-referencing is a privilege not a right.
> >
> > "Bills" the major operative word in a world where the "Bill Payer" and
> > "Database Maintainer" is a footnote (at best) re. perception of what
> > constitutes the DBpedia Project.
>
> Yes, I'm sure some are thoughtless and take it for granted; but also
> that others are well aware of the burdens.
>
> (For that matter, I'm not myself so sure how Wikipedia cover their
> costs or what their longer-term plan is...).
> >
> > For us, the most important thing is perspective. DBpedia is another space on
> > a public network, thus it can't magically rewrite the underlying physics of
> > wide area networking where access is open to the world. Thus, we can make a
> > note about proper behavior and explain how we protect the instance such that
> > everyone has a chance of using it (rather than a select few resource
> > guzzlers).
>
> This I think is something others can help with, when presenting LOD
> and related concepts: to encourage good habits that spread the cost of
> keeping this great dataset globally available. So all those making
> slides, tutorials, blog posts or software tools have a role to play
> here.
>
> >> Are there any scenarios around eg. BitTorrent that could be explored?
> >> What if each of the static files in http://dbpedia.org/sitemap.xml
> >> were available as torrents (or magnet: URIs)?
> >
> > When we set up the Descriptor Resource host, these would certainly be
> > considered.
>
> Ok, let's take care to explore that then; it would probably help
> others too. There must be dozens of companies and research
> organizations who could put some bandwidth resources into this, if
> only there was a short guide to setting up a GUI-less bittorrent tool
> and configuring it appropriately. Are there any bittorrent experts on
> these mailing lists who could suggest next practical steps here (not
> necessarily dbpedia-specific)?
>
> (ah I see a reply from Ivan; copying it in here...)
>
> > If I were The Emperor of LOD I'd ask all grand dukes of datasources to
> > put fresh dumps at some torrent with control of UL/DL ratio :) For some
> > reason I can't understand, this idea is proposed a few times per year but
> > never tried.
>
> I suspect BitTorrent is in some ways somehow 'taboo' technology, since
> it is most famous for being used to distribute materials that
> copyright-owners often don't want distributed. I have no detailed idea
> how torrent files are made, how trackers work, etc. I started poking
> around magnet: a bit recently but haven't got a sense for how solid
> that work is yet. Could a simple Wiki page be used for sharing
> torrents? (plus published hash of files elsewhere for integrity
> checks). What would it take to get started?
>
> Perhaps if http://wiki.dbpedia.org/Downloads35 had the sha1 for each
> download published (rdfa?), then others could experiment with torrents
> and downloaders could cross-check against an authoritative description
> of the file from dbpedia?
>
> >> I realise that would
> >> only address part of the problem/cost, but it's a widely used
> >> technology for distributing large files; can we bend it to our needs?
> >>
> >
> > Also, we encourage use of gzip over HTTP :-)
>
> Are there any RDF toolkits in need of a patch to their default setup
> in this regard? Tutorials that need fixing, etc?
>
> cheers,
>
> Dan
>
>
> ps. re big datasets, Library of Congress apparently are going to have
> complete twitter archive - see
> http://twitter.com/librarycongress/status/12172217971 ->
>
> http://blogs.loc.gov/loc/2010/04/how-tweet-it-is-library-acquires-entire-twitter-archive/
>

--
---
Daniel Koller
Jahnstrasse 20
80469 München
* dakoller@googlemail.com
Received on Monday, 19 April 2010 05:29:10 UTC
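Kingsley's plan to put the descriptor resources in "a different place" and "redirect crawlers there via 303's" can be sketched as a tiny WSGI application. This is a minimal illustration under assumptions, not DBpedia's actual setup: the descriptor host `descriptors.example.org` and the `/resource/` and `/data/` path layout are hypothetical, and the thread does not say how crawlers would be told apart from other clients, so this sketch simply redirects every request for a resource URI.

```python
# Hypothetical host serving the descriptor (data) documents; not a real DBpedia host.
DESCRIPTOR_BASE = "http://descriptors.example.org/data/"
RESOURCE_PREFIX = "/resource/"


def app(environ, start_response):
    """WSGI app: answer GET /resource/<name> with '303 See Other',
    pointing the client at the descriptor document on a separate host."""
    path = environ.get("PATH_INFO", "")
    if path.startswith(RESOURCE_PREFIX) and len(path) > len(RESOURCE_PREFIX):
        location = DESCRIPTOR_BASE + path[len(RESOURCE_PREFIX):]
        start_response("303 See Other", [("Location", location), ("Content-Length", "0")])
        return [b""]
    # Anything else is outside this sketch's scope.
    body = b"Not Found"
    start_response("404 Not Found", [("Content-Type", "text/plain"),
                                     ("Content-Length", str(len(body)))])
    return [body]
```

For local experiments the app can be served with `wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever()`; a request for `/resource/Berlin` then returns a 303 whose `Location` is the hypothetical `http://descriptors.example.org/data/Berlin`.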
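Dan notes he has "no detailed idea how torrent files are made" and has been looking at magnet: URIs. Following the BitTorrent v1 metainfo format (BEP 3), a single-file torrent's `info` dictionary holds `length`, `name`, `piece length` and `pieces` (the concatenated 20-byte SHA-1 digests of each piece). The infohash is the SHA-1 of that dictionary in bencoded form, and a magnet link (BEP 9) carries it as `urn:btih:`. The sketch below assumes a 256 KiB piece length and a hypothetical tracker URL. It is not the tool Daniel used (the message does not say which one); in practice a maintained torrent-creation tool would be used.

```python
import hashlib
import os
from urllib.parse import quote


def bencode(value) -> bytes:
    """Minimal bencoder (BEP 3): ints, str/bytes, lists and dicts."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and must appear in sorted raw-byte order.
        items = sorted(((k.encode("utf-8") if isinstance(k, str) else k, v)
                        for k, v in value.items()), key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def single_file_info(path: str, piece_length: int = 256 * 1024) -> dict:
    """Build the 'info' dict of a single-file v1 torrent (piece size is an assumption)."""
    pieces = bytearray()
    with open(path, "rb") as f:
        for piece in iter(lambda: f.read(piece_length), b""):
            pieces += hashlib.sha1(piece).digest()  # 20 bytes per piece
    return {
        "length": os.path.getsize(path),
        "name": os.path.basename(path),
        "piece length": piece_length,
        "pieces": bytes(pieces),
    }


def torrent_file_bytes(info: dict, announce: str) -> bytes:
    """Contents of a minimal .torrent file: tracker URL plus the info dict."""
    return bencode({"announce": announce, "info": info})


def magnet_uri(info: dict, trackers=()) -> str:
    """Magnet link: infohash (SHA-1 of the bencoded info dict), display name, trackers."""
    infohash = hashlib.sha1(bencode(info)).hexdigest()
    uri = f"magnet:?xt=urn:btih:{infohash}&dn={quote(info['name'])}"
    for tracker in trackers:
        uri += f"&tr={quote(tracker, safe='')}"
    return uri
```

Because the infohash depends only on the file contents, name and piece length, two people who build the `info` dictionary the same way get the same magnet link, which is one reason the `*.torrent` files need to be consistent for seeding to spread.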
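Dan's suggestion that the downloads page publish "the sha1 for each download" would let anyone fetching a dump via torrent cross-check it against the maintainers' authoritative digest. A minimal sketch of the downloader's side, assuming the digest is published as a 40-character hex string; the file path is whatever the user downloaded. The file is hashed in 1 MiB chunks so that multi-gigabyte dumps never have to fit in memory.

```python
import hashlib


def sha1_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-1 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, published_hex: str) -> bool:
    """Compare the local file's SHA-1 with a digest published by the dataset
    maintainers (tolerating surrounding whitespace and upper-case hex)."""
    return sha1_of_file(path) == published_hex.strip().lower()
```

For example, a file containing exactly `abc` hashes to `a9993e364706816aba3e25717850c26c9cd0d89d`, so `matches_published_digest(path, "A9993E36...")` with the full digest returns `True`.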
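On Kingsley's point about "gzip over HTTP" and Dan's question about toolkit defaults: a client has to advertise `Accept-Encoding: gzip` and then undo the compression if the server answers with `Content-Encoding: gzip`. Python's `urllib.request` does not decompress responses automatically, so a sketch has to do it explicitly. The `Accept` value `application/rdf+xml` is an assumption chosen for illustration; any RDF media type the server supports would do.

```python
import gzip
import urllib.request


def build_request(url: str, accept: str = "application/rdf+xml") -> urllib.request.Request:
    """Ask for an RDF representation and advertise gzip support."""
    return urllib.request.Request(url, headers={"Accept": accept, "Accept-Encoding": "gzip"})


def decode_body(body: bytes, content_encoding) -> bytes:
    """Decompress the body only if the server says it is gzip-encoded."""
    if content_encoding and content_encoding.strip().lower() == "gzip":
        return gzip.decompress(body)
    return body


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Fetch a URL with gzip negotiation and return the decoded body."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return decode_body(resp.read(), resp.headers.get("Content-Encoding"))
```

Calling `fetch("http://dbpedia.org/resource/Berlin")` would negotiate compression transparently; servers that ignore the header simply send an uncompressed body, which `decode_body` passes through unchanged.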