- From: Steve Judkins <steve@wisdomnets.com>
- Date: Mon, 23 Mar 2009 16:35:39 -0700
- To: "'Kingsley Idehen'" <kidehen@openlinksw.com>
- Cc: "'Hugh Glaser'" <hg@ecs.soton.ac.uk>, <public-lod@w3.org>
I found Medline to have a pretty nice model for this. Every so often they
ship a full DB dump in XML as chunked zip files (not more than 1 GB each,
if I remember). Subscribers just synchronize the FTP directories between
the Medline server and a local server. After that, you can process daily
diff dumps. The downloads were just XML: a stream of record URIs, each
with an Add/Modify/Delete attribute and the data fields that changed.

The equivalent here would be a well-known graph where you can look for
changes to the LOD data sources you care about, with SIOC markup
describing the items, dates, and agents/people making the modifications.
This is a great use case for FOAF+SSL and OAuth, because you may only
want to automatically process updates from agents you trust (e.g.
Wikipedia might only take changes from DBpedia).

-Steve

-----Original Message-----
From: public-lod-request@w3.org [mailto:public-lod-request@w3.org] On Behalf Of Kingsley Idehen
Sent: Monday, March 23, 2009 3:34 PM
To: Steve Judkins
Cc: 'Hugh Glaser'; public-lod@w3.org
Subject: Re: Potential Home for LOD Data Sets

Steve Judkins wrote:
> It seems like this has the potential to become a nice collaborative
> production pipeline. It would be nice to have a feed for data updates, so
> we can fire up our EC2 instance when the data has been processed and
> packaged by the providers we are interested in. For example, if OpenLink
> wants to fire up their AMI to process the raw dumps from
> http://wiki.dbpedia.org/Downloads32 into this cloud storage, we can wait
> until a Virtuoso-ready package has been produced before we update. As more
> agents get involved in processing the data, this will allow for more
> automated notifications of updated dumps or SPARQL endpoints.

Yes, certainly.
Kingsley

> -Steve
>
> -----Original Message-----
> From: public-lod-request@w3.org [mailto:public-lod-request@w3.org] On
> Behalf Of Kingsley Idehen
> Sent: Thursday, December 04, 2008 9:20 PM
> To: Hugh Glaser
> Cc: public-lod@w3.org
> Subject: Re: Potential Home for LOD Data Sets
>
> Hugh Glaser wrote:
>> Thanks for the swift response!
>> I'm still puzzled - sorry to be slow.
>> http://aws.amazon.com/publicdatasets/#2 says:
>>
>> "Amazon EC2 customers can access this data by creating their own personal
>> Amazon EBS volumes, using the public data set snapshots as a starting
>> point. They can then access, modify and perform computation on these
>> volumes directly using their Amazon EC2 instances and just pay for the
>> compute and storage resources that they use."
>>
>> Does this not mean it costs me money on my EC2 account? Or is there some
>> other way of accessing the data? Or am I looking at the wrong bit?
>
> Okay, I see what I overlooked: the cost of running an AMI that mounts
> these EBS volumes, even though Amazon is charging $0.00 to host these
> huge amounts of data where it would usually charge.
>
> So, to conclude, using the loaded data sets isn't free, but I think we
> have to be somewhat appreciative of the value here, right? Amazon is
> providing a service that is ultimately pegged to usage (a utility
> model), and the usage comes down to the value associated with that
> scarce resource called time.
>
>> I.e. can you give me a clue how to get at the data without using my
>> credit card please? :-)
>
> You can't; you will need someone to build an EC2 service for you and eat
> the costs on your behalf. Of course such a service isn't impossible in a
> "Numerati" [1] economy, but we aren't quite there yet; we need the Linked
> Data Web in place first :-)
>
> Links:
>
> 1. http://tinyurl.com/64gsan
>
> Kingsley
>
>> Best
>> Hugh
>>
>> On 05/12/2008 02:28, "Kingsley Idehen" <kidehen@openlinksw.com> wrote:
>>
>> Hugh Glaser wrote:
>>> Exciting stuff, Kingsley.
>>> I'm not quite sure I have worked out how I might use it though.
>>> The page says that hosting data is clearly free, but I can't see how to
>>> get at it without paying for it as an EC2 customer.
>>> Is this right?
>>> Cheers
>>
>> Hugh,
>>
>> No, it shouldn't cost anything if the LOD data sets are hosted in this
>> particular location :-)
>>
>> Kingsley
>>
>>> Hugh
>>>
>>> On 01/12/2008 15:30, "Kingsley Idehen" <kidehen@openlinksw.com> wrote:
>>>
>>> All,
>>>
>>> Please see <http://aws.amazon.com/publicdatasets/>; potentially the
>>> final destination of all published RDF archives from the LOD cloud.
>>>
>>> I've already made a request on behalf of LOD, but additional requests
>>> from the community will accelerate general comprehension and
>>> awareness at Amazon.
>>>
>>> Once the data sets are available from Amazon, database construction
>>> costs will be significantly alleviated.
>>>
>>> We have DBpedia reconstruction down to 1.5 hrs (or less) based on
>>> Virtuoso's built-in integration with Amazon S3 for backup and
>>> restoration. We could get the reconstruction of the entire LOD cloud
>>> down to some interesting numbers once all the data is situated in an
>>> Amazon data center.
--
Regards,

Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software     Web: http://www.openlinksw.com
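A minimal sketch of the Medline-style diff feed described at the top of this thread, assuming a hypothetical XML layout (the `record`/`field` element names, the `action` and `agent` attributes, and the URIs are invented for illustration; Medline's real schema differs), with the FOAF+SSL/OAuth trust decision reduced to a plain agent whitelist:

```python
import xml.etree.ElementTree as ET

# Hypothetical diff feed: a stream of record URIs, each carrying an
# Add/Modify/Delete action, the agent responsible, and the changed fields.
SAMPLE_DIFF = """\
<diff date="2009-03-23">
  <record uri="http://dbpedia.org/resource/Berlin" action="Modify"
          agent="http://dbpedia.org/agent">
    <field name="population">3431700</field>
  </record>
  <record uri="http://dbpedia.org/resource/Bonn" action="Delete"
          agent="http://example.org/unknown-agent"/>
  <record uri="http://dbpedia.org/resource/Potsdam" action="Add"
          agent="http://dbpedia.org/agent">
    <field name="country">Germany</field>
  </record>
</diff>
"""

# Agents whose updates we process automatically; stands in for the
# FOAF+SSL/OAuth trust check discussed above.
TRUSTED_AGENTS = {"http://dbpedia.org/agent"}

def apply_diff(store, diff_xml, trusted=TRUSTED_AGENTS):
    """Apply Add/Modify/Delete records from a diff feed to `store`
    (a dict mapping record URI -> field dict). Records from untrusted
    agents are skipped and their URIs returned."""
    skipped = []
    for rec in ET.fromstring(diff_xml).iter("record"):
        uri, action, agent = rec.get("uri"), rec.get("action"), rec.get("agent")
        if agent not in trusted:
            skipped.append(uri)
            continue
        if action == "Delete":
            store.pop(uri, None)
        else:
            # Add and Modify both upsert only the fields that changed.
            fields = {f.get("name"): f.text for f in rec.iter("field")}
            store.setdefault(uri, {}).update(fields)
    return skipped

store = {"http://dbpedia.org/resource/Berlin": {"population": "3416300"}}
skipped = apply_diff(store, SAMPLE_DIFF)
```

Treating Add and Modify as upserts of only the changed fields matches the "data fields that changed" behaviour described above, and skipping (rather than applying) records from untrusted agents implements the "only process updates from agents you trust" rule.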
Received on Tuesday, 24 March 2009 07:54:53 UTC