RE: Potential Home for LOD Data Sets

I found Medline to have a pretty nice model for this.  Every so often they
ship a full DB dump in XML as chunked zip files (not more than 1 GB each, if
I remember).  Subscribers just synchronize the FTP directories between the
Medline server and their local server.  After that you can process daily diff
dumps.  The downloads were just XML: a stream of record URIs, each with an
Add/Modify/Delete attribute and the data fields that changed.  A well-known
graph where you can look for changes to the LOD data sources you care about,
with SIOC markup describing the items, dates, and agents/people making the
modifications, would work the same way.  This is a great use case for
FOAF+SSL and OAuth, because you may want to automatically process updates
only from agents you trust (e.g. Wikipedia might only take changes from
DBpedia).
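
For concreteness, here is a minimal Python sketch of what the subscriber side
of such a diff feed could look like.  The record/field layout, the "action"
and "agent" attributes, the trusted-agent URI, and the store object are all
assumptions for illustration; they mirror the Medline-style model described
above rather than any existing feed.

# Sketch of consuming a daily LOD diff dump (hypothetical format):
# a stream of <record> elements carrying a URI, an Add/Modify/Delete
# action, the agent who made the change, and the changed fields.
import xml.etree.ElementTree as ET

# Agents whose changes we are willing to apply automatically; in practice
# this set could be derived from FOAF+SSL / OAuth identities we trust.
TRUSTED_AGENTS = {
    "http://dbpedia.org/resource/DBpedia",   # hypothetical agent URI
}

def apply_diff(diff_path, store):
    """Stream the diff and apply records that come from trusted agents."""
    for _, record in ET.iterparse(diff_path):
        if record.tag != "record":
            continue
        action = record.get("action")    # "add" | "modify" | "delete"
        uri = record.get("uri")          # URI of the changed resource
        agent = record.get("agent")      # who made the change
        if agent in TRUSTED_AGENTS:
            if action == "delete":
                store.delete(uri)
            else:                        # add/modify carry the changed fields
                fields = {f.get("name"): f.text
                          for f in record.findall("field")}
                store.upsert(uri, fields)
        record.clear()                   # keep memory flat while streaming

A subscriber would then just run apply_diff over each day's dump after
syncing the FTP directory.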

-Steve

-----Original Message-----
From: public-lod-request@w3.org [mailto:public-lod-request@w3.org] On Behalf
Of Kingsley Idehen
Sent: Monday, March 23, 2009 3:34 PM
To: Steve Judkins
Cc: 'Hugh Glaser'; public-lod@w3.org
Subject: Re: Potential Home for LOD Data Sets

Steve Judkins wrote:
> It seems like this has the potential to become a nice collaborative
> production pipeline. It would be nice to have a feed for data updates, so we
> can fire up our EC2 instance when the data has been processed and packaged
> by the providers we are interested in.  For example, if Openlink wants to
> fire up their AMI to processes the raw dumps from
> http://wiki.dbpedia.org/Downloads32 into this cloud storage, we can wait
> until a Virtuoso-ready package has been produced before we update.  As more
> agents get involved in processing the data, this will allow for more
> automated notifications of updated dumps or SPARQL endpoints.
>   
Yes, certainly.

Kingsley
> -Steve
>
> -----Original Message-----
> From: public-lod-request@w3.org [mailto:public-lod-request@w3.org] On Behalf
> Of Kingsley Idehen
> Sent: Thursday, December 04, 2008 9:20 PM
> To: Hugh Glaser
> Cc: public-lod@w3.org
> Subject: Re: Potential Home for LOD Data Sets
>
>
> Hugh Glaser wrote:
>   
>> Thanks for the swift response!
>> I'm still puzzled - sorry to be slow.
>> http://aws.amazon.com/publicdatasets/#2
>> Says:
>> Amazon EC2 customers can access this data by creating their own personal
>> Amazon EBS volumes, using the public data set snapshots as a starting point.
>> They can then access, modify and perform computation on these volumes
>> directly using their Amazon EC2 instances and just pay for the compute and
>> storage resources that they use.
>   
>>   
>> Does this not mean it costs me money on my EC2 account? Or is there some
>> other way of accessing the data? Or am I looking at the wrong bit?
>   
>>   
>>     
> Okay, I see what I overlooked: the cost of paying for an AMI that mounts
> these EBS volumes, even though Amazon charges $0.00 for hosting these huge
> amounts of data where it would usually charge.
>
> So to conclude, using the loaded data sets isn't free, but I think we
> have to be somewhat appreciative of the value here, right? Amazon is
> providing a service that is ultimately pegged to usage (utility model),
> and the usage comes down to the value associated with that scarce resource
> called time.
>   
>> I.e., can you give me a clue how to get at the data without using my credit
>> card please? :-)
>   
>>   
>>     
> You can't; you will need someone to build an EC2 service for you and eat
> the costs on your behalf. Of course such a service isn't impossible in a
> "Numerati" [1] economy, but we aren't quite there yet; we need the Linked
> Data Web in place first :-)
>
> Links:
>
> 1. http://tinyurl.com/64gsan
>
> Kingsley
>   
>> Best
>> Hugh
>>
>> On 05/12/2008 02:28, "Kingsley Idehen" <kidehen@openlinksw.com> wrote:
>>
>>
>>
>> Hugh Glaser wrote:
>>   
>>     
>>> Exciting stuff, Kingsley.
>>> I'm not quite sure I have worked out how I might use it though.
>>> The page says that hosting data is clearly free, but I can't see how to
>>> get at it without paying for it as an EC2 customer.
>   
>>> Is this right?
>>> Cheers
>>>
>>>     
>>>       
>> Hugh,
>>
>> No, it shouldn't cost anything if the LOD data sets are hosted in this
>> particular location :-)
>>
>>
>> Kingsley
>>   
>>     
>>> Hugh
>>>
>>>
>>> On 01/12/2008 15:30, "Kingsley Idehen" <kidehen@openlinksw.com> wrote:
>>>
>>>
>>>
>>> All,
>>>
>>> Please see: <http://aws.amazon.com/publicdatasets/> ; potentially the
>>> final destination of all published RDF archives from the LOD cloud.
>>>
>>> I've already made a request on behalf of LOD, but additional requests
>>> from the community will accelerate the general comprehension and
>>> awareness at Amazon.
>>>
>>> Once the data sets are available from Amazon, database construction
>>> costs will be significantly alleviated.
>>>
>>> We have DBpedia reconstruction down to 1.5 hrs (or less) based on
>>> Virtuoso's in-built integration with Amazon S3 for backup and
>>> restoration, etc.  We could get the reconstruction of the entire LOD
>>> cloud down to some interesting numbers once all the data is situated in
>>> an Amazon data center.
>>>
>>>
>>> --
>>>
>>>
>>> Regards,
>>>
>>> Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen
>>> President & CEO
>>> OpenLink Software     Web: http://www.openlinksw.com
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>     
>>>       
>> --
>>
>>
>> Regards,
>>
>> Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen
>> President & CEO
>> OpenLink Software     Web: http://www.openlinksw.com
>>
>>
>>
>>
>>
>>
>>
>>   
>>     
>
>
>   


-- 


Regards,

Kingsley Idehen	      Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software     Web: http://www.openlinksw.com

Received on Tuesday, 24 March 2009 07:54:53 UTC