
Re: Potential Home for LOD Data Sets

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Mon, 23 Mar 2009 23:06:43 -0400
Message-ID: <49C84E43.4030606@openlinksw.com>
To: Steve Judkins <steve@wisdomnets.com>
CC: 'Hugh Glaser' <hg@ecs.soton.ac.uk>, public-lod@w3.org

Steve Judkins wrote:
> I found Medline to have a pretty nice model for this.  Every so often they
> ship a full DB dump in XML as chunked zip files (no more than 1 GB each, if
> I remember).  Subscribers just synchronize the FTP directories between the
> Medline server and their local server.  After that you can process daily
> diff dumps. The downloads were just XML with a stream of record URIs, each
> with an Add/Modify/Delete attribute and the data fields that changed.  A
> well-known graph where you can look for changes to the LOD data sources you
> care about, and get SIOC markup that describes the Items, Dates, and
> Agents/People doing the modifications. This is a great use case for
> FOAF+SSL & OAuth, because you may only want to automatically process
> updates from Agents you trust (e.g. Wikipedia might only take changes from
> DBpedia).
>   
Steve,

You're very much on the ball here; this is exactly the kind of thing 
foaf+ssl [1] is about :-) I was planning to unveil similar capabilities 
re. the DBpedia endpoint down the line, i.e., SPARQL endpoint behavior 
aligned to trusted identities etc.
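For concreteness, the trust-gated update flow described above could look roughly like this. This is only an illustrative sketch: the URIs, the agent list, and the dict standing in for an RDF store are all hypothetical, and a real deployment would establish agent identity via FOAF+SSL verification rather than a hard-coded set.

```python
# Illustrative sketch (not OpenLink code): apply Medline-style
# Add/Modify/Delete diff records, but only when the publishing agent
# is on a local trust list standing in for a FOAF+SSL identity check.

# Hypothetical changeset entries: (record URI, action, agent URI)
CHANGESET = [
    ("http://dbpedia.org/resource/Berlin", "Modify", "http://dbpedia.org/#agent"),
    ("http://example.org/spam", "Add", "http://spammer.example/#agent"),
    ("http://dbpedia.org/resource/Paris", "Delete", "http://dbpedia.org/#agent"),
]

# In practice this set would be derived from verified FOAF+SSL identities.
TRUSTED_AGENTS = {"http://dbpedia.org/#agent"}


def apply_changeset(store, changeset, trusted):
    """Apply Add/Modify/Delete records, skipping untrusted agents."""
    skipped = []
    for uri, action, agent in changeset:
        if agent not in trusted:
            skipped.append(uri)      # ignore updates from unknown agents
            continue
        if action in ("Add", "Modify"):
            store[uri] = action      # placeholder for a real RDF graph update
        elif action == "Delete":
            store.pop(uri, None)
    return skipped


store = {"http://dbpedia.org/resource/Paris": "Add"}
skipped = apply_changeset(store, CHANGESET, TRUSTED_AGENTS)
print(sorted(store))   # Berlin was added; Paris was deleted
print(skipped)         # the record from the untrusted agent was ignored
```

The point is simply that the trust decision sits in front of the update step, so a consumer like Wikipedia could accept DBpedia's changesets while silently dropping everyone else's.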

Links:

1. http://esw.w3.org/topic/foaf+ssl - FOAF+SSL


Kingsley
> -Steve
>
> -----Original Message-----
> From: public-lod-request@w3.org [mailto:public-lod-request@w3.org] On Behalf
> Of Kingsley Idehen
> Sent: Monday, March 23, 2009 3:34 PM
> To: Steve Judkins
> Cc: 'Hugh Glaser'; public-lod@w3.org
> Subject: Re: Potential Home for LOD Data Sets
>
> Steve Judkins wrote:
>   
>> It seems like this has the potential to become a nice collaborative
>> production pipeline. It would be nice to have a feed for data updates, so
>> we can fire up our EC2 instance when the data has been processed and
>> packaged by the providers we are interested in.  For example, if OpenLink
>> wants to fire up their AMI to process the raw dumps from
>> http://wiki.dbpedia.org/Downloads32 into this cloud storage, we can wait
>> until a Virtuoso-ready package has been produced before we update.  As
>> more agents get involved in processing the data, this will allow for more
>> automated notifications of updated dumps or SPARQL endpoints.
> Yes, certainly.
>
> Kingsley
>   
>> -Steve
>>
>> -----Original Message-----
>> From: public-lod-request@w3.org [mailto:public-lod-request@w3.org] On
>> Behalf Of Kingsley Idehen
>> Sent: Thursday, December 04, 2008 9:20 PM
>> To: Hugh Glaser
>> Cc: public-lod@w3.org
>> Subject: Re: Potential Home for LOD Data Sets
>>
>>
>> Hugh Glaser wrote:
>>> Thanks for the swift response!
>>> I'm still puzzled - sorry to be slow.
>>> http://aws.amazon.com/publicdatasets/#2
>>> Says:
>>> Amazon EC2 customers can access this data by creating their own personal
>>> Amazon EBS volumes, using the public data set snapshots as a starting
>>> point. They can then access, modify and perform computation on these
>>> volumes directly using their Amazon EC2 instances and just pay for the
>>> compute and storage resources that they use.
>>>
>>> Does this not mean it costs me money on my EC2 account? Or is there some
>>> other way of accessing the data? Or am I looking at the wrong bit?
>>>
>> Okay, I see what I overlooked: the cost of running an AMI that mounts 
>> these EBS volumes, even though Amazon is charging $0.00 to host these 
>> huge amounts of data where it would usually charge.
>>
>> So to conclude, using the loaded data sets isn't free, but I think we 
>> have to appreciate the value here, right? Amazon is providing a service 
>> that is ultimately pegged to usage (the utility model), and that usage 
>> comes down to the value associated with that scarce resource called time.
>>>
>>> I.e., can you give me a clue how to get at the data without using my
>>> credit card please? :-)
>>>
>> You can't; you'll need someone to build an EC2 service for you and eat 
>> the costs on your behalf. Of course, such a service isn't impossible in a 
>> "Numerati" [1] economy, but we aren't quite there yet; we need the Linked 
>> Data Web in place first :-)
>>
>> Links:
>>
>> 1. http://tinyurl.com/64gsan
>>
>> Kingsley
>>> Best
>>> Hugh
>>>
>>> On 05/12/2008 02:28, "Kingsley Idehen" <kidehen@openlinksw.com> wrote:
>>>
>>>
>>>
>>> Hugh Glaser wrote:
>>>> Exciting stuff, Kingsley.
>>>> I'm not quite sure I have worked out how I might use it though.
>>>> The page says that hosting data is clearly free, but I can't see how to
>>>> get at it without paying for it as an EC2 customer.
>>>> Is this right?
>>>> Cheers
>>>>
>>> Hugh,
>>>
>>> No, shouldn't cost anything if the LOD data sets are hosted in this
>>> particular location :-)
>>>
>>>
>>> Kingsley
>>>> Hugh
>>>>
>>>>
>>>> On 01/12/2008 15:30, "Kingsley Idehen" <kidehen@openlinksw.com> wrote:
>>>>
>>>>
>>>>
>>>> All,
>>>>
>>>> Please see: <http://aws.amazon.com/publicdatasets/> ; potentially the
>>>> final destination of all published RDF archives from the LOD cloud.
>>>>
>>>> I've already made a request on behalf of LOD, but additional requests
>>>> from the community will accelerate the general comprehension and
>>>> awareness at Amazon.
>>>>
>>>> Once the data sets are available from Amazon, database construction
>>>> costs will be significantly alleviated.
>>>>
>>>> We have DBpedia reconstruction down to 1.5 hrs (or less), based on
>>>> Virtuoso's built-in integration with Amazon S3 for backup and
>>>> restoration. We could get the reconstruction of the entire LOD cloud
>>>> down to some interesting numbers once all the data is situated in an
>>>> Amazon data center.
>>>>
>>>>
>>>> --
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen
>>>> President & CEO
>>>> OpenLink Software     Web: http://www.openlinksw.com
>>>>
>>> --
>>>
>>>
>>> Regards,
>>>
>>> Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen
>>> President & CEO
>>> OpenLink Software     Web: http://www.openlinksw.com
>>>


-- 


Regards,

Kingsley Idehen	      Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software     Web: http://www.openlinksw.com
Received on Tuesday, 24 March 2009 03:07:20 UTC
