- From: Kingsley Idehen <kidehen@openlinksw.com>
- Date: Wed, 12 Aug 2009 08:23:09 -0400
- To: Richard Cyganiak <richard@cyganiak.de>
- CC: Hugh Glaser <hg@ecs.soton.ac.uk>, Aldo Bucchi <aldo.bucchi@gmail.com>, Leigh Dodds <leigh.dodds@talis.com>, Jun Zhao <jun.zhao@zoo.ox.ac.uk>, "public-lod@w3.org" <public-lod@w3.org>, Anja Jentzsch <anja@anjeve.de>, Story Henry <henry.story@bblfish.net>
Richard Cyganiak wrote:
> The problem at hand is: How to get reasonably accurate and up-to-date
> statistics about the LOD cloud?
>
> I see three workable methods for this.
>
> 1. Compile the statistics from voiD descriptions published by
> individual dataset maintainers. This is what Hugh proposes below.
> Enabling this is one of the main reasons why we created voiD. There
> have to be better tools for creating voiD before this happens. The
> tools could be, for example, manual entry forms that spit out voiD
> (voiD-o-matic?), or analyzers that read a dump and spit out a
> skeleton voiD file.

+1

We do the above, but it means your data store has to be Virtuoso. In
Virtuoso we can generate VoiD at the click of a button for data in the
Quad Store. We also have a Meta Cartridge for the Sponger that adds
VoiD data to any information resource that's RDFized.

> 2. Hand-compile the statistics by watching public-lod, trawling
> project home pages, emailing dataset maintainers, and fixing things
> when dataset maintainers complain. This is how I created the original
> LOD cloud diagram in Berlin, and after I left Berlin, Anja has done a
> great job keeping it up to date despite its massive growth. We will
> continue to update it on a best-effort basis for the foreseeable
> future. A voiD version of the information underlying the diagram is
> in the pipeline. Others can do as we did.
>
> 3. Anyone who has a copy of a big part of the cloud (e.g. OpenLink
> and we at Sindice) can potentially calculate the statistics. This is
> non-trivial because we just have triples, and we need to
> reverse-engineer datasets and linksets from them; it involves
> computation over quite serious amounts of data, and in the end you
> still won't have good labels or homepages for the datasets. While
> this approach is possible, it seems to me that there are better uses
> of engineering and research resources.

Yep!

> There is a fourth process that, IMO, does NOT work:
>
> 4. Send an email to public-lod asking "Everyone please enter your
> dataset in this wikipage/GoogleSpreadsheet/fancyAppOfTheWeek."

We can have a shared Google spreadsheet that replaces the current ESW
wiki table. We might even have a segue here for getting Google to
appreciate FOAF+SSL; then we can leverage FOAF graphs for spreadsheet
access control policies :-)

I also see this as a nice opening to reverse sponging/RDFization,
whereby a simple form fronts the API for writing to Google
spreadsheets. Basically, what RDF Pushback [1] is all about.

Links:

1. http://esw.w3.org/topic/PushBackDataToLegacySources

Kingsley

> Best,
> Richard
>
> On 11 Aug 2009, at 22:07, Hugh Glaser wrote:
>> If any more work is to be put into generating this picture, it
>> really should be from voiD descriptions, which we already make
>> available for all our datasets.
>> And for those who want to do it by hand, a simple system to allow
>> them to specify the linkage using voiD would get the entry into a
>> format for the voiD processor to use (I'm happy to host the data if
>> need be).
>> Or Aldo's system could generate its RDF using the voiD ontology,
>> thus providing the manual entry system?
>>
>> I know we have been here before, and almost got to the voiD
>> processor thing: please can we try again?
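A dump analyzer of the kind Richard calls a "voiD-o-matic" could start
very small. The sketch below is only an illustration of the idea, not
how Virtuoso or any existing voiD tool does it; it assumes Python with
rdflib installed, and the dump file name and dataset URI are
placeholders:

    # Read an RDF dump and emit a skeleton voiD description with
    # basic statistics. A sketch: file name and URIs are placeholders.
    from rdflib import RDF, Graph, Literal, Namespace, URIRef

    VOID = Namespace("http://rdfs.org/ns/void#")

    dump = Graph()
    dump.parse("dump.nt", format="nt")  # hypothetical local dump file

    dataset = URIRef("http://example.org/dataset")  # placeholder URI
    desc = Graph()
    desc.bind("void", VOID)
    desc.add((dataset, RDF.type, VOID.Dataset))
    desc.add((dataset, VOID.triples, Literal(len(dump))))
    desc.add((dataset, VOID.distinctSubjects,
              Literal(len(set(dump.subjects())))))
    desc.add((dataset, VOID.properties,
              Literal(len(set(dump.predicates())))))

    print(desc.serialize(format="turtle"))

A maintainer would still need to fill in labels, a homepage, and
linkset details by hand, which is where a manual entry form would
complement the analyzer.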
>> Best
>> Hugh
>>
>> On 11/08/2009 19:00, "Aldo Bucchi" <aldo.bucchi@gmail.com> wrote:
>>
>> Hi,
>>
>> On Aug 11, 2009, at 13:46, Kingsley Idehen <kidehen@openlinksw.com>
>> wrote:
>>
>>> Leigh Dodds wrote:
>>>> Hi,
>>>>
>>>> I've just added several new datasets to the Statistics page that
>>>> weren't previously listed. It's not really a great user experience
>>>> editing the wiki markup and manually adding up the figures.
>>>>
>>>> So, thinking out loud, I'm wondering whether it might be more
>>>> appropriate to use a Google spreadsheet and one of their
>>>> submission forms for the purposes of collecting the data. A little
>>>> manual editing to remove duplicates might make managing this data
>>>> a little easier, especially as there are also pages that
>>>> separately list the available SPARQL endpoints and RDF dumps.
>>>>
>>>> I'm sure we could create something much better using voiD, etc.,
>>>> but for now, maybe using a slightly better tool would give us a
>>>> little more progress? It'd be a snip to dump out the Google
>>>> Spreadsheet data programmatically too, which'd be another
>>>> improvement on the current situation.
>>>>
>>>> What does everyone else think?
>>>>
>>> Nice idea! Especially as Google Spreadsheet to RDF is just about
>>> RDFizers for the Google Spreadsheet API :-)
>>
>> Hehe. I have this in my todo (literally): a website that exposes a
>> Google spreadsheet as a SPARQL endpoint. Internally we use it as a
>> UI to quickly create config files et al.
>> But it will remain in my todo forever... ;)
>>
>> Kingsley, this could be sponged. The trick is that the spreadsheet
>> must have an accompanying page/sheet/book with metadata (the NS or
>> explicit URIs for cols).
>>
>>> Kingsley
>>>> Cheers,
>>>>
>>>> L.
>>>>
>>>> 2009/8/7 Jun Zhao <jun.zhao@zoo.ox.ac.uk>:
>>>>
>>>>> Dear all,
>>>>>
>>>>> We are planning to produce an updated data cloud diagram based on
>>>>> the dataset information on the esw wiki page:
>>>>> http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets/Statistics
>>>>>
>>>>> If you have not published your dataset there yet and you would
>>>>> like your dataset to be included, can you please add your dataset
>>>>> there?
>>>>>
>>>>> If you have an entry there for your dataset already, can you
>>>>> please update the information about your dataset on the wiki?
>>>>>
>>>>> If you cannot edit the wiki page any more because of the recent
>>>>> update of the esw wiki editing policy, you can send the
>>>>> information to me or Anja, who is cc'ed. We can update it for
>>>>> you.
>>>>>
>>>>> If you know friends who have datasets on the wiki but are not on
>>>>> the mailing list, can you please kindly forward this email to
>>>>> them? We would like to get the data cloud as up-to-date as
>>>>> possible.
>>>>>
>>>>> For this release, we will use the above wiki page as the
>>>>> information gathering point. We do apologize if you have
>>>>> published information about your dataset on other web pages and
>>>>> this request would mean extra work for you.
>>>>>
>>>>> Many thanks for your contributions!
>>>>>
>>>>> Kindest regards,
>>>>>
>>>>> Jun
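The spreadsheet-RDFizer that Leigh, Kingsley, and Aldo are circling
around can be prototyped in a few lines. The sketch below is a rough
illustration rather than an existing Sponger cartridge: it assumes the
sheet has been exported as CSV, and it collapses Aldo's separate
metadata sheet into a header row holding one predicate URI per column;
the file name and base URI are placeholders:

    # Turn a CSV export of a spreadsheet into N-Triples, taking
    # predicate URIs from the header row. Sketch only: no literal
    # escaping, and one subject is minted per row.
    import csv

    BASE = "http://example.org/lod-datasets/"  # hypothetical base URI

    with open("datasets.csv") as f:            # hypothetical export
        rows = csv.reader(f)
        predicates = next(rows)    # header row: one URI per column
        for i, row in enumerate(rows):
            subject = f"<{BASE}row{i}>"        # mint subject per row
            for pred, value in zip(predicates, row):
                if value:                      # skip empty cells
                    print(f'{subject} <{pred}> "{value}" .')

Dumping the data back out programmatically, as Leigh suggests, then
reduces to re-running this over the latest CSV export.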
--
Regards,

Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO OpenLink Software     Web: http://www.openlinksw.com
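Richard's third method above (reverse-engineering datasets and
linksets from a pile of raw triples) is easy to sketch and hard to do
well. The toy version below assumes rdflib and an aggregated crawl
file (the name is a placeholder), and simply treats each URI's host
name as a dataset identifier, counting cross-host links as candidate
linksets; real datasets do not map cleanly onto host names, which is
part of Richard's point:

    # Infer "datasets" from URI host names and count cross-host links
    # as candidate linksets. A toy sketch of Richard's method 3.
    from collections import Counter
    from urllib.parse import urlparse

    from rdflib import Graph, URIRef

    g = Graph()
    g.parse("crawl.nt", format="nt")  # hypothetical aggregated crawl

    sizes = Counter()   # triples per inferred dataset
    links = Counter()   # (source host, target host) link counts

    for s, _p, o in g:
        if not isinstance(s, URIRef):
            continue
        src = urlparse(str(s)).netloc
        sizes[src] += 1
        if isinstance(o, URIRef):
            tgt = urlparse(str(o)).netloc
            if tgt and tgt != src:
                links[(src, tgt)] += 1

    for (src, tgt), n in links.most_common(10):
        print(f"{src} -> {tgt}: {n} links")

Even on this toy definition the counting needs serious hardware at LOD
scale, and the output still lacks the labels and homepages Richard
mentions.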
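Kingsley's "reverse sponging" pushback idea in miniature: a plain web
form whose POST handler appends a row to tabular storage. In the
sketch below a local CSV file stands in for the Google Spreadsheets
API call (deliberately not shown), and the field names are
hypothetical:

    # Minimal form-fronting-the-API handler. The CSV file is a
    # stand-in for a real spreadsheet write; the HTML form itself is
    # assumed to be served elsewhere, and the fields are hypothetical.
    import csv
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.parse import parse_qs

    FIELDS = ["name", "triples", "homepage"]

    class PushbackHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers["Content-Length"])
            form = parse_qs(self.rfile.read(length).decode())
            with open("datasets.csv", "a", newline="") as f:
                csv.writer(f).writerow(
                    [form.get(k, [""])[0] for k in FIELDS])
            self.send_response(303)   # redirect back after the write
            self.send_header("Location", "/")
            self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("localhost", 8000),
                   PushbackHandler).serve_forever()

FOAF+SSL would then slot in at the access-control layer, deciding who
may POST at all.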
Received on Wednesday, 12 August 2009 12:23:57 UTC