- From: Kingsley Idehen <kidehen@openlinksw.com>
- Date: Wed, 12 Aug 2009 08:23:09 -0400
- To: Richard Cyganiak <richard@cyganiak.de>
- CC: Hugh Glaser <hg@ecs.soton.ac.uk>, Aldo Bucchi <aldo.bucchi@gmail.com>, Leigh Dodds <leigh.dodds@talis.com>, Jun Zhao <jun.zhao@zoo.ox.ac.uk>, "public-lod@w3.org" <public-lod@w3.org>, Anja Jentzsch <anja@anjeve.de>, Story Henry <henry.story@bblfish.net>
Richard Cyganiak wrote:
> The problem at hand is: How to get reasonably accurate and up-to-date
> statistics about the LOD cloud?
>
> I see three workable methods for this.
>
> 1. Compile the statistics from voiD descriptions published by
> individual dataset maintainers. This is what Hugh proposes below.
> Enabling this is one of the main reasons why we created voiD. There
> have to be better tools for creating voiD before this happens. The
> tools could be, for example, manual entry forms that spit out voiD
> (voiD-o-matic?), or analyzers that read a dump and spit out a
> skeleton voiD file.

+1

We do the above, but it means your data store has to be Virtuoso. In
Virtuoso we can generate VoiD at the click of a button for data in the
Quad Store. We also have a Meta Cartridge for the Sponger that adds
VoiD data to any information resource that's RDFized.

> 2. Hand-compile the statistics by watching public-lod, trawling
> project home pages, emailing dataset maintainers, and fixing things
> when dataset maintainers complain. This is how I created the original
> LOD cloud diagram in Berlin, and after I left Berlin, Anja has done a
> great job keeping it up to date despite its massive growth. We will
> continue to update it on a best-effort basis for the foreseeable
> future. A voiD version of the information underlying the diagram is
> in the pipeline. Others can do as we did.
>
> 3. Anyone who has a copy of a big part of the cloud (e.g. OpenLink
> and we at Sindice) can potentially calculate the statistics. This is
> non-trivial because we just have triples, and we need to
> reverse-engineer datasets and linksets from them; it involves
> computation over quite serious amounts of data, and in the end you
> still won't have good labels or homepages for the datasets. While
> this approach is possible, it seems to me that there are better uses
> of engineering and research resources.

Yep!

> There is a fourth process that, IMO, does NOT work:
>
> 4. Send an email to public-lod asking "Everyone please enter your
> dataset in this wikipage/GoogleSpreadsheet/fancyAppOfTheWeek."

We can have a shared Google spreadsheet that replaces the current ESW
wiki table. We might even have a segue here for getting Google to
appreciate FOAF+SSL; then we can leverage FOAF graphs for spreadsheet
access control policies :-)

I also see this as a nice opening to reverse sponging/RDFization,
whereby a simple form fronts the API for writing to Google
spreadsheets. Basically, what RDF Pushback [1] is all about.

Links:

1. http://esw.w3.org/topic/PushBackDataToLegacySources

Kingsley

> Best,
> Richard
>
> On 11 Aug 2009, at 22:07, Hugh Glaser wrote:
>> If any more work is to be put into generating this picture, it
>> really should be from voiD descriptions, which we already make
>> available for all our datasets.
>> And for those who want to do it by hand, a simple system to allow
>> them to specify the linkage using voiD would get the entry into a
>> format for the voiD processor to use (I'm happy to host the data if
>> need be).
>> Or Aldo's system could generate its RDF using the voiD ontology,
>> thus providing the manual entry system?
>>
>> I know we have been here before, and almost got to the voiD
>> processor thing: please can we try again?
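A dump analyzer of the kind Richard calls a "voiD-o-matic" could start
very small. The sketch below is only an illustration of the idea, not
how Virtuoso or any existing voiD tool does it; it assumes Python with
rdflib installed, and the dump file name and dataset URI are
placeholders:

    # Read an RDF dump and emit a skeleton voiD description with
    # basic statistics. A sketch: file name and URIs are placeholders.
    from rdflib import RDF, Graph, Literal, Namespace, URIRef

    VOID = Namespace("http://rdfs.org/ns/void#")

    dump = Graph()
    dump.parse("dump.nt", format="nt")  # hypothetical local dump file

    dataset = URIRef("http://example.org/dataset")  # placeholder URI
    desc = Graph()
    desc.bind("void", VOID)
    desc.add((dataset, RDF.type, VOID.Dataset))
    desc.add((dataset, VOID.triples, Literal(len(dump))))
    desc.add((dataset, VOID.distinctSubjects,
              Literal(len(set(dump.subjects())))))
    desc.add((dataset, VOID.properties,
              Literal(len(set(dump.predicates())))))

    print(desc.serialize(format="turtle"))

A maintainer would still need to fill in labels, a homepage, and
linkset details by hand, which is where a manual entry form would
complement the analyzer.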
>> Best
>> Hugh
>>
>> On 11/08/2009 19:00, "Aldo Bucchi" <aldo.bucchi@gmail.com> wrote:
>>
>> Hi,
>>
>> On Aug 11, 2009, at 13:46, Kingsley Idehen <kidehen@openlinksw.com>
>> wrote:
>>
>>> Leigh Dodds wrote:
>>>> Hi,
>>>>
>>>> I've just added several new datasets to the Statistics page that
>>>> weren't previously listed. It's not really a great user experience
>>>> editing the wiki markup and manually adding up the figures.
>>>>
>>>> So, thinking out loud, I'm wondering whether it might be more
>>>> appropriate to use a Google spreadsheet and one of their
>>>> submission forms for the purposes of collecting the data. A little
>>>> manual editing to remove duplicates might make managing this data
>>>> a little easier, especially as there are also pages that
>>>> separately list the available SPARQL endpoints and RDF dumps.
>>>>
>>>> I'm sure we could create something much better using voiD, etc.,
>>>> but for now, maybe using a slightly better tool would give us a
>>>> little more progress? It'd be a snip to dump out the Google
>>>> Spreadsheet data programmatically too, which'd be another
>>>> improvement on the current situation.
>>>>
>>>> What does everyone else think?
>>>>
>>> Nice idea! Especially as Google Spreadsheet to RDF is just about
>>> RDFizers for the Google Spreadsheet API :-)
>>
>> Hehe. I have this in my todo (literally): a website that exposes a
>> Google spreadsheet as a SPARQL endpoint. Internally we use it as a
>> UI to quickly create config files et al.
>> But it will remain in my todo forever... ;)
>>
>> Kingsley, this could be sponged. The trick is that the spreadsheet
>> must have an accompanying page/sheet/book with metadata (the NS or
>> explicit URIs for cols).
>>
>>> Kingsley
>>>> Cheers,
>>>>
>>>> L.
>>>>
>>>> 2009/8/7 Jun Zhao <jun.zhao@zoo.ox.ac.uk>:
>>>>
>>>>> Dear all,
>>>>>
>>>>> We are planning to produce an updated data cloud diagram based on
>>>>> the dataset information on the esw wiki page:
>>>>> http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets/Statistics
>>>>>
>>>>> If you have not published your dataset there yet and you would
>>>>> like your dataset to be included, can you please add your dataset
>>>>> there?
>>>>>
>>>>> If you have an entry there for your dataset already, can you
>>>>> please update the information about your dataset on the wiki?
>>>>>
>>>>> If you cannot edit the wiki page any more because of the recent
>>>>> update of the esw wiki editing policy, you can send the
>>>>> information to me or Anja, who is cc'ed. We can update it for
>>>>> you.
>>>>>
>>>>> If you know friends who have datasets on the wiki but are not on
>>>>> the mailing list, can you please kindly forward this email to
>>>>> them? We would like to get the data cloud as up-to-date as
>>>>> possible.
>>>>>
>>>>> For this release, we will use the above wiki page as the
>>>>> information gathering point. We do apologize if you have
>>>>> published information about your dataset on other web pages and
>>>>> this request would mean extra work for you.
>>>>>
>>>>> Many thanks for your contributions!
>>>>>
>>>>> Kindest regards,
>>>>>
>>>>> Jun
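The spreadsheet-RDFizer that Leigh, Kingsley, and Aldo are circling
around can be prototyped in a few lines. The sketch below is a rough
illustration rather than an existing Sponger cartridge: it assumes the
sheet has been exported as CSV, and it collapses Aldo's separate
metadata sheet into a header row holding one predicate URI per column;
the file name and base URI are placeholders:

    # Turn a CSV export of a spreadsheet into N-Triples, taking
    # predicate URIs from the header row. Sketch only: no literal
    # escaping, and one subject is minted per row.
    import csv

    BASE = "http://example.org/lod-datasets/"  # hypothetical base URI

    with open("datasets.csv") as f:            # hypothetical export
        rows = csv.reader(f)
        predicates = next(rows)    # header row: one URI per column
        for i, row in enumerate(rows):
            subject = f"<{BASE}row{i}>"        # mint subject per row
            for pred, value in zip(predicates, row):
                if value:                      # skip empty cells
                    print(f'{subject} <{pred}> "{value}" .')

Dumping the data back out programmatically, as Leigh suggests, then
reduces to re-running this over the latest CSV export.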
--
Regards,

Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO OpenLink Software     Web: http://www.openlinksw.com
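Richard's third method above (reverse-engineering datasets and
linksets from a pile of raw triples) is easy to sketch and hard to do
well. The toy version below assumes rdflib and an aggregated crawl
file (the name is a placeholder), and simply treats each URI's host
name as a dataset identifier, counting cross-host links as candidate
linksets; real datasets do not map cleanly onto host names, which is
part of Richard's point:

    # Infer "datasets" from URI host names and count cross-host links
    # as candidate linksets. A toy sketch of Richard's method 3.
    from collections import Counter
    from urllib.parse import urlparse

    from rdflib import Graph, URIRef

    g = Graph()
    g.parse("crawl.nt", format="nt")  # hypothetical aggregated crawl

    sizes = Counter()   # triples per inferred dataset
    links = Counter()   # (source host, target host) link counts

    for s, _p, o in g:
        if not isinstance(s, URIRef):
            continue
        src = urlparse(str(s)).netloc
        sizes[src] += 1
        if isinstance(o, URIRef):
            tgt = urlparse(str(o)).netloc
            if tgt and tgt != src:
                links[(src, tgt)] += 1

    for (src, tgt), n in links.most_common(10):
        print(f"{src} -> {tgt}: {n} links")

Even on this toy definition the counting needs serious hardware at LOD
scale, and the output still lacks the labels and homepages Richard
mentions.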
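Kingsley's "reverse sponging" pushback idea in miniature: a plain web
form whose POST handler appends a row to tabular storage. In the
sketch below a local CSV file stands in for the Google Spreadsheets
API call (deliberately not shown), and the field names are
hypothetical:

    # Minimal form-fronting-the-API handler. The CSV file is a
    # stand-in for a real spreadsheet write; the HTML form itself is
    # assumed to be served elsewhere, and the fields are hypothetical.
    import csv
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.parse import parse_qs

    FIELDS = ["name", "triples", "homepage"]

    class PushbackHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers["Content-Length"])
            form = parse_qs(self.rfile.read(length).decode())
            with open("datasets.csv", "a", newline="") as f:
                csv.writer(f).writerow(
                    [form.get(k, [""])[0] for k in FIELDS])
            self.send_response(303)   # redirect back after the write
            self.send_header("Location", "/")
            self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("localhost", 8000),
                   PushbackHandler).serve_forever()

FOAF+SSL would then slot in at the access-control layer, deciding who
may POST at all.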
Received on Wednesday, 12 August 2009 12:23:57 UTC