- From: Giovanni Tummarello <giovanni.tummarello@deri.org>
- Date: Thu, 21 Oct 2010 13:12:10 +0100
- To: Chris Bizer <chris@bizer.de>
- Cc: Martin Hepp <martin.hepp@ebusiness-unibw.org>, Thomas Steiner <tsteiner@google.com>, Semantic Web <semantic-web@w3.org>, public-lod <public-lod@w3.org>, Anja Jentzsch <anja@anjeve.de>, semanticweb <semanticweb@yahoogroups.com>, Kingsley Idehen <kidehen@openlinksw.com>
> But again: I agree that crawling the Web of Data and then deriving a dataset
> catalog as well as meta-data about the datasets directly from the crawled
> data would be clearly preferable and would also scale way better.
>
> Thus: Could please somebody start a crawler and build such a catalog?
>
> As long as nobody does this, I will keep on using CKAN.
>

Hi Chris, all

I can only restate that within Sindice we're very open to anyone who wants to develop data analysis apps that create catalogs automatically. A MapReduce job a couple of weeks ago gave in excess of 100k independent datasets. How many are interlinked, etc.? That is still to be analyzed.

Our interest (and the interest of the Semantic Web vision I want to sponsor) is to make sure RDFa sites are fully included, and so are those who provide markup which can nonetheless be translated into RDF in an automatic/agreeable way (so no scraping or "sponging"); that is, anything that any23.org can turn into triples.

If you were indeed interested in running or developing your algorithms on our running dataset, no problem; the code can be made open source so it would run on other similarly structured datasets.

This said, yes, I think too that in this phase a CKAN-like repository can be an interesting aggregation point, why not.

But I do think the diagram, which made great sense as an example when Richard started it, is now at risk of doing a disservice, which is in line with what Martin is pointing out. The diagram as it is now kind of implicitly conveys the sense that, since it is so large, everything that matters must be in there, and that's absolutely not the case.

a) there are plenty of extremely useful datasets in RDF/RDFa etc. which are not there

b) the usefulness of being linked is anything but a proven fact, so on the one hand people might want to "be there", while on the other you'd have to push serious commercial entities (for example) to "link to dbpedia" for reasons that aren't clear, and that hurts your credibility.
So Danny Ayers has fun linking to DBpedia, so he is in there with his joke dataset, but you can't credibly bring that argument to large retailers, so they're left out?

This would be OK if the diagram was just "hey, it's my own thing, I set my rules". Fine, but the fanfare around it gives it a different meaning, and thus the controversy above.

.. just tried to put in words what might be a general unspoken feeling..

Short message recap

a) CKAN: nice, why not, might be useful, but..

b) generated diagram: we have the data or can collect it, so whoever is interested in analytics please let us know and we can work it out (as a matter of fact, it turns out most of us in here are paid by the EU for doing this in collaborative projects :-) )

cheers
Giovanni
Received on Thursday, 21 October 2010 12:12:41 UTC