Re: AW: ANN: LOD Cloud - Statistics and compliance with best practices

On 10/21/10 8:12 AM, Giovanni Tummarello wrote:
>> But again: I agree that crawling the Web of Data and then deriving a dataset
>> catalog as well as meta-data about the datasets directly from the crawled
>> data would be clearly preferable and would also scale way better.
>> Thus: Could please somebody start a crawler and build such a catalog?
>> As long as nobody does this, I will keep on using CKAN.
> Hi Chris, all
> I can only restate that within Sindice we're very open to anyone who
> wants to develop data analysis apps that create catalogs automatically.
> A map reduce job run a couple of weeks ago gave in excess of
> 100k independent datasets. How many are interlinked etc. is still to be analyzed.
> Our interest (and the interest of the Semantic Web vision I want to
> sponsor) is to make sure RDFa sites are fully included, and so are those
> that provide markup which can be translated in an
> automatic/agreeable way (so no scraping or "sponging") into RDF (that
> is, anything that can turn into triples).
> If you are indeed interested in running or developing your
> algorithms on our running dataset, no problem; the code can be made
> open source so it would run on other similarly structured datasets.
> That said, yes, I too think that in this phase a CKAN-like repository
> can be an interesting aggregation point, why not.
>   But I do think the diagram, which made great sense as an example when
> Richard started it, is now at risk of doing a disservice,
> which is in line with what Martin is pointing out.
> The diagram as it is now kinda implicitly conveys the sense that if
> something is so large then all that matters must be there and that's
> absolutely not the case.
> a) there are plenty of extremely useful datasets in RDF/RDFa etc. which
> are not there
> b) the usefulness of being linked is anything but a proven fact, so on the
> one hand people might want to "be there", on the other you'd have to
> push serious commercial entities (for example) to "link to
> dbpedia" for reasons that aren't clear, and that hurts your credibility.
> So Danny Ayers has fun linking to dbpedia so he is in there with his
> joke dataset, but you can't credibly bring that argument to large
> retailers so they're left out?
> This would be OK if the diagram was just "hey, it's my own thing, I set
> my rules" - fine, but the fanfare around it gives it a different
> meaning, and thus the controversy above.
> .. just tried to put in words what might be a general unspoken feeling..
> Short message recap
> a) ckan - nice why not might be useful but..
> b) generated diagram: we have the data or can collect it, so whoever
> is interested in analytics pls let us know and we can work it out
> (matter of fact it turns out most of us in here are paid by EU for
> doing this in collaborative projects :-) )
> cheers
> Giovanni

I would encourage you to do the following:

1. Act on your suggestion re. data access and analysis tools (you've 
reiterated a number of very important points)
2. Encourage other producers and curators of Linked Data to take 
advantage of #1.

The LOD pictorial serves a specific purpose (rough cloud growth 
accounting), but in no way is it canonical re. the burgeoning Web of Linked 
Data concept. Thus, others should chip in and make alternative 
pictorials (hand crafted or generated from datasets) that contribute to 
the bigger picture of Linked Data meme clarity (not *unclarity*).

I don't recall anyone saying: when presenting to potential users 
(individuals or organizations), use the LOD cloud pictorial as the 
clincher re. value proposition articulation :-) I think people have 
simply picked up the LOD pictorial and lazily incorporated it into their 
presentations, thereby creating a detrimental illusion (IMHO).

Here are some pictorials off the top of my head that need to materialize 
or be updated:

1. Data Dictionary Cloud -- show the Conceptual Schema dimension of 
Linked Data across FOAF, GoodRelations, DOAP, Bibo, Music Ontology etc..

2. Profile Cloud -- show the various profile Linked Data spaces based on 

3. Software Projects Cloud -- DOAP instances

4. Music Cloud -- Music Ontology instances

5. etc..
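
As a rough illustration of the "generated from datasets" route for pictorial #1, here is a minimal Python sketch that tallies which datasets use which vocabularies. It is not how any existing diagram is produced. It assumes the crawl is available as plain (subject, predicate, object) string tuples, that a dataset can be approximated by the host name of the subject URI, and that a fixed namespace table is good enough; the function names and sample triples are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Namespace prefixes for the vocabularies named in the list above.
VOCABS = {
    "http://xmlns.com/foaf/0.1/": "FOAF",
    "http://purl.org/goodrelations/v1#": "GoodRelations",
    "http://usefulinc.com/ns/doap#": "DOAP",
    "http://purl.org/ontology/bibo/": "Bibo",
    "http://purl.org/ontology/mo/": "Music Ontology",
}


def vocabulary_of(uri):
    """Return the vocabulary name whose namespace prefixes the URI, else None."""
    for namespace, name in VOCABS.items():
        if uri.startswith(namespace):
            return name
    return None


def dataset_of(uri):
    """Assumption: one dataset per host name of the subject URI."""
    return urlparse(uri).netloc


def build_cloud(triples):
    """Count, per vocabulary, how many triples each dataset contributes."""
    usage = defaultdict(lambda: defaultdict(int))
    for subject, predicate, obj in triples:
        vocab = vocabulary_of(predicate)
        # For rdf:type statements the vocabulary is carried by the class URI.
        if vocab is None and predicate == RDF_TYPE:
            vocab = vocabulary_of(obj)
        if vocab is not None:
            usage[vocab][dataset_of(subject)] += 1
    return {vocab: dict(counts) for vocab, counts in usage.items()}


# Hypothetical sample data, standing in for a real crawl.
TRIPLES = [
    ("http://example.org/people/alice", "http://xmlns.com/foaf/0.1/name", "Alice"),
    ("http://example.org/people/alice", RDF_TYPE, "http://xmlns.com/foaf/0.1/Person"),
    ("http://shop.example.com/offer/1", RDF_TYPE, "http://purl.org/goodrelations/v1#Offering"),
    ("http://code.example.net/proj/x", "http://usefulinc.com/ns/doap#name", "x"),
]

if __name__ == "__main__":
    for vocab, datasets in sorted(build_cloud(TRIPLES).items()):
        print(vocab, datasets)
```

The per-vocabulary counts could then feed whatever drawing tool renders the cloud; grouping by host name is only a stand-in for a real notion of dataset.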



Kingsley Idehen
President & CEO
OpenLink Software
Twitter/ kidehen

Received on Thursday, 21 October 2010 12:46:13 UTC