Re: AW: ANN: LOD Cloud - Statistics and compliance with best practices from Kingsley Idehen on 2010-10-21 (semantic-web@w3.org from October 2010)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Thu, 21 Oct 2010 15:45:21 -0400
To: Enrico Motta <e.motta@open.ac.uk>
CC: Chris Bizer <chris@bizer.de>, Martin Hepp <martin.hepp@ebusiness-unibw.org>, Thomas Steiner <tsteiner@google.com>, Semantic Web <semantic-web@w3.org>, public-lod <public-lod@w3.org>, Anja Jentzsch <anja@anjeve.de>, semanticweb <semanticweb@yahoogroups.com>, Giovanni Tummarello <giovanni.tummarello@deri.org>, Mathieu d'Aquin <m.daquin@open.ac.uk>
Message-ID: <4CC09851.7010101@openlinksw.com>
On 10/21/10 3:23 PM, Enrico Motta wrote:
> Chris
>
> I strongly agree with the points made by Martin and Giovanni.  Of 
> course the LOD initiative has had a lot of positive impact and you 
> cannot be blamed for being successful, but at the same time I am
> worried that the success and visibility of the LOD cloud is having
> some rather serious negative consequences. Specifically:
>
> 1) lots of people, even within the SW community, now routinely 
> describe the LOD as the 'semantic web'.  This is not only dramatically 
> incorrect (and bad for students and people who want to know about the 
> SW) but also an obstacle to progress: anything which is not in the LOD 
> diagram does not exist, and this is really not good for the SW 
> community as a whole (including the people at the centre of the LOD 
> initiative).  Even worse, in the past 12-18 months  I have noticed 
> that this viewpoint has also been embraced by funding bodies and 
> linking to LOD is becoming a necessary condition for a SW project. 
> Again, I think this is undesirable - see also Martin's email on this 
> thread.

I agree, but do note (as per my earlier response) that the success of the
LOD cloud pictorial as marketing collateral isn't something that has arisen
through deliberate acts of exclusion. Methinks many have simply slapped it into
their presentations devoid of actual presentation goals. This single 
activity has helped and hurt the LOD cloud pictorial. Hurt meaning: 
creating the perception you describe above.

>
> 2) Because the LOD is perceived as the 'official SW' and because 
> resources in the LOD have to comply with a number of guidelines, 
> people also assume that LOD resources exhibit higher quality.

I hope not, and I don't think so. Even if it were to be true, would you 
blame the production of the pictorial for that? Really though, I don't 
recall anyone saying: LOD pictorial is the Linked Data gospel.

> Unfortunately in our experience this is not really the case, and this 
> also generates negative consequences. That is, if LOD is the 'official
> high quality SW' and there are so many issues with the data,
> automatically people assume that the rest of the SW is a lot worse, 
> even though this is not necessarily the case.
>
> So, as other people have already said, maybe it is time to re-examine 
> the design criteria for LOD and the way this is presented?

But this should simply be a case of people from the community producing
additional collateral. The LOD cloud has some interesting history that 
goes something like this:

1. Banff 2007 (Linked Data coming out party)  -- Chris was giving a
DBpedia demo showing its inter-connectedness; TimBL then suggested that
Chris turn it into a cloud with periodic updates for demonstrating
growth

2. Richard (working with Chris at the time) picked up the challenge and 
refined the initial graphic

3. People started using it to show growth of DBpedia which also implied 
LOD cloud since the connections in the pictorial were reciprocal

4. Cloud pictorial caught fire via PowerPoint presentations +
exponential effect of SlideShare.

Thus, why can't others simply emulate this process, based on their respective
areas of interest?

> For instance, it would be beneficial to the community if LOD were to 
> focus more on quality issues, rather than linking for the sake of linking.

Who is this LOD entity? You make this entity sound very much like the
one represented as a burning bush when providing instructions to Moses :-)

>   And in addition, a less static approach to listing resources could 
> improve the visibility of so much more stuff out there.

Yes, so no harm in making a real graph from the actual pool of linked 
data out in the wild.
>
>
> Enrico
>
> PS
>
>
>> I agree with you that it would be much better, if somebody would set
>> up a crawler, properly crawl the Web of Data and then provide a catalog
>> about all datasets.
>
> Actually this is exactly what our Watson system does, see 
> http://watson.kmi.open.ac.uk

And I would assume there are APIs or even a SPARQL endpoint that would
enable interested parties to generate a dynamic cloud, right?

Kingsley
>
>
>
> At 13:12 +0100 21/10/10, Giovanni Tummarello wrote:
>>> But again: I agree that crawling the Web of Data and then deriving a
>>> dataset catalog as well as meta-data about the datasets directly from
>>> the crawled data would be clearly preferable and would also scale way
>>> better.
>>>
>>>  Thus: Could please somebody start a crawler and build such a catalog?
>>>
>>>  As long as nobody does this, I will keep on using CKAN.
>>>
>>
>> Hi Chris, all
>>
>> I can only restate that within Sindice we're very open to anyone who
>> wants to develop data analysis apps creating catalogs automatically.
>> At the moment, a map reduce job a couple of weeks ago gave in excess of
>> 100k independent datasets. How many interlinked etc? To be analyzed.
>>
>> Our interest (and the interest of the Semantic Web vision i want to
>> sponsor) is to make sure RDFa sites are fully included and so are those
>> who provide markup which can however be translated in an
>> automatic/agreeable way (so no scraping or "sponging") into RDF. (that
>> is anything that any23.org can turn into triples)
>>
>> If you were indeed interested in running or developing your
>> algorithms on our running dataset, no problem; the code can be made
>> opensource so it would run on other similarly structured datasets.
>>
>> This said yes i think too that in this phase a CKAN like repository
>> can be an interesting aggregation point, why not.
>>
>> But i do think the diagram, which made great sense as an example when
>> Richard started it, is now at risk of providing a disservice,
>> which is in line with what Martin has pointed out.
>>
>> The diagram as it is now kinda implicitly conveys the sense that if
>> something is so large then all that matters must be there and that's
>> absolutely not the case.
>>
>> a) there are plenty of extremely useful datasets in RDF/RDFa etc which
>> are not there
>> b) the usefulness of being linked is anything but a proven fact, so on the
>> one hand people might want to "be there", on the other you'd have to be
>> pushing serious commercial entities (for example) to "link to
>> dbpedia" for reasons that aren't clear, and that hurts your credibility.
>>
>> So danny ayers has fun linking to dbpedia so he is in there with his
>> joke dataset, but you can't credibly bring that argument to large
>> retailers so they're left out?
>>
>> this would be ok if the diagram was just "hey its my own thing i set
>> my rules" - fine but the fanfare around it gives it a different
>> meaning and thus the controversy above.
>>
>> .. just tried to put in words what might be a general unspoken feeling..
>>
>> Short message recap
>> a) ckan - nice why not might be useful but..
>> b) generated diagram : we have the data or can collect it so whoever
>> is interested in analytics pls let us know and we can work it out
>> (matter of fact it turns out most of us in here are paid by EU for
>> doing this in collaborative projects :-) )
>>
>> cheers
>> Giovanni
>>
>>
>
>


-- 

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
Received on Thursday, 21 October 2010 19:46:03 UTC