Re: AW: ANN: LOD Cloud - Statistics and compliance with best practices from Enrico Motta on 2010-10-21 (semantic-web@w3.org from October 2010)

From: Enrico Motta <e.motta@open.ac.uk>
Date: Thu, 21 Oct 2010 20:23:38 +0100
To: Chris Bizer <chris@bizer.de>
Cc: Martin Hepp <martin.hepp@ebusiness-unibw.org>, Thomas Steiner <tsteiner@google.com>, Semantic Web <semantic-web@w3.org>, public-lod <public-lod@w3.org>, Anja Jentzsch <anja@anjeve.de>, semanticweb <semanticweb@yahoogroups.com>, Kingsley Idehen <kidehen@openlinksw.com>, Giovanni Tummarello <giovanni.tummarello@deri.org>, Mathieu d'Aquin <m.daquin@open.ac.uk>
Message-Id: <p06240809c8e63c9e3ef3@[192.168.0.4]>

Chris

I strongly agree with the points made by Martin and Giovanni.  Of 
course the LOD initiative has had a lot of positive impact and you 
cannot be blamed for being successful, but at the same time I am 
worried that the success and visibility of the LOD cloud is having 
some rather serious negative consequences. Specifically:

1) lots of people, even within the SW community, now routinely 
describe the LOD as the 'semantic web'.  This is not only 
dramatically incorrect (and bad for students and people who want to 
know about the SW) but also an obstacle to progress: anything which 
is not in the LOD diagram does not exist, and this is really not good 
for the SW community as a whole (including the people at the centre 
of the LOD initiative).  Even worse, in the past 12-18 months  I have 
noticed that this viewpoint has also been embraced by funding bodies 
and linking to LOD is becoming a necessary condition for a SW 
project. Again, I think this is undesirable - see also Martin's email 
on this thread.

2) Because the LOD is perceived as the 'official SW' and because 
resources in the LOD have to comply with a number of guidelines, 
people also assume that LOD resources exhibit higher quality. 
Unfortunately in our experience this is not really the case, and this 
also generates negative consequences. That is, if LOD is the 
'official high quality SW' and there are so many issues with the 
data, automatically people assume that the rest of the SW is a lot 
worse, even though this is not necessarily the case.

So, as other people have already said, maybe it is time to re-examine 
the design criteria for LOD and the way this is presented?  For 
instance, it would be beneficial to the community if LOD were to 
focus more on quality issues, rather than linking for the sake of 
linking.  And in addition, a less static approach to listing 
resources could improve the visibility of so much more stuff out 
there.

Enrico

PS

>I agree with you that it would be much better, if somebody would set up a
>crawler, properly crawl the Web of Data and then provide a catalog about all
>datasets.

Actually this is exactly what our Watson system does, see 
http://watson.kmi.open.ac.uk

At 13:12 +0100 21/10/10, Giovanni Tummarello wrote:
>>  But again: I agree that crawling the Web of Data and then deriving a dataset
>>  catalog as well as meta-data about the datasets directly from the crawled
>>  data would be clearly preferable and would also scale way better.
>>
>>  Thus: Could please somebody start a crawler and build such a catalog?
>>
>>  As long as nobody does this, I will keep on using CKAN.
>>
>
>Hi Chris, all
>
>I can only restate that within Sindice we're very open to anyone who
>wanted to develop data analysis apps creating catalogs automatically.
>At the moment a map reduce job a couple of weeks ago gave in excess of
>100k independent datasets. How many interlinked etc? to be analyzed.
>
>Our interest (and the interest of the Semantic Web vision i want to
>sponsor) is to make sure RDFa sites are fully included and so are those
>who provide markup which can however be translated in an
>automatic/agreeable way (so no scraping or "sponging") into RDF. (that
>is anything that any23.org can turn into triples)
>
>If you were indeed interested in running or developing your
>algorithms on our running dataset no problem, the code can be made
>opensource so it would run on other similarly structured datasets.
>
>This said yes i think too that in this phase a CKAN like repository
>can be an interesting aggregation point, why not.
>
>  But i do think the diagram, which made great sense as an example when
>Richard started it, is now at risk of providing a disservice,
>which is in line with what Martin has pointed out.
>
>The diagram as it is now kinda implicitly conveys the sense that if
>something is so large then all that matters must be there and that's
>absolutely not the case.
>
>a) there are plenty of extremely useful datasets in RDF/RDFa etc which
>are not there
>b) the usefulness of being linked is all but a proven fact, so on the
>one hand people might want to "be there" on the other you'd have to do
>pushing toward serious commercial entities (for example) to "link to
>dbpedia" for reasons that arent clear and that hurts your credibility.
>
>So danny ayers has fun linking to dbpedia so he is in there with his
>joke dataset, but you cant credibly bring that argument to large
>retailers so they're left out?
>
>this would be ok if the diagram was just "hey its my own thing i set
>my rules" - fine but the fanfare around it gives it a different
>meaning and thus the controversy above.
>
>.. just tried to put in words what might be a general unspoken feeling..
>
>Short message recap
>a) ckan - nice why not might be useful but..
>b) generated diagram : we have the data or can collect it so whoever
>is interested in analytics pls let us know and we can work it out
>(matter of fact it turns out most of us in here are paid by EU for
>doing this in collaborative projects :-) )
>
>cheers
>Giovanni
>
>
>--
>The Open University is incorporated by Royal Charter (RC 000391), an 
>exempt charity in England & Wales and a charity registered in 
>Scotland (SC 038302).


Received on Thursday, 21 October 2010 19:24:25 UTC