Re: AW: ANN: LOD Cloud - Statistics and compliance with best practices from Jiří Procházka on 2010-10-21 (semantic-web@w3.org from October 2010)

From: Jiří Procházka <ojirio@gmail.com>
Date: Thu, 21 Oct 2010 22:49:58 +0200
To: Enrico Motta <e.motta@open.ac.uk>
CC: Chris Bizer <chris@bizer.de>, Martin Hepp <martin.hepp@ebusiness-unibw.org>, Thomas Steiner <tsteiner@google.com>, Semantic Web <semantic-web@w3.org>, public-lod <public-lod@w3.org>, Anja Jentzsch <anja@anjeve.de>, semanticweb <semanticweb@yahoogroups.com>, Kingsley Idehen <kidehen@openlinksw.com>, Giovanni Tummarello <giovanni.tummarello@deri.org>, Mathieu d'Aquin <m.daquin@open.ac.uk>
Message-ID: <4CC0A776.2030208@gmail.com>

Hi everyone,
I think it is important not to forget that the Semantic Web's goal of
creating a unified model for information exchange in a decentralized,
heterogeneous network of systems, aiming for the lowest common
denominator, implies that many data quality requirements will not be met,
simply because they differ from one group of people to another. It is a
matter of paradigm, the way of working with the data, so it should come
as no surprise that various groups of like-minded people define their own
requirements, especially in the area of discoverability.

I find it quite surprising that more standards like Linked Data and LOD
do not exist. Perhaps once more of them do, having the community track
and compare them, and including that comparison in Semantic Web
introductory materials, would help spread a more accurate image of the
Semantic Web...

Of course it would be great if information about which data complies
with such initiatives were generated by automated tools (no "Submit URL"
please), for example by applying the data discoverability algorithm they
endorse (not sure if LD has something like this - follow-your-nose?), if
discoverability is in their focus.

Best,
Jiri Prochazka

On 10/21/2010 09:23 PM, Enrico Motta wrote:
> Chris
> 
> I strongly agree with the points made by Martin and Giovanni.  Of course
> the LOD initiative has had a lot of positive impact and you cannot be
> blamed for being successful, but at the same time I am worried that the
> success and visibility of the LOD cloud is having some rather serious
> negative consequences. Specifically:
> 
> 1) lots of people, even within the SW community, now routinely describe
> the LOD as the 'semantic web'.  This is not only dramatically incorrect
> (and bad for students and people who want to know about the SW) but also
> an obstacle to progress: anything which is not in the LOD diagram does
> not exist, and this is really not good for the SW community as a whole
> (including the people at the centre of the LOD initiative).  Even worse,
> in the past 12-18 months  I have noticed that this viewpoint has also
> been embraced by funding bodies and linking to LOD is becoming a
> necessary condition for a SW project. Again, I think this is undesirable
> - see also Martin's email on this thread.
> 
> 2) Because the LOD is perceived as the 'official SW' and because
> resources in the LOD have to comply with a number of guidelines, people
> also assume that LOD resources exhibit higher quality. Unfortunately in
> our experience this is not really the case, and this also generates
> negative consequences. That is, if LOD is the 'official high quality
> SW' and there are so many issues with the data, automatically people
> assume that the rest of the SW is a lot worse, even though this is not
> necessarily the case.
> 
> So, as other people have already said, maybe it is time to re-examine
> the design criteria for LOD and the way this is presented?  For
> instance, it would be beneficial to the community if LOD were to focus
> more on quality issues, rather than linking for the sake of linking. 
> And in addition, a less static approach to listing resources could
> improve the visibility of so much more stuff out there.
> 
> 
> Enrico
> 
> PS
> 
> 
>> I agree with you that it would be much better, if somebody would set up a
>> crawler, properly crawl the Web of Data and then provide a catalog about
>> all datasets.
> 
> Actually this is exactly what our Watson system does, see
> http://watson.kmi.open.ac.uk
> 
> 
> 
> At 13:12 +0100 21/10/10, Giovanni Tummarello wrote:
>>> But again: I agree that crawling the Web of Data and then deriving a
>>> dataset catalog as well as meta-data about the datasets directly from
>>> the crawled data would be clearly preferable and would also scale way
>>> better.
>>>
>>>  Thus: Could please somebody start a crawler and build such a catalog?
>>>
>>>  As long as nobody does this, I will keep on using CKAN.
>>>
>>
>> Hi Chris, all
>>
>> I can only restate that within Sindice we're very open to anyone who
>> wants to develop data analysis apps that create catalogs automatically.
>> A map reduce job a couple of weeks ago gave in excess of 100k
>> independent datasets. How many are interlinked etc.? To be analyzed.
>>
>> Our interest (and the interest of the Semantic Web vision I want to
>> sponsor) is to make sure RDFa sites are fully included and so are those
>> who provide markup which can however be translated in an
>> automatic/agreeable way (so no scraping or "sponging") into RDF. (that
>> is anything that any23.org can turn into triples)
>>
>> If you are indeed interested in running or developing your algorithms
>> on our running dataset, no problem; the code can be made open source so
>> it would run on other similarly structured datasets.
>>
>> This said yes i think too that in this phase a CKAN like repository
>> can be an interesting aggregation point, why not.
>>
>> But I do think the diagram, which made great sense as an example when
>> Richard started it, is now at risk of doing a disservice, which is in
>> line with what Martin is pointing out.
>>
>> The diagram as it is now kinda implicitly conveys the sense that if
>> something is so large then all that matters must be there and that's
>> absolutely not the case.
>>
>> a) there are plenty of extremely useful datasets in RDF/RDFa etc which
>> are not there
>> b) the usefulness of being linked is anything but a proven fact, so on
>> the one hand people might want to "be there", on the other you'd have to
>> push serious commercial entities (for example) to "link to dbpedia" for
>> reasons that aren't clear, and that hurts your credibility.
>>
>> So danny ayers has fun linking to dbpedia so he is in there with his
>> joke dataset, but you can't credibly bring that argument to large
>> retailers so they're left out?
>>
>> this would be ok if the diagram was just "hey it's my own thing i set
>> my rules" - fine but the fanfare around it gives it a different
>> meaning and thus the controversy above.
>>
>> .. just tried to put in words what might be a general unspoken feeling..
>>
>> Short message recap
>> a) ckan - nice why not might be useful but..
>> b) generated diagram : we have the data or can collect it so whoever
>> is interested in analytics pls let us know and we can work it out
>> (matter of fact it turns out most of us in here are paid by EU for
>> doing this in collaborative projects :-) )
>>
>> cheers
>> Giovanni
>>
>>
> 
>

Received on Thursday, 21 October 2010 20:50:36 UTC