Analyzing the success of LOD (was: New LOD Cloud - Please send us links to missing data sources) from Matthias Samwald on 2009-03-02 (public-lod@w3.org from March 2009)

From: Matthias Samwald <samwald@gmx.at>
Date: Mon, 2 Mar 2009 12:03:15 +0100
To: <public-lod@w3.org>, "Giovanni Tummarello" <giovanni.tummarello@deri.org>, "Andraz Tori" <andraz@zemanta.com>
Message-ID: <9C33EF9CF3D946A79CA0E1F8339FCEA9@ms>

Andraz:
>> That the bubbles continue to grown is however a sociological
>> interesting phenomen :-)
>> And a good sign that something has gone right :)

Giovanni:
> Maybe :-) but people do things for many other reason that "they're right".

I think the LOD project is a great success. It is a very lively community, 
and there has been significant progress over the last year (amount of data, 
quality of underlying technologies such as Virtuoso). However, the community 
should take some time to analyze WHY it is successful, and why it is more 
successful than attempts at using RDF/OWL before 2007. Some thoughts on 
this:

* The main ingredient to the success of LOD is that it is relatively 
centralized. It would not work without DBpedia serving as the 'nucleus' of 
the cloud. It would not work without someone dedicated to drawing the cloud 
diagram that everyone is happy to show on PowerPoint slides. It would not 
work without this mailing list that serves as an open platform for the 
community. However, I have the impression that some key persons in the LOD 
community might not be happy about this reason for success at all. For them, 
the LOD project is a mere testing ground for the next generation of the 
entire web, and showing that linked data works in a decentralized way is a 
crucial aspect of this vision. The fact that the current LOD cloud was 
actually produced in a rather centralized process, and that most of the 
valuable data sources in the LOD cloud are actually under the control of a 
very small number of stakeholders, is seen as a transient blemish, at best.
However, I think that this is a problematic situation, and we should embrace 
the semi-centralized nature of the LOD project, rather than hiding it away. 
Having a close-knit group of stakeholders that contribute to a partly 
distributed, partly centralized knowledge base might actually be a very 
interesting endeavor -- and it might be a way to provide a clear incentive 
to participate. LOD could be a novel type of open-source project, one that 
is not only concerned with code, but also with the underlying data. The 
products of this open source project could then be used in various kinds of 
projects, some of them with commercial focus. In such a scenario, being the 
main stakeholder for a certain subset of LOD might become profitable, and 
give incentive to improve the data provided and controlled by each 
stakeholder. This business model could be similar to that of successful open 
source content management systems such as Typo3 or Drupal, where the code is 
free, but consulting and customization for commercial users are paid 
services.
I know that this idea of a 'LOD brand' runs counter to the main motivation of most 
people in the community, but it might be the key to creating an incentive 
structure for providing linked data, improving data quality and actually 
getting people to use the data. With the current philosophy, I see the 
danger of LOD staying a permanent 'proof of concept'. The concept has been 
proved by now.

* A good point by Giovanni is that mere interlinking of datasets has been 
possible since 1999 by re-using URIs, and that post-hoc mapping between 
datasets has been possible since 2004, when owl:sameAs was introduced. The linked 
data movement 'only' added the consensus that HTTP URIs should be used, and 
that an HTTP GET request should yield a small RDF subgraph, listing the RDF 
triples about the resource. Surely, this is a very practical thing for many 
reasons, but was it instrumental for the success of LOD? At the moment, it 
seems that most *useful* applications of LOD data are based on a central 
triple store created by the aggregation of some or all LOD data sources. In 
that case, one might ask whether the dereferenceable URIs are really an 
essential ingredient to the success of LOD, or just a 'good to have', but 
not essential, feature.
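
To make the contrast concrete, here is a minimal Python sketch of the two
access patterns discussed above: dereferencing an HTTP URI, and looking a
resource up in a centrally aggregated set of triples with owl:sameAs
mappings applied. It is only an illustration under stated assumptions. The
function names, the sample triples and the example.org URIs are made up, the
request is prepared but never sent, and application/rdf+xml is just one
possible RDF media type to ask for.

```python
from urllib.request import Request

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def build_dereference_request(uri):
    """Prepare (but do not send) an HTTP GET asking for RDF about `uri`."""
    return Request(uri, headers={"Accept": "application/rdf+xml"}, method="GET")


def describe(triples, uri):
    """Return all triples whose subject is `uri` or an owl:sameAs alias of it.

    `triples` stands in for a central store aggregated from several sources.
    """
    aliases = {uri}
    changed = True
    while changed:  # follow owl:sameAs in both directions until stable
        changed = False
        for s, p, o in triples:
            if p == OWL_SAME_AS and (s in aliases) != (o in aliases):
                aliases.update((s, o))
                changed = True
    return {(s, p, o) for s, p, o in triples if s in aliases}


# Hypothetical aggregated data from two sources (all URIs are invented).
SAMPLE = [
    ("http://example.org/a/galway", "http://example.org/vocab/country", "Ireland"),
    ("http://example.org/b/Galway", "http://example.org/vocab/population", "72000"),
    ("http://example.org/a/galway", OWL_SAME_AS, "http://example.org/b/Galway"),
    ("http://example.org/a/vienna", "http://example.org/vocab/country", "Austria"),
]

request = build_dereference_request("http://example.org/a/galway")
description = describe(SAMPLE, "http://example.org/a/galway")
```

In the first pattern the client needs one request per URI and gets only what
that one server says about the resource. In the second, `describe` returns
statements from both hypothetical sources at once, which is the kind of
lookup that applications built on an aggregated triple store rely on.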


Giovanni:
> An alternative explanation i like is
> http://inamidst.com/whits/2008/technobunkum

This is the second time I have seen this link on this mailing list. He makes some 
very good points about the importance of focusing on providing solutions to 
problems, instead of becoming too tangled up in technicalities. I also read 
his other text at http://inamidst.com/whits/2008/ambient which gives a lot 
of insight into why he has abandoned Semantic Web technologies. I guess the 
problems he would like to see solved are too trivial to require a paradigmatic 
change (such as a global trend towards RDF/OWL and linked data). However, I 
would not generalize this experience to yield the conclusion that the 
Semantic Web is a huge case of 'Technobunkum' (what a silly term, by the 
way). The fact that not every tiny little problem on the web might be in 
need of Semantic Web technologies does not mean that these technologies are 
worthless. There are plenty of real use cases in important business segments 
and companies where there is dire need for such new technologies -- life 
science and health care come to mind. I have the feeling that the whole Web 
2.0 hype of recent years has distorted the perception of web developers 
about what is actually of societal and economic importance. Creating yet 
another, slightly improved mashup between your Flickr photos, Google Maps 
and Wikipedia might actually not be the most important problem in the world 
today. And it probably doesn't earn you money either. End of rant.


Cheers,
Matthias Samwald

DERI Galway, Ireland
http://deri.ie/

Konrad Lorenz Institute for Evolution & Cognition Research, Austria
http://kli.ac.at/
Received on Monday, 2 March 2009 11:04:02 UTC