Types of Data Source on the LOD Cloud from Leigh Dodds on 2010-10-22 (public-lod@w3.org from October 2010)

From: Leigh Dodds <leigh.dodds@talis.com>
Date: Fri, 22 Oct 2010 16:15:55 +0100
To: Linking Open Data <public-lod@w3.org>
Message-ID: <AANLkTik50TnsVEMioUWnqk3qmbRv9wPASZZ9pnyV2M8t@mail.gmail.com>

Hi,

The LOD cloud analysis [1] is a really great piece of work. I wanted
to pick up on one aspect of the analysis for further discussion:
whether data is published by the data owner or a third party.

It seems to me that there are broadly three categories into which a
dataset might fall:

* Primary -- published and maintained directly by the data owner, e.g.
the BBC
* Secondary -- published and maintained by a third party, e.g. by
scraping, wrapping or otherwise converting a data source
* Tertiary -- published and maintained by a third party, usually a
mirror or aggregation of primary/secondary sources. This might be a
direct mirror, or involve some additional creativity, e.g.
re-modelling some aspects of another dataset. Mirrors typically
provide additional services, e.g. a SPARQL endpoint where the primary
source doesn't provide one.

If we consider the different categories we can see that:

* Growth of the web of data is best served by encouraging more Primary
sources. The current community can't scale to add more Secondary
sources, so adoption is best driven by data owners.

* Sustainability and usage of Linked Data is best served by
encouraging more Tertiary sources. Availability of useful, current
aggregations of data, wrapped in services, will help drive more
consumption.

What do others think?

Cheers,

L.

[1]. http://www4.wiwiss.fu-berlin.de/lodcloud/state/

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.dodds@talis.com
http://www.talis.com

Received on Friday, 22 October 2010 15:16:34 UTC