Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included. from Dan Brickley on 2010-09-04 (public-lod@w3.org from September 2010)

From: Dan Brickley <danbri@danbri.org>
Date: Sat, 4 Sep 2010 11:32:59 +0200
To: Anja Jentzsch <anja@anjeve.de>
Cc: "public-lod@w3.org" <public-lod@w3.org>, Thomas Baker <tbaker@tbaker.de>
Message-ID: <AANLkTimRnHp8x3H37Mk=Ha8j+DL9KNXbeAuKyoK++1OJ@mail.gmail.com>

On Thu, Sep 2, 2010 at 8:10 PM, Anja Jentzsch <anja@anjeve.de> wrote:
> Hi all,
>
> we are in the process of drawing the next version of the LOD cloud diagram.
> This time it is likely to contain around 180 datasets altogether having a
> size of around 20 billion RDF triples.
>
> For drawing the next version of the LOD cloud, we have started to collect
> meta-information about the datasets to be included on CKAN, a registry of
> open data and content packages provided by the Open Knowledge Foundation.
>
> The list of datasets about which we have already collected information can be
> found here:
>
> http://www4.wiwiss.fu-berlin.de/lodcloud/
>
> In addition to basic meta-information about a dataset such as its size and
> the number of links pointing at other datasets, we also collect additional
> meta-information about the license of the dataset, alternative access
> options like SPARQL endpoints or dataset dumps, and whether there exists a
> voiD description of the dataset or a Semantic Web Sitemap.
>
> So if your dataset is not listed yet and you want to have it included in
> the next version of the LOD cloud, please add it to CKAN by next
> Wednesday (September 8th, 2010).
>
> Also, if we have collected wrong information about your dataset or if your
> dataset is only partially described up till now, it would be great if you
> could add the missing information.
>
> Guidelines about how to add datasets to CKAN as well as about the tags that
> we are using to annotate the datasets are found here:
> http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation
>
> We thank all contributors in advance for their input and help, which
> hopefully will allow us to draw the next version of the LOD cloud as
> accurately as possible.

This is great! Glad to see this being updated :)

One thing I would love in the next revision is for FOAF to also be
presented as a vocabulary, rather than as if it were itself a distinct
dataset. While there are databases that expose data as FOAF (LiveJournal
etc.), and also a reasonable number of independently published 'FOAF
files', the technical core of FOAF is really the vocabulary and the
habit of linking things together. Having a FOAF 'blob' is great and
all, but it doesn't help people understand that FOAF is used as a
vocabulary by various of the other blobs too. And beyond FOAF, I'm
wondering how we can visually represent the use of eg. Music Ontology,
or Dublin Core, or Creative Commons vocabularies across different
regions of the cloud. Maybe (later :) someone could make a view where
each blob is a pie-chart showing which vocabularies it uses?

As a vocabulary manager, it is pretty hard to understand the costs and
benefits of possible changes to a widely deployed RDF vocabulary. I'm
sure I'm not alone in this; Tom (cc:'d) I expect would vouch the same
regarding the Dublin Core terms. So if there could be some view of the
new cloud diagram that showed us which blobs (er, datasets) used which
vocabulary (and which terms), that would be really wonderful. On the
Dublin Core side, it would be fascinating to see which datasets are
using http://purl.org/dc/elements/1.1/ and which are using
http://purl.org/dc/terms/ (and which are using both). Similarly with
FOAF, I'd like to understand common deployment patterns better.  I
expect other vocab managers and dataset publishers are in a similar
situation, and would appreciate a map of the wider territory, so they
know how to fit in with trends and conventions, or what missing pieces
of vocabulary might need more work...
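
To make the "which dataset uses which vocabulary" idea a little more
concrete, here is a rough Python sketch. It assumes that someone has
already extracted the list of predicate URIs used in a dataset (from a
dump or a SPARQL endpoint); the function names and the sample data are
made up for illustration, and only the two Dublin Core namespaces come
from the message above (the FOAF namespace is the usual
http://xmlns.com/foaf/0.1/).

```python
from collections import Counter

# Namespace prefixes to look for. The two Dublin Core URIs are the ones
# mentioned in this message; the labels on the left are arbitrary.
VOCABULARIES = {
    "dc-elements": "http://purl.org/dc/elements/1.1/",
    "dc-terms": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def vocabulary_of(predicate_uri):
    """Return the label of the vocabulary a predicate URI belongs to."""
    for label, namespace in VOCABULARIES.items():
        if predicate_uri.startswith(namespace):
            return label
    return "other"


def vocabulary_usage(predicate_uris):
    """Count how many predicate occurrences fall into each vocabulary."""
    return Counter(vocabulary_of(uri) for uri in predicate_uris)


def dublin_core_pattern(predicate_uris):
    """Say whether a dataset uses DC elements, DC terms, both, or neither."""
    usage = vocabulary_usage(predicate_uris)
    has_elements = usage["dc-elements"] > 0
    has_terms = usage["dc-terms"] > 0
    if has_elements and has_terms:
        return "both"
    if has_elements:
        return "elements"
    if has_terms:
        return "terms"
    return "neither"


# Hypothetical predicate list for one dataset (illustration only).
sample_predicates = [
    "http://purl.org/dc/elements/1.1/title",
    "http://purl.org/dc/terms/creator",
    "http://xmlns.com/foaf/0.1/name",
    "http://xmlns.com/foaf/0.1/knows",
    "http://www.w3.org/2000/01/rdf-schema#label",
]

usage = vocabulary_usage(sample_predicates)
pattern = dublin_core_pattern(sample_predicates)
```

The per-vocabulary counts in `usage` are the kind of numbers a
pie-chart-per-blob view could be drawn from, and `pattern` answers the
elements-versus-terms question for a single dataset. A real survey
would also want term-level counts, not just namespace-level ones.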

Thanks for any thoughts,

Dan

Received on Saturday, 4 September 2010 09:34:38 UTC