- From: Bernard Vatant <bernard.vatant@mondeca.com>
- Date: Tue, 21 Jun 2011 15:52:06 +0200
- To: public-lld@w3.org
- Cc: "emmanuelle.bermes" <emmanuelle.bermes@bnf.fr>
- Message-ID: <BANLkTi=di9ZuVgeNgGCw33QqShWzxj3-7Q@mail.gmail.com>
Hello all,

Emmanuelle has asked me to review the draft currently at http://www.w3.org/2005/Incubator/lld/wiki/Vocabulary_and_Dataset with a "fresh eye". Here are a few comments.

Preliminary question: what is the main target of this document? To give the linked data community the opportunity to understand the specific viewpoint, resources and terminology used by the Library community? Or to help Library people enter the linked data universe? Or both? Other?

General structure of the document: the introduction defines element sets first, then value vocabularies and finally datasets. But the rest of the document presents examples the other way round: first datasets, then value vocabularies, then element sets. Why such an inversion, apart from the stylistic beauty of chiasmus?

I keep being puzzled by the "element sets" and "value vocabularies" terminology. I must say that the first one in particular, for someone with a background in maths, sounds like a very strange tautology (a set is made of elements by definition). Since this terminology has been discussed ad nauseam, I suppose it does make sense for the Library community. I always had the same feeling with Dublin Core "elements" anyway.

As for "datasets": in the general linked data world, a dataset is simply a consistent set of triples that you can query or download from a specific point. It's a technical, applicative definition, so it's orthogonal to the distinction between T-Box and A-Box (aka element sets and value vocabularies). Actually the distinction between metadata and data does not make much sense in the linked data universe. It's a continuum of information, and "it's triples all the way down". In particular, as soon as CKAN is introduced, the distinction between "value vocabularies" and "datasets" is blurred, since in CKAN packages there is no such distinction. Moreover, in the illustrative diagram, bubbles are either proper datasets (in the sense defined in the introduction) or value vocabularies. This does not help to clarify the distinction made in the introduction.

To go down to an example, many people will find it strange to see Geonames, DBpedia or Freebase defined as "value vocabularies". In fact, in Geonames for example, there is a "value vocabulary" of feature classes and codes, technically included along with the geonames "element set" in the so-called "geonames ontology" at http://www.geonames.org/ontology. The dataset of individual geonames "features" (geographical entities) is more an authority list like VIAF.

So I would suggest sorting the list of "value vocabularies" into thesauri/classifications/subject headings on one side, and authority files on the other. And maybe distinguish resources developed in the library community framework, using state-of-the-art methods of this community, from crowd-sourced resources such as DBpedia or Freebase.

Best

Bernard

--
Bernard Vatant
Senior Consultant
Vocabulary & Data Integration
Tel: +33 (0) 971 488 459
Mail: bernard.vatant@mondeca.com
----------------------------------------------------
Mondeca
3, cité Nollez 75018 Paris France
Web: http://www.mondeca.com
Blog: http://mondeca.wordpress.com
----------------------------------------------------
Received on Tuesday, 21 June 2011 13:52:34 UTC