RE: Review of LLD vocabularies and datasets from ZENG, MARCIA on 2011-06-25 (public-lld@w3.org from June 2011)

From: ZENG, MARCIA <mzeng@kent.edu>
Date: Sat, 25 Jun 2011 19:26:40 -0400
To: "public-lld@w3.org" <public-lld@w3.org>
Message-ID: <CA2BE568.62A2%mzeng@kent.edu>
Hi, Bernard and Monica,
and hi, Antoine, Jeff, and William,
(I have only just had a chance to read my emails, including your comments, as I am traveling in China and have been isolated from the Internet for quite a while.)

Here are some quick comments:
1. First of all, thanks for all the detailed comments, corrections, and suggestions!
2. I feel that there are several issues around the naming of the deliverable, as well as the categorization of the items listed. In general, the content included is not the main issue, am I right?
3. @Bernard: as for your suggestion of listing by 'type' of value vocabularies such as thesauri, classification, etc., do you think that listing value vocabularies under the sub-categories 'subject vocabularies' and 'entity vocabularies' (authority files for agent names, geo names, etc.) would be workable? (I got the second term from someone else…) Some vocabularies cannot be said to belong to, e.g., a thesaurus type, a subject headings type, or a controlled list. 'Classification', 'taxonomy', and 'categorization' may be difficult to differentiate…
4. Also, for the name 'metadata element sets', should we supply synonyms such as 'vocabularies for data structures/properties' -- or other terms? I.e., line up the synonyms together? Different communities may be familiar with different terminology.
At the LOD-LAM Summit I found that, for those who are active in the DCMI community, 'vocabularies' mostly means RDF vocabularies (DCT, SKOS, FOAF…). The museum and archives people, however, kept thinking of controlled vocabularies.

These are my major concerns regarding the revision of this deliverable. I have not finished reading the other emails. I apologize if I missed something or misunderstood your comments.
Many thanks again.
Marcia

From: Bernard Vatant <bernard.vatant@mondeca.com>
Date: Tue, 21 Jun 2011 09:52:06 -0400
To: "public-lld@w3.org" <public-lld@w3.org>
Cc: Emmanuelle Bermes <emmanuelle.bermes@bnf.fr>
Subject: Review of LLD vocabularies and datasets

Hello all

Emmanuelle has asked me to review the draft currently at http://www.w3.org/2005/Incubator/lld/wiki/Vocabulary_and_Dataset with a "fresh eye".
Here are a few comments.

Preliminary question: what is the main target of this document? To give the linked data community the opportunity to understand the specific viewpoint, resources and terminology used by the Library community? Or to help Library people enter the linked data universe? Or both? Other?

General structure of the document: The introduction defines element sets first, then value vocabularies and finally datasets. But the rest of the document presents examples the other way round: first datasets, then value vocabularies, then element sets. Why such an inversion, apart from the stylistic beauty of chiasmus?

I keep being puzzled by the "element sets" and "value vocabularies" terminology. I must say that the first one in particular, for someone with a background in maths, sounds like a very strange tautology (a set is made of elements by definition). Since this terminology has been discussed ad nauseam, I suppose it does make sense for the Library community. I always had the same feeling with Dublin Core "elements" anyway.

As for "datasets": in the general linked data world, a dataset is simply a consistent set of triples that you can query or download from a specific point. It's a technical, applicative definition, so it's orthogonal to the distinction between T-Box and A-Box (aka element sets and value vocabularies). Actually, the distinction between metadata and data does not make much sense in the linked data universe. It's a continuum of information, and "it's triples all the way down".
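
To make the point about triples concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the `ex:` identifiers, the `dataset` contents and the `match` helper are all made up for the example, and plain tuples stand in for a real RDF store. It shows schema-like statements, vocabulary-like statements and instance statements sitting in one set and answering to the same kind of query.

```python
# Hypothetical illustration: every identifier below is invented for this sketch.
RDF_TYPE = "rdf:type"

# One "dataset": a flat set of (subject, predicate, object) triples.
dataset = {
    # Statements of the "element set" kind: they describe properties and classes.
    ("ex:author", RDF_TYPE, "rdf:Property"),
    ("ex:Book", RDF_TYPE, "rdfs:Class"),
    # Statements of the "value vocabulary" kind: they describe a concept used as a value.
    ("ex:subject/history", RDF_TYPE, "skos:Concept"),
    ("ex:subject/history", "skos:prefLabel", "History"),
    # Statements about one described resource, using the terms above.
    ("ex:book/1", RDF_TYPE, "ex:Book"),
    ("ex:book/1", "ex:author", "ex:person/42"),
    ("ex:book/1", "dct:subject", "ex:subject/history"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern, sorted; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# The same query mechanism works regardless of which "kind" a triple is.
for triple in match(dataset, p=RDF_TYPE, o="rdf:Property"):
    print("property declaration:", triple)
for triple in match(dataset, s="ex:book/1"):
    print("about the book:", triple)
```

Nothing in the set itself marks a triple as "metadata" or "data"; the grouping exists only in the comments, which is the sense in which the dataset notion cuts across the other two.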

In particular, as soon as CKAN is introduced, the distinction between "value vocabularies" and "datasets" is blurred, since CKAN packages make no such distinction. Moreover, in the illustrative diagram, bubbles are either proper datasets (in the sense defined in the introduction) or value vocabularies. This does not help to clarify the distinction made in the introduction.

To go down to an example, many people will find it strange to see Geonames, DBpedia or Freebase defined as "value vocabularies". In fact, in Geonames for example, there is a "value vocabulary" of feature classes and codes, which is technically included along with the geonames "element set" in the so-called "geonames ontology" at http://www.geonames.org/ontology.
The dataset of individual geonames "features" (geographical entities) is more of an authority list, like VIAF.

So I would suggest sorting the list of "value vocabularies" into thesauri/classifications/subject headings on one side, and authority files on the other. And maybe distinguishing resources developed in the library community framework, using the state-of-the-art methods of this community, from crowd-sourced resources such as DBpedia or Freebase.

Best

Bernard


--
Bernard Vatant
Senior Consultant
Vocabulary & Data Integration
Tel:       +33 (0) 971 488 459
Mail:     bernard.vatant@mondeca.com
----------------------------------------------------
Mondeca
3, cité Nollez 75018 Paris France
Web:    http://www.mondeca.com
Blog:    http://mondeca.wordpress.com
----------------------------------------------------
Received on Saturday, 25 June 2011 23:28:36 UTC