
Re: vocabs, metadata set, datasets

From: ZENG, MARCIA <mzeng@kent.edu>
Date: Thu, 6 Jan 2011 12:15:12 -0500
To: Karen Coyle <kcoyle@kcoyle.net>, Emmanuelle Bermes <emmanuelle.bermes@bnf.fr>
CC: Mark van Assem <mark@cs.vu.nl>, "public-xg-lld@w3.org" <public-xg-lld@w3.org>
Message-ID: <C94B62D0.12F09%mzeng@kent.edu>
I like the way Karen framed it in terms of whether something is a building block or not. I also agree with Jeff's use of the SKOS 'concept scheme' to define VIAF.

 *   Regarding 'data sets':  To me, the 'data sets' we are talking about are structured data.  Elsewhere, 'data sets' can also be unstructured or semi-structured data (e.g., data.gov's raw data sets).
 *   Regarding 'value vocabularies':  Conventionally we have used "knowledge organization systems (KOS)" for concept schemes (a broader term than "controlled vocabularies").  Most of the vocabulary types are clear, such as pick lists, taxonomies, thesauri, and subject headings.  But there is a group of 'metadata-like' KOS, such as authority files and digital gazetteers.  They are, or can be, constructed as thesauri (e.g., the Getty Thesaurus of Geographic Names (TGN) and the Union List of Artist Names (ULAN)), or they can have other structures.  It is the content they include that also leads them to be referred to as 'data sets'.  For example, digital gazetteers control not only the place names; they also include place features, types, coordinates, and even maps.  Digital gazetteers can be used alone as data sets or serve as the value vocabularies used in structured data sets.  This might be like the VIAF situation: the answer depends on how it is constructed or on how it is used.
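
The dual role of a digital gazetteer described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the identifiers, coordinates, record fields, and function names (GAZETTEER, RECORDS, places_of_type, invalid_records) are hypothetical and are not taken from TGN, VIAF, or any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    """One gazetteer entry: a controlled name plus descriptive data."""
    place_id: str
    name: str
    feature_type: str
    lat: float
    lon: float


# Hypothetical gazetteer entries; IDs and coordinates are illustrative only.
GAZETTEER = {
    "place:1": Place("place:1", "Paris", "inhabited place", 48.85, 2.35),
    "place:2": Place("place:2", "Kent", "inhabited place", 41.15, -81.36),
}


def places_of_type(feature_type):
    """Use the gazetteer alone, as a data set: query its own contents."""
    return sorted(
        p.name for p in GAZETTEER.values() if p.feature_type == feature_type
    )


# Hypothetical metadata records whose 'coverage' value should be a gazetteer ID.
RECORDS = [
    {"title": "A guide to Paris", "coverage": "place:1"},
    {"title": "Unknown place", "coverage": "place:99"},
]


def invalid_records(records):
    """Use the gazetteer as a value vocabulary: flag values outside the list."""
    return [r["title"] for r in records if r["coverage"] not in GAZETTEER]
```

Here places_of_type treats the gazetteer as a data set in its own right, while invalid_records treats the same entries as the closed list of allowed values for another data set.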

My 2 cents.
Marcia

On 1/6/11 11:37 AM, "Karen Coyle" <kcoyle@kcoyle.net> wrote:

Quoting Emmanuelle Bermes <emmanuelle.bermes@bnf.fr>:


>  As for myself, I do have a few more comments:
> - I think the emphasis on value vocabs is too strong in the current
> definition of dataset. It's actually creating confusion, in my view.
> - I'm wondering if we could use the term "instance" (a dataset is a
> collection of instance descriptions) or is it too implementation-oriented?
>


I'm not sure that the term "instance" will work -- even a value in a
list could be considered an instance, no?

Somehow, the concept for a dataset is that it consists of the
descriptions of entities that you need for an application or function,
rather than the building blocks for creating such a description.
(Which gets back to Mark's statement about "A record for Derrida's
book in dataset X ...")

Essentially, one person's dataset could be another person's building
block. But I think the key is that a dataset is complete for an
application, while a value vocabulary needs to be combined with other
data to be useful.

No, I'm not satisfied with that explanation... I'll ruminate on this
and see if I can find better words.

kc

> Emmanuelle
>
> On Thu, Jan 6, 2011 at 5:13 PM, Mark van Assem <mark@cs.vu.nl> wrote:
>
>> Hi Emma,
>>
>> I saw you had already followed up on our action to clarify "value
>> vocabularies".
>>
>> I saw that you think we should clarify how value vocabularies actually
>> appear in metadata records (as literals, codes, identifiers). While I kinda
>> feel we should try to stay agnostic to that I kept it in, but rewrote it
>> slightly:
>>
>> "In actual metadata records, the values used can be literals, codes, or
>> identifiers (including URIs), as long as these refer to a specific concept
>> in a value vocabulary. "
>>
>> I also moved your point re "closed list" up to the initial definition; this
>> is indeed central to what a value vocab is.
>>
>> Mark.
>>
>>
>> On 06/01/2011 16:34, Mark van Assem wrote:
>>
>>> Hi Jodi,
>>>
>>> X and Y would be two collections ("datasets") from two different
>>> libraries. They could also be two subcollections within one collection,
>>> but I think making them separate ones will be more illustrative.
>>>
>>> Do you have a suggestion on how to clarify or replace X and Y with
>>> specific existing collections/libraries as examples?
>>>
>>> Mark
>>>
>>>
>>> On 06/01/2011 16:21, Jodi Schneider wrote:
>>>
>>>> Thanks for this, Mark! I especially like the 'confusions' area -- that
>>>> will make this quite useful.
>>>>
>>>> In this, it would be helpful if you'd explain what datasets X and Y
>>>> might be. Particular collections? Subcollections of a larger whole?
>>>> "in some cases records in a dataset are themselves used as values in
>>>> other datasets. For example, Derrida wrote a book that comments on
>>>> Heidegger's book "Sein und Zeit". A record for Derrida's book in dataset
>>>> X can state this by relating it to a record for Heidegger's book in
>>>> dataset Y. This statement in the Derrida record could consist of the
>>>> Dublin Core Subject element with, as its value, a reference to the
>>>> Heidegger record. In this case we would still term X and Y datasets,
>>>> not value vocabularies."
>>>>
>>>> -Jodi
>>>>
>>>> On 6 Jan 2011, at 08:00, Mark van Assem wrote:
>>>>
>>>>
>>>>> Hi all,
>>>>>
>>>>> As per my action I have written some text [1] to explain the terms
>>>>> "dataset, metadata element set, value vocabulary" with feedback from
>>>>> Karen and Antoine to address the things that don't fit very nicely.
>>>>>
>>>>> Please let me know what you think; after I've had your input we'll put
>>>>> it on the public list to get shot at.
>>>>>
>>>>> Mark.
>>>>>
>>>>> [1]
>>>>> http://www.w3.org/2001/sw/wiki/Library_terminology_informally_explained#Vocabularies.2C_Element_sets.2C_Datasets
>>>>>
>>>>>
>>>>> On 28/12/2010 18:40, Karen Coyle wrote:
>>>>>
>>>>>> I have been organizing the vocabularies and technologies on the
>>>>>> archives cluster page [1] and it was a very interesting exercise
>>>>>> trying to
>>>>>> determine what category some of the "things" fit into. This could turn
>>>>>> out to be a starting place for our upcoming discussion of our
>>>>>> definitions since it has real examples. The hard part seems to be value
>>>>>> vocabularies v. datasets, and I have a feeling that there will not be a
>>>>>> clear line between them.
>>>>>>
>>>>>> kc
>>>>>> [1]
>>>>>>
>>>>>> http://www.w3.org/2005/Incubator/lld/wiki/Cluster_Archives#Vocabularies_and_Technologies
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>
>
> --
> =====
> Emmanuelle Bermès - http://www.bnf.fr
> Manue - http://www.figoblog.org
>



--
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
Received on Thursday, 6 January 2011 17:17:57 GMT
