Re: vocabs, metadata set, datasets from Mark van Assem on 2011-01-07 (public-xg-lld@w3.org from January 2011)

From: Mark van Assem <mark@cs.vu.nl>
Date: Fri, 07 Jan 2011 11:05:02 +0100
To: "ZENG, MARCIA" <mzeng@kent.edu>
CC: Karen Coyle <kcoyle@kcoyle.net>, Emmanuelle Bermes <emmanuelle.bermes@bnf.fr>, "public-xg-lld@w3.org" <public-xg-lld@w3.org>
Message-ID: <4D26E54E.4080502@cs.vu.nl>
Thanks all for the feedback!

I've tried to address all your points in the value vocab description:

- "A dataset is a collection of structured metadata records"

- added some more "similar terms", including KOS, gazetteer, authority 
file, concept scheme

- "They are "building blocks" with which metadata records can be built."

Re Marcia's point [["For example, in digital gazetteers not only the 
place names are controlled but also the place features, type, 
coordinates, and even maps are included."]]

I'm not sure I get what you mean by "also controlled", but I do think 
that this is the same as the VIAF situation: the values in a value 
vocabulary can themselves be described with elements and values, which 
would make them "datasets" as well. However, we can still see VIAF as 
a value vocab and not a dataset, as its main role is to be a building 
block for metadata records.
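
As a rough illustration of that distinction (a sketch only: the class 
names, the example.org URIs and the place "Exampleville" are all made up, 
and nothing here reflects how VIAF or any real gazetteer is modelled), a 
gazetteer entry can carry its own elements and values while a metadata 
record still uses it purely as a value:

```python
from dataclasses import dataclass, field


@dataclass
class VocabEntry:
    """One value in a value vocabulary (hypothetical structure)."""
    uri: str
    label: str
    # The entry is itself described with elements and values,
    # which is what makes it look "dataset-like".
    described_by: dict = field(default_factory=dict)


@dataclass
class MetadataRecord:
    """One record in a dataset: element name -> value."""
    title: str
    elements: dict = field(default_factory=dict)


# Made-up gazetteer with a single entry.
gazetteer = {
    "http://example.org/place/1": VocabEntry(
        uri="http://example.org/place/1",
        label="Exampleville",
        described_by={"featureType": "city", "lat": 0.0, "long": 0.0},
    ),
}

# Made-up dataset whose record uses the gazetteer entry as a building block.
dataset_x = [
    MetadataRecord(
        title="A history of Exampleville",
        elements={"spatial": "http://example.org/place/1"},
    ),
]


def label_for(record, element, vocabulary):
    """Resolve the URI stored under `element` to its vocabulary label."""
    return vocabulary[record.elements[element]].label


print(label_for(dataset_x[0], "spatial", gazetteer))
```

The same gazetteer could be browsed on its own as a dataset; what makes 
it a value vocab in this sketch is only that dataset_x points into it.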

Mark


On 6-1-2011 18:15, ZENG, MARCIA wrote:
> I like the way Karen put it in terms of building block or not... I also
> agree with Jeff’s use of SKOS ‘concept scheme’ to define VIAF.
>
>     * Regarding ‘data sets’: To me, the ‘data sets’ we are talking about
>       are structured data. Outside in other places ‘data sets’ could be
>       un-structured or semi-structured data (e.g., data.gov’s raw data
>       sets).
>     * Regarding ‘value vocabularies’: In the conventional way we have
>       used “knowledge organization systems (KOS)” for concept schemes
>       (broader than “controlled vocabularies”). Most of the vocabulary
>       types are clear such as pick lists, taxonomies, thesauri, subject
>       headings. But there is a group of ‘metadata-like’ KOS such as
>       authority files and digital gazetteers. They are/can be
>       constructed as thesauri (e.g., The Getty Thesaurus of Geographic
>       Names (TGN) and Union List of Artist Names (ULAN)). Or, they can
>       be in other structures. It is the contents they include that made
>       them also be referred to as ‘data sets’. For example, in digital
>       gazetteers not only the place names are controlled but also the
>       place features, type, coordinates, and even maps are included.
>       Digital gazetteers can be used alone as data sets or be the value
>       vocabularies used in structured data sets. This might be like the
>       VIAF situation, depending on how it is constructed or on how it is
>       used.
>
> My 2 cents.
> Marcia
>
> On 1/6/11 11:37 AM, "Karen Coyle" <kcoyle@kcoyle.net> wrote:
>
>     Quoting Emmanuelle Bermes <emmanuelle.bermes@bnf.fr>:
>
>
>     >  As for myself, I do have a few more comments :
>     >  - I think the emphasis on value vocabs is too important in the current
>     >  definition of dataset. It's actually creating confusion, in my view.
>     >  - I'm wondering if we could use the term "instance" (a dataset is a
>     >  collection of instance descriptions) or is it too implementation
>     >  oriented?
>     >
>
>
>     I'm not sure that the term "instance" will work -- even a value in a
>     list could be considered an instance, no?
>
>     Somehow, the concept for a dataset is that it consists of the
>     descriptions of entities that you need for an application or function,
>     rather than the building blocks for creating such a description.
>     (Which gets back to Mark's statement about "A record for Derrida's
>     book in dataset X ...")
>
>     Essentially, one person's dataset could be another person's building
>     block. But I think the key is that a dataset is complete for an
>     application, while a value vocabulary needs to be combined with other
>     data to be useful.
>
>     No, I'm not satisfied with that explanation... I'll ruminate on this
>     and see if I can find better words.
>
>     kc
>
>     >  Emmanuelle
>     >
>     >  On Thu, Jan 6, 2011 at 5:13 PM, Mark van Assem <mark@cs.vu.nl> wrote:
>     >
>     > > Hi Emma,
>     > >
>     > > I saw you had already followed up on our action to clarify "value
>     > > vocabularies".
>     > >
>     > > I saw that you think we should clarify how value vocabularies actually
>     > > appear in metadata records (as literals, codes, identifiers). While I kinda
>     > > feel we should try to stay agnostic to that I kept it in, but rewrote it
>     > > slightly:
>     > >
>     > > "In actual metadata records, the values used can be literals, codes, or
>     > > identifiers (including URIs), as long as these refer to a specific concept
>     > > in a value vocabulary. "
>     > >
>     > > I also moved your point re "closed list" up to the initial definition; this
>     > > is indeed central to what a value vocab is.
>     > >
>     > > Mark.
>     > >
>     > >
>     > > On 06/01/2011 16:34, Mark van Assem wrote:
>     > >
>     > >> Hi Jodi,
>     > >>
>     > >> X and Y would be two collections ("datasets") from two different
>     > >> libraries. It could also be two subcollections or within one collection,
>     > >> but I think making them separate ones will make it more illustrative.
>     > >>
>     > >> Do you have a suggestion on how to clarify or replace X and Y with
>     > >> specific existing collections/libraries as examples?
>     > >>
>     > >> Mark
>     > >>
>     > >>
>     > >> On 06/01/2011 16:21, Jodi Schneider wrote:
>     > >>
>     > >>> Thanks for this, Mark! I especially like the 'confusions' area -- that
>     > >>> will make this quite useful.
>     > >>>
>     > >>> In this, it would be helpful if you'd explain what datasets X and Y
>     > >>> might be. Particular collections? Subcollections of a larger whole?
>     > >>> "in some cases records in a dataset are themselves used as values in
>     > >>> other datasets. For example, Derrida wrote a book that comments on
>     > >>> Heidegger's book "Sein und Zeit". A record for Derrida's book in dataset
>     > >>> X can state this by relating it to a record for Heidegger's book in
>     > >>> dataset Y. This statement in the Derrida record could consist of the
>     > >>> Dublin Core Subject with as value a reference to the Heidegger record.
>     > >>> In this case we would still term X and Y datasets, not value
>     > >>> vocabularies."
>     > >>>
>     > >>> -Jodi
>     > >>>
>     > >>> On 6 Jan 2011, at 08:00, Mark van Assem wrote:
>     > >>>
>     > >>>
>     > >>>> Hi all,
>     > >>>>
>     > >>>> As per my action I have written some text [1] to explain the terms
>     > >>>> "dataset, metadata element set, value vocabulary" with feedback from
>     > >>>> Karen and Antoine to address the things that don't fit very nicely.
>     > >>>>
>     > >>>> Please let me know what you think, after I've had your input we'll put
>     > >>>> it on the public list to get shot at.
>     > >>>>
>     > >>>> Mark.
>     > >>>>
>     > >>>> [1]
>     > >>>> http://www.w3.org/2001/sw/wiki/Library_terminology_informally_explained#Vocabularies.2C_Element_sets.2C_Datasets
>     > >>>>
>     > >>>>
>     > >>>> On 28/12/2010 18:40, Karen Coyle wrote:
>     > >>>>
>     > >>>>> I have been organizing the vocabularies and technologies on the archives
>     > >>>>> cluster page [1] and it was a very interesting exercise trying to
>     > >>>>> determine what category some of the "things" fit into. This could turn
>     > >>>>> out to be a starting place for our upcoming discussion of our
>     > >>>>> definitions since it has real examples. The hard part seems to be value
>     > >>>>> vocabularies v. datasets, and I have a feeling that there will not be a
>     > >>>>> clear line between them.
>     > >>>>>
>     > >>>>> kc
>     > >>>>> [1]
>     > >>>>> http://www.w3.org/2005/Incubator/lld/wiki/Cluster_Archives#Vocabularies_and_Technologies
>     > >>>>>
>     > >>>>>
>     > >>>>>
>     > >>>>>
>     > >>>>
>     > >>>
>     > >>
>     >
>     >
>     >  --
>     >  =====
>     >  Emmanuelle Bermès - http://www.bnf.fr
>     >  Manue - http://www.figoblog.org
>     >
>
>
>
>     --
>     Karen Coyle
>     kcoyle@kcoyle.net http://kcoyle.net
>     ph: 1-510-540-7596
>     m: 1-510-435-8234
>     skype: kcoylenet
>
>
>
Received on Friday, 7 January 2011 10:06:03 UTC