Re: vocabs, metadata set, datasets from ZENG, MARCIA on 2011-01-07 (public-xg-lld@w3.org from January 2011)

From: ZENG, MARCIA <mzeng@kent.edu>
Date: Fri, 7 Jan 2011 10:16:05 -0500
To: Mark van Assem <mark@cs.vu.nl>
CC: Karen Coyle <kcoyle@kcoyle.net>, Emmanuelle Bermes <emmanuelle.bermes@bnf.fr>, "public-xg-lld@w3.org" <public-xg-lld@w3.org>
Message-ID: <C94C9865.12F71%mzeng@kent.edu>
Mark,
Re: your question
        >>Re Marcia's point [["For example, in digital gazetteers not only the
place names are controlled but also the place features, type,
coordinates, and even maps are included."]]

       >>I'm not sure I get what you mean with the "also controlled",

The following points, drawn from [Ref], explain further:

1. The concept of a geographic place is fuzzy (e.g., Rocky Mountains), and we use place names differently according to the circumstances (e.g., using "Santa Barbara" either generally to mean the whole area or specifically to mean just the incorporated city).
2. When locations are named, they can be in a gazetteer. A place can have more than one name: name variants, names in different languages, etc.
3. In a geospatially referenced gazetteer, each entry has a "footprint" consisting of latitude and longitude coordinates. This footprint can be a point (most current gazetteer footprints are points)...
4. Each entry in a digital gazetteer must also be categorized according to a formal typing system (a controlled vocabulary of type terminology).

#2 is what most thesauri do: control synonyms and equivalents.
#3 resembles the approach a thesaurus uses to eliminate ambiguity.  But gazetteer footprints are not like a GPS, which focuses on coordinates and uses bounding boxes to provide a precise location.  The points in a gazetteer serve more as a qualifier that provides context for a place.
#4 is to provide a TYPE for each named place. This is similar to the Medical Subject Headings, where each concept is given a TYPE code according to a formal typing system (see example [1]).  In the Getty Thesaurus of Geographic Names, place types are also an important component of each entry.  Those TYPE values usually come from a controlled vocabulary [2].  So they could use other building blocks.  However, the general function and purpose of a digital gazetteer is to serve as a "spatial dictionary of named and typed places".
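
To make points #2 to #4 concrete, here is a minimal sketch in Python of what one gazetteer entry might look like, and how a metadata record could use it as a value. Everything named here is an assumption made for illustration: the class and field names, the "ex:" identifier, the short list of feature types, the rounded coordinates, and the catalog record. It is not the ADL or TGN schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical controlled list of feature types (point #4). A real gazetteer
# would take these from a published feature type vocabulary.
FEATURE_TYPES = {"populated place", "mountain range", "river"}


@dataclass
class GazetteerEntry:
    entry_id: str                    # illustrative identifier, not a real URI
    preferred_name: str
    feature_type: str                # must come from FEATURE_TYPES
    footprint: Tuple[float, float]   # point #3: (latitude, longitude)
    variant_names: List[str] = field(default_factory=list)  # point #2

    def __post_init__(self) -> None:
        # Reject any type that is not in the controlled list.
        if self.feature_type not in FEATURE_TYPES:
            raise ValueError(f"uncontrolled feature type: {self.feature_type!r}")

    def all_names(self) -> List[str]:
        return [self.preferred_name, *self.variant_names]


def find_by_name(entries: List[GazetteerEntry], name: str) -> List[GazetteerEntry]:
    """Return entries whose preferred or variant name matches, ignoring case."""
    wanted = name.casefold()
    return [e for e in entries if wanted in (n.casefold() for n in e.all_names())]


# Approximate point footprint; it gives context rather than a precise boundary.
santa_barbara = GazetteerEntry(
    entry_id="ex:place/1",
    preferred_name="Santa Barbara",
    feature_type="populated place",
    footprint=(34.42, -119.70),
    variant_names=["Santa Barbara, Calif."],
)

# Used as a value vocabulary: a (made-up) catalog record points at the entry.
catalog_record: Dict[str, str] = {
    "title": "Harbor survey",
    "coverage": santa_barbara.entry_id,
}
```

The entry carries its own elements and values, so a list of such entries can be read as a dataset, while catalog_record uses the same entry only as a building block through its identifier.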

Quite a lot of projects have used ADL gazetteers as value vocabularies, but a gazetteer is also used as a reference in itself, e.g., [3].
Marcia

[1] http://www.nlm.nih.gov/cgi/mesh/2011/MB_cgi?mode=&index=8264&view=expanded
[2] http://www.alexandria.ucsb.edu/~lhill/FeatureTypes/ver070302/index.htm
[3] http://clients.alexandria.ucsb.edu/globetrotter/ (try finding a place, then look at the catalog record.)
[Ref] JCDL 2002 NKOS Workshop on Digital Gazetteers.  http://nkos.slis.kent.edu/DL02workshop.htm


On 1/7/11 5:05 AM, "Mark van Assem" <mark@cs.vu.nl> wrote:

Thanks all for the feedback!

I've tried to address all your points in the value vocab description:

- "A dataset is a collection of structured metadata records"

- added some more "similar terms", including KOS, gazetteer, authority
file, concept scheme

- "They are "building blocks" with which metadata records can be built."

Re Marcia's point [["For example, in digital gazetteers not only the
place names are controlled but also the place features, type,
coordinates, and even maps are included."]]

I'm not sure I get what you mean with the "also controlled", but I think
indeed that this is the same as the VIAF situation: the values in a
value vocabulary can be described with elements and values themselves,
which would make them "datasets" also. However, we can still see VIAF as
a value vocab and not a dataset, as its main role is to be a building
block for metadata records.

Mark


On 6-1-2011 18:15, ZENG, MARCIA wrote:
> I like the way Karen framed it in terms of building block or not... Also
> agree with Jeff's use of SKOS 'concept scheme' to define VIAF.
>
>     * Regarding 'data sets': To me, the 'data sets' we are talking about
>       are structured data. Outside in other places 'data sets' could be
>       un-structured or semi-structured data (e.g., data.gov's raw data
>       sets).
>     * Regarding 'value vocabularies': In the conventional way we have
>       used "knowledge organization systems (KOS)" for concept schemes
>       (broader than "controlled vocabularies"). Most of the vocabulary
>       types are clear such as pick lists, taxonomies, thesauri, subject
>       headings. But there is a group of 'metadata-like' KOS such as
>       authority files and digital gazetteers. They are/can be
>       constructed as thesauri (e.g., The Getty Thesaurus of Geographic
>       Names (TGN) and Union List of Artist Names (ULAN)). Or, they can
>       be in other structures. It is the contents they include that make
>       them also be referred to as 'data sets'. For example, in digital
>       gazetteers not only the place names are controlled but also the
>       place features, type, coordinates, and even maps are included.
>       Digital gazetteers can be used alone as data sets or be the value
>       vocabularies used in structured data sets. This might be like the
>       VIAF situation, depending on how it is constructed or on how it is
>       used.
>
> My 2 cents.
> Marcia
>
> On 1/6/11 11:37 AM, "Karen Coyle" <kcoyle@kcoyle.net> wrote:
>
>     Quoting Emmanuelle Bermes <emmanuelle.bermes@bnf.fr>:
>
>
>     >  As for myself, I do have a few more comments :
>     >  - I think the emphasis on value vocabs is too important in the current
>     >  definition of dataset. It's actually creating confusion, in my view.
>     >  - I'm wondering if we could use the term "instance" (a dataset is a
>     >  collection of instance descriptions) or is it too
>     >  implementation oriented?
>     >
>
>
>     I'm not sure that the term "instance" will work -- even a value in a
>     list could be considered an instance, no?
>
>     Somehow, the concept for a dataset is that it consists of the
>     descriptions of entities that you need for an application or function,
>     rather than the building blocks for creating such a description.
>     (Which gets back to Mark's statement about "A record for Derrida's
>     book in dataset X ...")
>
>     Essentially, one person's dataset could be another person's building
>     block. But I think the key is that a dataset is complete for an
>     application, while a value vocabulary needs to be combined with other
>     data to be useful.
>
>     No, I'm not satisfied with that explanation... I'll ruminate on this
>     and see if I can find better words.
>
>     kc
>
>     >  Emmanuelle
>     >
>     >  On Thu, Jan 6, 2011 at 5:13 PM, Mark van Assem <mark@cs.vu.nl> wrote:
>     >
>     > > Hi Emma,
>     > >
>     > > I saw you had already followed up on our action to clarify "value
>     > > vocabularies".
>     > >
>     > > I saw that you think we should clarify how value vocabularies actually
>     > > appear in metadata records (as literals, codes, identifiers). While I
>     > > kinda feel we should try to stay agnostic to that I kept it in, but
>     > > rewrote it slightly:
>     > >
>     > > "In actual metadata records, the values used can be literals, codes, or
>     > > identifiers (including URIs), as long as these refer to a specific
>     > > concept in a value vocabulary."
>     > >
>     > > I also moved your point re "closed list" up to the initial definition;
>     > > this is indeed central to what a value vocab is.
>     > >
>     > > Mark.
>     > >
>     > >
>     > > On 06/01/2011 16:34, Mark van Assem wrote:
>     > >
>     > >> Hi Jodi,
>     > >>
>     > >> X and Y would be two collections ("datasets") from two different
>     > >> libraries. It could also be two subcollections or within one collection,
>     > >> but I think making them separate ones will make it more illustrative.
>     > >>
>     > >> Do you have a suggestion on how to clarify or replace X and Y with
>     > >> specific existing collections/libraries as examples?
>     > >>
>     > >> Mark
>     > >>
>     > >>
>     > >> On 06/01/2011 16:21, Jodi Schneider wrote:
>     > >>
>     > >>> Thanks for this, Mark! I especially like the 'confusions' area -- that
>     > >>> will make this quite useful.
>     > >>>
>     > >>> In this, it would be helpful if you'd explain what datasets X and Y
>     > >>> might be. Particular collections? Subcollections of a larger whole?
>     > >>> "in some cases records in a dataset are themselves used as values in
>     > >>> other datasets. For example, Derrida wrote a book that comments on
>     > >>> Heidegger's book "Sein und Zeit". A record for Derrida's book in dataset
>     > >>> X can state this by relating it to a record for Heidegger's book in
>     > >>> dataset Y. This statement in the Derrida record could consist of the
>     > >>> Dublin Core Subject with as value a reference to the Heidegger record.
>     > >>> In this case we would still term X and Y datasets, not value
>     > >>> vocabularies."
>     > >>>
>     > >>> -Jodi
>     > >>>
>     > >>> On 6 Jan 2011, at 08:00, Mark van Assem wrote:
>     > >>>
>     > >>>
>     > >>>> Hi all,
>     > >>>>
>     > >>>> As per my action I have written some text [1] to explain the terms
>     > >>>> "dataset, metadata element set, value vocabulary" with feedback from
>     > >>>> Karen and Antoine to address the things that don't fit very nicely.
>     > >>>>
>     > >>>> Please let me know what you think, after I've had your input we'll put
>     > >>>> it on the public list to get shot at.
>     > >>>>
>     > >>>> Mark.
>     > >>>>
>     > >>>> [1] http://www.w3.org/2001/sw/wiki/Library_terminology_informally_explained#Vocabularies.2C_Element_sets.2C_Datasets
>     > >>>>
>     > >>>>
>     > >>>> On 28/12/2010 18:40, Karen Coyle wrote:
>     > >>>>
>     > >>>>> I have been organizing the vocabularies and technologies on the
>     > >>>>> archives cluster page [1] and it was a very interesting exercise
>     > >>>>> trying to determine what category some of the "things" fit into.
>     > >>>>> This could turn out to be a starting place for our upcoming
>     > >>>>> discussion of our definitions since it has real examples. The hard
>     > >>>>> part seems to be value vocabularies v. datasets, and I have a
>     > >>>>> feeling that there will not be a clear line between them.
>     > >>>>>
>     > >>>>> kc
>     > >>>>> [1] http://www.w3.org/2005/Incubator/lld/wiki/Cluster_Archives#Vocabularies_and_Technologies
>     > >>>>>
>     > >>>>>
>     > >>>>>
>     > >>>>>
>     > >>>>
>     > >>>
>     > >>
>     >
>     >
>     >  --
>     >  =====
>     >  Emmanuelle Bermès - http://www.bnf.fr
>     >  Manue - http://www.figoblog.org
>     >
>
>
>
>     --
>     Karen Coyle
>     kcoyle@kcoyle.net http://kcoyle.net
>     ph: 1-510-540-7596
>     m: 1-510-435-8234
>     skype: kcoylenet
>
>
>
Received on Friday, 7 January 2011 15:16:54 UTC