RE: vocabs, metadata set, datasets

From: Karen Coyle <kcoyle@kcoyle.net>
Date: Tue, 11 Jan 2011 10:47:11 -0800
Message-ID: <20110111104711.616405ls5uttak6n@kcoyle.net>
To: "Ford, Kevin" <kefo@loc.gov>
Cc: "public-xg-lld@w3.org" <public-xg-lld@w3.org>
Kevin, and others:

I also see this as a confusion between metadata set creation and  
metadata set usage. For example, if I am creating metadata instances,  
I probably view LCSH as a kind of thesaurus from which I select an  
individual subject heading. If I am creating a new LCSH entry, then I  
am creating a record with a number of data elements (preferred form,  
alternate form, cataloger's notes). If I am a developer of the LCSH  
metadata schema definitions in a machine-readable format (e.g. MARC  
Authorities or MADS) then I work with the data structure, not any  
actual instances of authority records. (All structure, no content.)

I often get the feeling that we aren't clear when we speak/write which  
of these we are talking about. The third one could perhaps be called  
"metadata modeling". The middle one... "vocabulary creation"? The  
first one "vocabulary usage"?

kc

Quoting "Ford, Kevin" <kefo@loc.gov>:

> Hi Mark,
>
> I wanted to spotlight your summation of one of Marcia's assertions:
>
> "If I understand correctly, your point is that some resources such  
> as gazetteers can be dataset, value vocab, and metadata schema..."
>
> Note: Marcia responded that "metadata schema" should be removed [1].
> (I agree with that.)
>
> Karen touched on this also [2]: "Essentially, one person's dataset  
> could be another person's building block."
>
> I think these two observations are the source of the "confusion."
>
> And, like Karen and Marcia, I think it comes down to usage.  For  
> example, those who maintain or otherwise work directly with the  
> entire LCSH or VIAF or TGN or AAT very likely see a dataset.  Ditto  
> for those who might be interested in the data for analytical  
> purposes.  A cataloger, on the other hand, searching the AAT in  
> order to best describe a resource will perceive AAT as a list of  
> values from which she must select the most appropriate term.  She'll  
> see a value vocabulary.
>
> If this distinction centered on usage finds consensus, I can try to  
> write up a "confusion" blurb...
>
> Best,
> Kevin
>
>
> [1] http://lists.w3.org/Archives/Public/public-xg-lld/2011Jan/0045.html
> [2] http://lists.w3.org/Archives/Public/public-xg-lld/2011Jan/0027.html
>
>
> ________________________________________
> From: public-xg-lld-request@w3.org [public-xg-lld-request@w3.org] On  
> Behalf Of Mark van Assem [mark@cs.vu.nl]
> Sent: Saturday, January 08, 2011 05:22
> To: ZENG, MARCIA
> Cc: Karen Coyle; Emmanuelle Bermes; public-xg-lld@w3.org
> Subject: Re: vocabs, metadata set, datasets
>
> Hi Marcia,
>
> If I understand correctly, your point is that some resources such as
> gazetteers can be dataset, value vocab and metadata schema (because the
> gazetteer entries can have attributes themselves, and the values of
> these attributes may come from another code list defined in the gazetteer).
>
> I would definitely see TGN and MeSH as value vocabs, even though they
> specify their own metadata elements, and describe their own entries with
> elements and values (making them like a dataset) and may have "sub
> vocabularies".
>
> I've tried to cover this problem through the "Confusions" points. If
> they do not succeed in doing this, what would you add/remove in the text
> to fix this?
>
> If I still didn't get your point, I apologize!
> Mark.
>
> Op 7-1-2011 16:16, ZENG, MARCIA schreef:
>> Mark,
>> Re: your question
>>> Re Marcia's point [["For example, in digital gazetteers not only the
>>> place names are controlled but also the place features, type,
>>> coordinates, and even maps are included."]]
>>>
>>> I'm not sure I get what you mean with the "also controlled",
>>
>> I am giving the following text to explain further [ref]:
>>
>> 1. The concept of a geographic place is fuzzy (e.g., Rocky Mountains) and we
>> use place names differently according to the circumstances (e.g., using
>> “Santa Barbara” generally to mean the whole general area or specifically
>> to mean just the incorporated city area).
>> 2. When locations are named, they can be in a gazetteer. A place can have
>> more than one name: name variants, names in different languages, etc.
>> 3. In a geospatially referenced gazetteer, each entry has a “footprint”
>> consisting of latitude and longitude coordinates. This footprint can be
>> a point (most current gazetteer footprints are points)...
>> 4. Each entry in a digital gazetteer must also be categorized according
>> to a formal typing system (a controlled vocabulary of type terminology).
>>
>> #2 is what most thesauri would do, to control the synonyms and equivalents.
>> #3 is especially the approach used in a thesaurus to eliminate
>> ambiguities. But here they are not like a GPS, which focuses on
>> coordinates and uses bounding boxes to provide a precise location. These
>> points in a gazetteer serve more as a qualifier to provide context for a
>> place.
>> #4 is to provide a TYPE for each named place. This is similar to the
>> Medical Subject Headings, where each concept is given a TYPE code
>> according to a formal typing system (see example [1]). In the Getty
>> Thesaurus of Geographic Names, place types are also an important
>> component in each entry. Those TYPE values are usually from a
>> controlled vocabulary [2]. So they could use other building blocks.
>> However, the general function and purpose of the digital gazetteer is to
>> serve as a “spatial dictionary of named and typed places”.
>>
>> Quite a lot of projects have used ADL gazetteers as value vocabularies, but
>> the gazetteers are also used as a reference themselves, e.g., [3].
>> Marcia
>>
>> [1] http://www.nlm.nih.gov/cgi/mesh/2011/MB_cgi?mode=&index=8264&view=expanded
>>
>> [2] http://www.alexandria.ucsb.edu/~lhill/FeatureTypes/ver070302/index.htm
>> [3] http://clients.alexandria.ucsb.edu/globetrotter/ (try to find a
>> place then see the catalog record.)
>> [Ref] JCDL 2002 NKOS Workshop on Digital Gazetteers.
>> http://nkos.slis.kent.edu/DL02workshop.htm
>>
>>
>> On 1/7/11 5:05 AM, "Mark van Assem" <mark@cs.vu.nl> wrote:
>>
>>     Thanks all for the feedback!
>>
>>     I've tried to address all your points in the value vocab description:
>>
>>     - "A dataset is a collection of structured metadata records"
>>
>>     - added some more "similar terms", including KOS, gazetteer, authority
>>     file, concept scheme
>>
>>     - "They are "building blocks" with which metadata records can be built."
>>
>>     Re Marcia's point [["For example, in digital gazetteers not only the
>>     place names are controlled but also the place features, type,
>>     coordinates, and even maps are included."]]
>>
>>     I'm not sure I get what you mean with the "also controlled", but I think
>>     indeed that this is the same as the VIAF situation: the values in a
>>     value vocabulary can be described with elements and values themselves,
>>     which would make them "datasets" also. However, we can still see VIAF as
>>     a value vocab and not a dataset, as its main role is to be a building
>>     block for metadata records.
>>
>>     Mark
>>
>>
>>     Op 6-1-2011 18:15, ZENG, MARCIA schreef:
>>     >  I like the way Karen used in terms of building block or not... Also
>>     >  agree with Jeff’s use of SKOS ‘concept scheme’ to define VIAF.
>>     >
>>     >  * Regarding ‘data sets’: To me, the ‘data sets’ we are talking about
>>     >  are structured data. Outside in other places ‘data sets’ could be
>>     >  un-structured or semi-structured data (e.g., data.gov’s raw data
>>     >  sets).
>>     >  * Regarding ‘value vocabularies’: In the conventional way we have
>>     >  used “knowledge organization systems (KOS)” for concept schemes
>>     >  (broader than “controlled vocabularies”). Most of the vocabulary
>>     >  types are clear such as pick lists, taxonomies, thesauri, subject
>>     >  headings. But there is a group of ‘metadata-like’ KOS such as
>>     >  authority files and digital gazetteers. They are/can be
>>     >  constructed as thesauri (e.g., The Getty Thesaurus of Geographic
>>     >  Names (TGN) and Union List of Artist Names (ULAN)). Or, they can
>>     >  be in other structures. It is the contents they include that made
>>     >  them also be referred to as ‘data sets’. For example, in digital
>>     >  gazetteers not only the place names are controlled but also the
>>     >  place features, type, coordinates, and even maps are included.
>>     >  Digital gazetteers can be used alone as data sets or be the value
>>     >  vocabularies used in structured data sets. This might be like the
>>     >  VIAF situation, depending on how it is constructed or on how it is
>>     >  used.
>>     >
>>     >  My 2 cents.
>>     >  Marcia
>>     >
>>     >  On 1/6/11 11:37 AM, "Karen Coyle" <kcoyle@kcoyle.net> wrote:
>>     >
>>     >  Quoting Emmanuelle Bermes <emmanuelle.bermes@bnf.fr>:
>>     >
>>     >
>>     >  > As for myself, I do have a few more comments:
>>     >  > - I think the emphasis on value vocabs is too important in the
>>     >  > current definition of dataset. It's actually creating confusion,
>>     >  > in my view.
>>     >  > - I'm wondering if we could use the term "instance" (a dataset is a
>>     >  > collection of instance descriptions) or is it too implementation
>>     >  > oriented?
>>     >  >
>>     >
>>     >
>>     >  I'm not sure that the term "instance" will work -- even a value in a
>>     >  list could be considered an instance, no?
>>     >
>>     >  Somehow, the concept for a dataset is that it consists of the
>>     >  descriptions of entities that you need for an application or
>>     >  function, rather than the building blocks for creating such a
>>     >  description.
>>     >  (Which gets back to Mark's statement about "A record for Derrida's
>>     >  book in dataset X ...")
>>     >
>>     >  Essentially, one person's dataset could be another person's building
>>     >  block. But I think the key is that a dataset is complete for an
>>     >  application, while a value vocabulary needs to be combined with other
>>     >  data to be useful.
>>     >
>>     >  No, I'm not satisfied with that explanation... I'll ruminate on this
>>     >  and see if I can find better words.
>>     >
>>     >  kc
>>     >
>>     >  > Emmanuelle
>>     >  >
>>     >  > On Thu, Jan 6, 2011 at 5:13 PM, Mark van Assem <mark@cs.vu.nl> wrote:
>>     >  >
>>     >  > > Hi Emma,
>>     >  > >
>>     >  > > I saw you had already followed up on our action to clarify "value
>>     >  > > vocabularies".
>>     >  > >
>>     >  > > I saw that you think we should clarify how value vocabularies
>>     >  > > actually appear in metadata records (as literals, codes,
>>     >  > > identifiers). While I kinda feel we should try to stay agnostic
>>     >  > > to that I kept it in, but rewrote it slightly:
>>     >  > >
>>     >  > > "In actual metadata records, the values used can be literals,
>>     >  > > codes, or identifiers (including URIs), as long as these refer
>>     >  > > to a specific concept in a value vocabulary."
>>     >  > >
>>     >  > > I also moved your point re "closed list" up to the initial
>>     >  > > definition; this is indeed central to what a value vocab is.
>>     >  > >
>>     >  > > Mark.
>>     >  > >
>>     >  > >
>>     >  > > On 06/01/2011 16:34, Mark van Assem wrote:
>>     >  > >
>>     >  > >> Hi Jodi,
>>     >  > >>
>>     >  > >> X and Y would be two collections ("datasets") from two different
>>     >  > >> libraries. It could also be two subcollections within one
>>     >  > >> collection, but I think making them separate ones will make it
>>     >  > >> more illustrative.
>>     >  > >>
>>     >  > >> Do you have a suggestion on how to clarify or replace X and Y
>>     >  > >> with specific existing collections/libraries as examples?
>>     >  > >>
>>     >  > >> Mark
>>     >  > >>
>>     >  > >>
>>     >  > >> On 06/01/2011 16:21, Jodi Schneider wrote:
>>     >  > >>
>>     >  > >>> Thanks for this, Mark! I especially like the 'confusions' area
>>     >  > >>> -- that will make this quite useful.
>>     >  > >>>
>>     >  > >>> In this, it would be helpful if you'd explain what datasets
>>     >  > >>> X and Y might be. Particular collections? Subcollections of a
>>     >  > >>> larger whole?
>>     >  > >>> "in some cases records in a dataset are themselves used as
>>     >  > >>> values in other datasets. For example, Derrida wrote a book that
>>     >  > >>> comments on Heidegger's book "Sein und Zeit". A record for
>>     >  > >>> Derrida's book in dataset X can state this by relating it to a
>>     >  > >>> record for Heidegger's book in dataset Y. This statement in the
>>     >  > >>> Derrida record could consist of the Dublin Core Subject with as
>>     >  > >>> value a reference to the Heidegger record. In this case we would
>>     >  > >>> still term X and Y datasets, not value vocabularies."
>>     >  > >>>
>>     >  > >>> -Jodi
>>     >  > >>>
>>     >  > >>> On 6 Jan 2011, at 08:00, Mark van Assem wrote:
>>     >  > >>>
>>     >  > >>>
>>     >  > >>>> Hi all,
>>     >  > >>>>
>>     >  > >>>> As per my action I have written some text [1] to explain
>>     >  > >>>> the terms "dataset, metadata element set, value vocabulary"
>>     >  > >>>> with feedback from Karen and Antoine to address the things
>>     >  > >>>> that don't fit very nicely.
>>     >  > >>>>
>>     >  > >>>> Please let me know what you think, after I've had your input
>>     >  > >>>> we'll put it on the public list to get shot at.
>>     >  > >>>>
>>     >  > >>>> Mark.
>>     >  > >>>>
>>     >  > >>>> [1]
>>     >  > >>>> http://www.w3.org/2001/sw/wiki/Library_terminology_informally_explained#Vocabularies.2C_Element_sets.2C_Datasets
>>     >  > >>>>
>>     >  > >>>>
>>     >  > >>>> On 28/12/2010 18:40, Karen Coyle wrote:
>>     >  > >>>>
>>     >  > >>>>> I have been organizing the vocabularies and technologies
>>     >  > >>>>> on the archives cluster page [1] and it was a very interesting
>>     >  > >>>>> exercise trying to determine what category some of the
>>     >  > >>>>> "things" fit into. This could turn out to be a starting place
>>     >  > >>>>> for our upcoming discussion of our definitions since it has
>>     >  > >>>>> real examples. The hard part seems to be value vocabularies
>>     >  > >>>>> v. datasets, and I have a feeling that there will not be a
>>     >  > >>>>> clear line between them.
>>     >  > >>>>>
>>     >  > >>>>> kc
>>     >  > >>>>> [1]
>>     >  > >>>>> http://www.w3.org/2005/Incubator/lld/wiki/Cluster_Archives#Vocabularies_and_Technologies
>>     >  > >>>>>
>>     >  > >>>>>
>>     >  > >>>>>
>>     >  > >>>>>
>>     >  > >>>>
>>     >  > >>>
>>     >  > >>
>>     >  >
>>     >  >
>>     >  > --
>>     >  > =====
>>     >  > Emmanuelle Bermès - http://www.bnf.fr
>>     >  > Manue - http://www.figoblog.org
>>     >  >
>>     >
>>     >
>>     >
>>     >  --
>>     >  Karen Coyle
>>     >  kcoyle@kcoyle.net http://kcoyle.net
>>     >  ph: 1-510-540-7596
>>     >  m: 1-510-435-8234
>>     >  skype: kcoylenet
>>     >
>>     >
>>     >
>>
>
>



-- 
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
Received on Tuesday, 11 January 2011 18:47:48 GMT