Re: vocabs, metadata set, datasets from Mark van Assem on 2011-01-17 (public-xg-lld@w3.org from January 2011)

From: Mark van Assem <mark@cs.vu.nl>
Date: Mon, 17 Jan 2011 15:38:21 +0100
To: "ZENG, MARCIA" <mzeng@kent.edu>
CC: "public-xg-lld@w3.org" <public-xg-lld@w3.org>
Message-ID: <4D34545D.9040302@cs.vu.nl>

Hi Marcia,

>     >mz: This is 70% correct, just needs to take off ‘metadata schema’.
>     Every vocabulary (such as a thesaurus) has a schema that defines the
>     attributes, some are universal (the thesauri world all followed
>     ISO2788 for a long time, now ISO25964 and BS 8723 [ref][1]) and some
>     are locally defined.

I don't understand the argument. There definitely are cases that define 
their own metadata schema and do not reuse existing ones, so that would 
be a good reason to leave it in, even though others reuse existing ones.

So I'd be in favor of leaving it in, maybe with less emphasis?

>     >mz: [continued from above]: all value vocabularies have their own set
>     of attributes. TGN and other Getty vocabularies have their
>     standardized attributes and a number of controlled lists (e.g.,

<snip>

Hm... does this mean that you agree with my standpoint that MeSH and 
Getty are value vocabs, or do you disagree?

>     I've tried to cover this problem through the "Confusions" points. If
>     they do not succeed in doing this, what would you add/remove in the text
>     to fix this?
>
>>mz: I am providing the following suggestions for “Value Vocabularies”
> Confusions part.

I've used your suggestions to draft a new text, with slightly different 
emphasis, hope it is OK now!

Thanks!

Mark

> Before: a value vocabulary often also defines metadata elements. For
> example, GeoNames defines elements for coordinates, names and postal
> codes of places. These can be referred to as the GeoNames metadata
> elements. Similarly, VIAF defines elements to describe authorities
> (corporations, people).
>
> After: A value vocabulary often employs a schema that is derived from a
> model underlying its data structure. Some of the models are universal
> and have been defined in international and national standards, e.g., for
> thesauri [ref], while others are implementation-specific or yet to
> become widely-adopted. For example, GeoNames defines elements for
> coordinates, names and postal codes of places. Similarly, VIAF defines
> elements to describe corporate bodies and people.
>
> Confusion #2
> Before: We classify VIAF and GeoNames as value vocabularies instead of
> datasets because they are used (or are meant to be used) extensively as
> value vocabularies in record collections, while their metadata elements
> are not widely reused (as are DC elements). We acknowledge that this
> distinction is dependent on the role that the dataset/vocabulary plays
> instead of its inherent characteristics. Our viewpoint is indeed
> debatable, but sufficient for the purposes of our report.
>
> After: We classify VIAF and GeoNames as value vocabularies instead of
> datasets because they are used (or are meant to be used) extensively as
> value vocabularies in building other record collections/datasets. This
> distinction is dependent on the role that the dataset/vocabulary plays
> instead of its inherent characteristics.
>
> [ref]
> ISO 2788: Documentation -- Guidelines for the establishment and
> development of monolingual thesauri. 1974, 1986.
> ISO 25964 Part 1 Thesauri and Interoperability with Other Vocabularies.
> Clause 15. Data model. 2010.
> BS 8723: Structured Vocabularies for Information Retrieval. Part 5.
> Exchange formats and protocols for interoperability. 2008.
>
> [1] http://schemas.bs8723.org/Model.aspx
>
>
>     If I still didn't get your point, I apologize!
>     Mark.
>
>     Hope this helps. Thanks.
>     Marcia
>
>     On 7-1-2011 16:16, ZENG, MARCIA wrote:
>     >  Mark,
>     >  Re: your question
>     > > >Re Marcia's point [["For example, in digital gazetteers not only the
>     >
>     >  place names are controlled but also the place features, type,
>     >  coordinates, and even maps are included."]]
>     >
>     > > >I'm not sure I get what you mean with the "also controlled",
>     >
>     >  I am giving the following text to explain further [ref]:
>     >
>     >  1. The concept of a geographic place is fuzzy (e.g., Rocky
>     >  Mountains), and we use place names differently according to the
>     >  circumstances (e.g., using “Santa Barbara” generally to mean the
>     >  whole general area or specifically to mean just the incorporated
>     >  city area).
>     >  2. When locations are named, they can be in a gazetteer. A place
>     >  can have more than one name: name variants, names in different
>     >  languages, etc.
>     >  3. In a geospatially referenced gazetteer, each entry has a
>     >  “footprint” consisting of latitude and longitude coordinates. This
>     >  footprint can be a point (most current gazetteer footprints are
>     >  points)...
>     >  4. Each entry in a digital gazetteer must also be categorized
>     >  according to a formal typing system (a controlled vocabulary of
>     >  type terminology).
>     >
>     >  #2 is what most thesauri would do, to control the synonyms and
>     >  equivalents.
>     >  #3 is especially the approach used in a thesaurus to eliminate
>     >  ambiguities. But here they are not like a GPS, which focuses on
>     >  coordinates and uses bounding boxes to provide a precise location.
>     >  These points in a gazetteer serve more as a qualifier to provide
>     >  context for a place.
>     >  #4 is to provide a TYPE for each named place. This is similar to the
>     >  Medical Subject Headings, where each concept is given a TYPE code
>     >  according to a formal typing system (see example [1]). In the Getty
>     >  Thesaurus of Geographic Names, place types are also an important
>     >  component in each entry. Those TYPE values usually come from a
>     >  controlled vocabulary [2]. So they could use other building blocks.
>     >  However, the general function and purpose of the digital gazetteer
>     >  is to serve as a “spatial dictionary of named and typed places”.
>     >
>     >  Quite a lot of projects have used ADL gazetteers as value
>     >  vocabularies, but the gazetteer is also used as a reference itself,
>     >  e.g., [3].
>     >  Marcia
>     >
>     >  [1]
>     >
>     http://www.nlm.nih.gov/cgi/mesh/2011/MB_cgi?mode=&index=8264&view=expanded
>     >
>     >  [2]
>     http://www.alexandria.ucsb.edu/~lhill/FeatureTypes/ver070302/index.htm
>     >  [3] http://clients.alexandria.ucsb.edu/globetrotter/ (try to find a
>     >  place then see the catalog record.)
>     >  [Ref] JCDL 2002 NKOS Workshop on Digital Gazetteers.
>     >  http://nkos.slis.kent.edu/DL02workshop.htm
>     >
>     >
>     >  On 1/7/11 5:05 AM, "Mark van Assem" <mark@cs.vu.nl> wrote:
>     >
>     >  Thanks all for the feedback!
>     >
>     >  I've tried to address all your points in the value vocab description:
>     >
>     >  - "A dataset is a collection of structured metadata records"
>     >
>     >  - added some more "similar terms", including KOS, gazetteer, authority
>     >  file, concept scheme
>     >
>     >  - "They are "building blocks" with which metadata records can be
>     >  built."
>     >
>     >  Re Marcia's point [["For example, in digital gazetteers not only the
>     >  place names are controlled but also the place features, type,
>     >  coordinates, and even maps are included."]]
>     >
>     >  I'm not sure I get what you mean with the "also controlled", but I
>     >  think indeed that this is the same as the VIAF situation: the values
>     >  in a value vocabulary can be described with elements and values
>     >  themselves, which would make them "datasets" also. However, we can
>     >  still see VIAF as a value vocab and not a dataset, as its main role
>     >  is to be a building block for metadata records.
>     >
>     >  Mark
>     >
>     >
>     >  On 6-1-2011 18:15, ZENG, MARCIA wrote:
>     >  > I like the way Karen used in terms of building block or not... Also
>     >  > agree with Jeff’s use of SKOS ‘concept scheme’ to define VIAF.
>     >  >
>     >  > * Regarding ‘data sets’: To me, the ‘data sets’ we are talking about
>     >  > are structured data. Outside in other places ‘data sets’ could be
>     >  > un-structured or semi-structured data (e.g., data.gov’s raw data
>     >  > sets).
>     >  > * Regarding ‘value vocabularies’: In the conventional way we have
>     >  > used “knowledge organization systems (KOS)” for concept schemes
>     >  > (broader than “controlled vocabularies”). Most of the vocabulary
>     >  > types are clear such as pick lists, taxonomies, thesauri, subject
>     >  > headings. But there is a group of ‘metadata-like’ KOS such as
>     >  > authority files and digital gazetteers. They are/can be
>     >  > constructed as thesauri (e.g., The Getty Thesaurus of Geographic
>     >  > Names (TGN) and Union List of Artist Names (ULAN)). Or, they can
>     >  > be in other structures. It is the contents they include that made
>     >  > them also be referred to as ‘data sets’. For example, in digital
>     >  > gazetteers not only the place names are controlled but also the
>     >  > place features, type, coordinates, and even maps are included.
>     >  > Digital gazetteers can be used alone as data sets or be the value
>     >  > vocabularies used in structured data sets. This might be like the
>     >  > VIAF situation, depending on how it is constructed or on how it is
>     >  > used.
>     >  >
>     >  > My 2 cents.
>     >  > Marcia
>     >  >
>     >  > On 1/6/11 11:37 AM, "Karen Coyle" <kcoyle@kcoyle.net> wrote:
>     >  >
>     >  > Quoting Emmanuelle Bermes <emmanuelle.bermes@bnf.fr>:
>     >  >
>     >  >
>     >  > > As for myself, I do have a few more comments :
>     >  > > - I think the emphasis on value vocabs is too important in the
>     >  current
>     >  > > definition of dataset. It's actually creating confusion, in my
>     view.
>     >  > > - I'm wondering if we could use the term "instance" (a dataset
>     is a
>     >  > > collection of instance descriptions) or is it too implementation
>     >  > oriented ?
>     >  > >
>     >  >
>     >  >
>     >  > I'm not sure that the term "instance" will work -- even a value in a
>     >  > list could be considered an instance, no?
>     >  >
>     >  > Somehow, the concept for a dataset is that it consists of the
>     >  > descriptions of entities that you need for an application or
>     function,
>     >  > rather than the building blocks for creating such a description.
>     >  > (Which gets back to Mark's statement about "A record for Derrida's
>     >  > book in dataset X ...")
>     >  >
>     >  > Essentially, one person's dataset could be another person's building
>     >  > block. But I think the key is that a dataset is complete for an
>     >  > application, while a value vocabulary needs to be combined with
>     other
>     >  > data to be useful.
>     >  >
>     >  > No, I'm not satisfied with that explanation... I'll ruminate on this
>     >  > and see if I can find better words.
>     >  >
>     >  > kc
>     >  >
>     >  > > Emmanuelle
>     >  > >
>     >  > > On Thu, Jan 6, 2011 at 5:13 PM, Mark van Assem <mark@cs.vu.nl>
>     >  wrote:
>     >  > >
>     >  > > > Hi Emma,
>     >  > > >
>     >  > > > I saw you had already followed up on our action to clarify
>     "value
>     >  > > > vocabularies".
>     >  > > >
>     >  > > > I saw that you think we should clarify how value vocabularies
>     >  > actually
>     >  > > > appear in metadata records (as literals, codes, identifiers).
>     >  > While I kinda
>     >  > > > feel we should try to stay agnostic to that I kept it in, but
>     >  > rewrote it
>     >  > > > slightly:
>     >  > > >
>     >  > > > "In actual metadata records, the values used can be literals,
>     >  > codes, or
>     >  > > > identifiers (including URIs), as long as these refer to a
>     >  > specific concept
>     >  > > > in a value vocabulary. "
>     >  > > >
>     >  > > > I also moved your point re "closed list" up to the initial
>     >  > definition; this
>     >  > > > is indeed central to what a value vocab is.
>     >  > > >
>     >  > > > Mark.
>     >  > > >
>     >  > > >
>     >  > > > On 06/01/2011 16:34, Mark van Assem wrote:
>     >  > > >
>     >  > > >> Hi Jodi,
>     >  > > >>
>     >  > > >> X and Y would be two collections ("datasets") from two
>     different
>     >  > > >> libraries. It could also be two subcollections or within one
>     >  > collection,
>     >  > > >> but I think making them separate ones will make it more
>     >  > illustrative.
>     >  > > >>
>     >  > > >> Do you have a suggestion on how to clarify or replace X and Y
>     >  with
>     >  > > >> specific existing collections/libraries as examples?
>     >  > > >>
>     >  > > >> Mark
>     >  > > >>
>     >  > > >>
>     >  > > >> On 06/01/2011 16:21, Jodi Schneider wrote:
>     >  > > >>
>     >  > > >>> Thanks for this, Mark! I especially like the 'confusions' area
>     >  > -- that
>     >  > > >>> will make this quite useful.
>     >  > > >>>
>     >  > > >>> In this, it would be helpful if you'd explain what datasets
>     >  X and Y
>     >  > > >>> might be. Particular collections? Subcollections of a larger
>     >  whole?
>     >  > > >>> "in some cases records in a dataset are themselves used as
>     >  > values in
>     >  > > >>> other datasets. For example, Derrida wrote a book that
>     >  comments on
>     >  > > >>> Heidegger's book "Sein und Zeit". A record for Derrida's book
>     >  > in dataset
>     >  > > >>> X can state this by relating it to a record for Heidegger's
>     >  book in
>     >  > > >>> dataset Y. This statement in the Derrida record could consist
>     >  > of the
>     >  > > >>> Dublin Core Subject with as value a reference to the Heidegger
>     >  > record.
>     >  > > >>> In this case we would still term X and Y datasets, not value
>     >  > > >>> vocabularies."
>     >  > > >>>
>     >  > > >>> -Jodi
>     >  > > >>>
>     >  > > >>> On 6 Jan 2011, at 08:00, Mark van Assem wrote:
>     >  > > >>>
>     >  > > >>>
>     >  > > >>>> Hi all,
>     >  > > >>>>
>     >  > > >>>> As per my action I have written some text [1] to explain
>     >  the terms
>     >  > > >>>> "dataset, metadata element set, value vocabulary" with
>     >  > feedback from
>     >  > > >>>> Karen and Antoine to address the things that don't fit very
>     >  > nicely.
>     >  > > >>>>
>     >  > > >>>> Please let me know what you think, after I've had your input
>     >  > we'll put
>     >  > > >>>> it on the public list to get shot at.
>     >  > > >>>>
>     >  > > >>>> Mark.
>     >  > > >>>>
>     >  > > >>>> [1]
>     >  > > >>>>
>     >  >
>     >
>     http://www.w3.org/2001/sw/wiki/Library_terminology_informally_explained#Vocabularies.2C_Element_sets.2C_Datasets
>     >  > > >>>>
>     >  > > >>>>
>     >  > > >>>> On 28/12/2010 18:40, Karen Coyle wrote:
>     >  > > >>>>
>     >  > > >>>>> I have been organizing the vocabularies and technologies
>     >  on the
>     >  > > >>>>> archives
>     >  > > >>>>> cluster page [1] and it was a very interesting exercise
>     >  trying to
>     >  > > >>>>> determine what category some of the "things" fit into. This
>     >  > could turn
>     >  > > >>>>> out to be a starting place for our upcoming discussion
>     of our
>     >  > > >>>>> definitions since it has real examples. The hard part seems
>     >  > to be value
>     >  > > >>>>> vocabularies v. datasets, and I have a feeling that there
>     >  > will not be a
>     >  > > >>>>> clear line between them.
>     >  > > >>>>>
>     >  > > >>>>> kc
>     >  > > >>>>> [1]
>     >  > > >>>>>
>     >  > > >>>>>
>     >  >
>     >
>     http://www.w3.org/2005/Incubator/lld/wiki/Cluster_Archives#Vocabularies_and_Technologies
>     >  > > >>>>>
>     >  > > >>>>>
>     >  > > >>>>>
>     >  > > >>>>>
>     >  > > >>>>
>     >  > > >>>
>     >  > > >>
>     >  > >
>     >  > >
>     >  > > --
>     >  > > =====
>     >  > > Emmanuelle Bermès - http://www.bnf.fr
>     >  > > Manue - http://www.figoblog.org
>     >  > >
>     >  >
>     >  >
>     >  >
>     >  > --
>     >  > Karen Coyle
>     >  > kcoyle@kcoyle.net http://kcoyle.net
>     >  > ph: 1-510-540-7596
>     >  > m: 1-510-435-8234
>     >  > skype: kcoylenet
>     >  >
>     >  >
>     >  >
>     >
>
Received on Monday, 17 January 2011 14:38:56 UTC