- From: Mark van Assem <mark@cs.vu.nl>
- Date: Mon, 17 Jan 2011 15:38:21 +0100
- To: "ZENG, MARCIA" <mzeng@kent.edu>
- CC: "public-xg-lld@w3.org" <public-xg-lld@w3.org>
Hi Marcia,
> >mz: This is 70% correct; it just needs ‘metadata schema’ removed.
> Every vocabulary (such as a thesaurus) has a schema that defines its
> attributes. Some are universal (the thesaurus world followed ISO 2788
> for a long time, and now follows ISO 25964 and BS 8723 [ref][1]), while
> others are locally defined.
I don't understand the argument. There definitely are cases that define
their own metadata schema and do not reuse existing ones, so that would
be a good reason to leave it in, even though others reuse existing ones.
So I'd be in favor of leaving it in, maybe with less emphasis?
> >mz: [continued from above]: all value vocabularies have their own set
> of attributes. TGN and other Getty vocabularies have their
> standardized attributes and a number of controlled lists (e.g.,
<snip>
Hm... does this mean that you agree with my standpoint that MeSH and
Getty are value vocabs, or do you disagree?
> I've tried to cover this problem through the "Confusions" points. If
> they do not succeed in doing this, what would you add/remove in the text
> to fix this?
>
> >>mz: I am providing the following suggestions for the “Confusions”
> part of “Value Vocabularies”.
I've used your suggestions to draft a new text, with slightly different
emphasis; I hope it is OK now!
Thanks!
Mark
> Before: a value vocabulary often also defines metadata elements. For
> example, GeoNames defines elements for coordinates, names and postal
> codes of places. These can be referred to as the GeoNames metadata
> elements. Similarly, VIAF defines elements to describe authorities
> (corporations, people).
>
> After: A value vocabulary often employs a schema that is derived from a
> model underlying its data structure. Some of the models are universal
> and have been defined in international and national standards, e.g., for
> thesauri [ref], while others are implementation-specific or yet to
> become widely adopted. For example, GeoNames defines elements for
> coordinates, names and postal codes of places. Similarly, VIAF defines
> elements to describe corporate bodies and people.
>
> Confusion #2
> Before: We classify VIAF and GeoNames as value vocabularies instead of
> datasets because they are used (or are meant to be used) extensively as
> value vocabularies in record collections, while their metadata elements
> are not widely reused (as are DC elements). We acknowledge that this
> distinction is dependent on the role that the dataset/vocabulary plays
> instead of its inherent characteristics. Our viewpoint is indeed
> debatable, but sufficient for the purposes of our report.
>
> After: We classify VIAF and GeoNames as value vocabularies instead of
> datasets because they are used (or are meant to be used) extensively as
> value vocabularies in building other datasets (record collections). This
> distinction is dependent on the role that the dataset/vocabulary plays
> instead of its inherent characteristics.
>
> [ref]
> ISO 2788: Documentation -- Guidelines for the establishment and
> development of monolingual thesauri. 1974, 1986.
> ISO 25964: Thesauri and Interoperability with Other Vocabularies.
> Part 1, Clause 15: Data model. 2010.
> BS 8723: Structured Vocabularies for Information Retrieval.
> Part 5: Exchange formats and protocols for interoperability. 2008.
>
> [1] http://schemas.bs8723.org/Model.aspx
>
>
> If I still didn't get your point, I apologize!
> Mark.
>
> Hope this helps. Thanks.
> Marcia
>
> On 7-1-2011 16:16, ZENG, MARCIA wrote:
> > Mark,
> > Re: your question
> > > >Re Marcia's point [["For example, in digital gazetteers not only the
> > > >place names are controlled but also the place features, type,
> > > >coordinates, and even maps are included."]]
> >
> > > >I'm not sure I get what you mean with the "also controlled",
> >
> > I am giving the following text to explain further [ref]:
> >
> > 1. The concept of a geographic place is fuzzy (e.g., Rocky Mountains),
> >    and we use place names differently according to the circumstances
> >    (e.g., using “Santa Barbara” generally to mean the whole area, or
> >    specifically to mean just the incorporated city area).
> > 2. When locations are named, they can be in a gazetteer. A place can
> >    have more than one name: name variants, names in different
> >    languages, etc.
> > 3. In a geospatially referenced gazetteer, each entry has a “footprint”
> >    consisting of latitude and longitude coordinates. This footprint can
> >    be a point (most current gazetteer footprints are points)...
> > 4. Each entry in a digital gazetteer must also be categorized according
> >    to a formal typing system (a controlled vocabulary of type
> >    terminology).
> >
> > #2 is what most thesauri would do, to control synonyms and equivalents.
> > #3 is especially close to the approach used in a thesaurus to eliminate
> > ambiguities. But here the footprints are not like a GPS, which focuses
> > on coordinates and uses bounding boxes to provide a precise location.
> > These points in a gazetteer act more as a qualifier that provides the
> > context of a place.
> > #4 is to provide a TYPE for each named place. This is similar to the
> > Medical Subject Headings, where each concept is given a TYPE code
> > according to a formal typing system (see example [1]). In the Getty
> > Thesaurus of Geographic Names, place types are also an important
> > component of each entry. Those TYPE values usually come from a
> > controlled vocabulary [2], so they could use other building blocks.
> > However, the general function and purpose of a digital gazetteer is to
> > serve as a “spatial dictionary of named and typed places”.
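To make points #2 to #4 above concrete, here is a minimal Python sketch of a single gazetteer entry, assuming a simple in-memory model. The class `GazetteerEntry`, the `FEATURE_TYPES` set and its terms are hypothetical stand-ins (they are not the ADL Feature Type Thesaurus or TGN data), and the Santa Barbara coordinates are rounded and purely illustrative.

```python
from dataclasses import dataclass, field

# Hypothetical controlled list of feature types (NOT the real ADL Feature
# Type Thesaurus); it plays the role of the "formal typing system" in #4.
FEATURE_TYPES = {"populated places", "mountain ranges", "rivers"}


@dataclass
class GazetteerEntry:
    """One named, typed place in a (hypothetical) digital gazetteer."""

    preferred_name: str
    feature_type: str               # #4: must come from FEATURE_TYPES
    footprint: tuple[float, float]  # #3: point footprint (latitude, longitude)
    variant_names: list[str] = field(default_factory=list)  # #2: other names

    def __post_init__(self) -> None:
        # Reject types that are not in the controlled type vocabulary.
        if self.feature_type not in FEATURE_TYPES:
            raise ValueError(f"uncontrolled feature type: {self.feature_type!r}")
        lat, lon = self.footprint
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            raise ValueError(f"footprint out of range: {self.footprint}")

    def all_names(self) -> list[str]:
        # Every name under which the place can be found (#2).
        return [self.preferred_name, *self.variant_names]

    def label(self) -> str:
        # Footprint and type act as qualifiers that give the name context.
        lat, lon = self.footprint
        return f"{self.preferred_name} ({self.feature_type}; {lat:.2f}, {lon:.2f})"


santa_barbara = GazetteerEntry(
    preferred_name="Santa Barbara",
    feature_type="populated places",
    footprint=(34.42, -119.70),  # rounded, illustrative point footprint
    variant_names=["Santa Bárbara"],
)
print(santa_barbara.label())  # Santa Barbara (populated places; 34.42, -119.70)
```

Validating `feature_type` when the entry is created mirrors point #4: an entry whose type is not drawn from the controlled list is rejected rather than stored.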
> >
> > Quite a lot of projects have used ADL gazetteers as value vocabularies,
> > but the gazetteer is also used as a reference in itself, e.g., [3].
> > Marcia
> >
> > [1] http://www.nlm.nih.gov/cgi/mesh/2011/MB_cgi?mode=&index=8264&view=expanded
> >
> > [2] http://www.alexandria.ucsb.edu/~lhill/FeatureTypes/ver070302/index.htm
> > [3] http://clients.alexandria.ucsb.edu/globetrotter/ (try to find a
> > place then see the catalog record.)
> > [Ref] JCDL 2002 NKOS Workshop on Digital Gazetteers.
> > http://nkos.slis.kent.edu/DL02workshop.htm
> >
> >
> > On 1/7/11 5:05 AM, "Mark van Assem" <mark@cs.vu.nl> wrote:
> >
> > Thanks all for the feedback!
> >
> > I've tried to address all your points in the value vocab description:
> >
> > - "A dataset is a collection of structured metadata records"
> >
> > - added some more "similar terms", including KOS, gazetteer, authority
> > file, concept scheme
> >
> > - "They are "building blocks" with which metadata records can be built."
> >
> > Re Marcia's point [["For example, in digital gazetteers not only the
> > place names are controlled but also the place features, type,
> > coordinates, and even maps are included."]]
> >
> > I'm not sure I get what you mean by "also controlled", but I think
> > this is indeed the same as the VIAF situation: the values in a value
> > vocabulary can themselves be described with elements and values,
> > which would make them "datasets" as well. However, we can still see
> > VIAF as a value vocab and not a dataset, as its main role is to be a
> > building block for metadata records.
> >
> > Mark
> >
> >
> > On 6-1-2011 18:15, ZENG, MARCIA wrote:
> > > I like the way Karen used in terms of building block or not... Also
> > > agree with Jeff’s use of SKOS ‘concept scheme’ to define VIAF.
> > >
> > > * Regarding ‘data sets’: To me, the ‘data sets’ we are talking about
> > > are structured data. Elsewhere, ‘data sets’ could be unstructured or
> > > semi-structured data (e.g., data.gov’s raw data sets).
> > > * Regarding ‘value vocabularies’: In the conventional way we have
> > > used “knowledge organization systems (KOS)” for concept schemes
> > > (broader than “controlled vocabularies”). Most of the vocabulary
> > > types are clear such as pick lists, taxonomies, thesauri, subject
> > > headings. But there is a group of ‘metadata-like’ KOS such as
> > > authority files and digital gazetteers. They are/can be
> > > constructed as thesauri (e.g., The Getty Thesaurus of Geographic
> > > Names (TGN) and Union List of Artist Names (ULAN)). Or, they can
> > > be in other structures. It is the content they include that has led
> > > them to also be referred to as ‘data sets’. For example, in digital
> > > gazetteers not only the place names are controlled but also the
> > > place features, type, coordinates, and even maps are included.
> > > Digital gazetteers can be used alone as data sets or be the value
> > > vocabularies used in structured data sets. This might be like the
> > > VIAF situation, depending on how it is constructed or on how it is
> > > used.
> > >
> > > My 2 cents.
> > > Marcia
> > >
> > > On 1/6/11 11:37 AM, "Karen Coyle" <kcoyle@kcoyle.net> wrote:
> > >
> > > Quoting Emmanuelle Bermes <emmanuelle.bermes@bnf.fr>:
> > >
> > >
> > > > As for myself, I do have a few more comments:
> > > > - I think the emphasis on value vocabs is too strong in the current
> > > > definition of dataset. It's actually creating confusion, in my view.
> > > > - I'm wondering if we could use the term "instance" (a dataset is a
> > > > collection of instance descriptions), or is it too
> > > > implementation-oriented?
> > > >
> > >
> > >
> > > I'm not sure that the term "instance" will work -- even a value in a
> > > list could be considered an instance, no?
> > >
> > > Somehow, the concept for a dataset is that it consists of the
> > > descriptions of entities that you need for an application or
> > > function, rather than the building blocks for creating such a
> > > description. (Which gets back to Mark's statement about "A record for
> > > Derrida's book in dataset X ...")
> > >
> > > Essentially, one person's dataset could be another person's building
> > > block. But I think the key is that a dataset is complete for an
> > > application, while a value vocabulary needs to be combined with other
> > > data to be useful.
> > >
> > > No, I'm not satisfied with that explanation... I'll ruminate on this
> > > and see if I can find better words.
> > >
> > > kc
> > >
> > > > Emmanuelle
> > > >
> > > > On Thu, Jan 6, 2011 at 5:13 PM, Mark van Assem <mark@cs.vu.nl> wrote:
> > > >
> > > > > Hi Emma,
> > > > >
> > > > > I saw you had already followed up on our action to clarify "value
> > > > > vocabularies".
> > > > >
> > > > > I saw that you think we should clarify how value vocabularies
> > > > > actually appear in metadata records (as literals, codes,
> > > > > identifiers). While I kinda feel we should try to stay agnostic
> > > > > about that, I kept it in, but rewrote it slightly:
> > > > >
> > > > > "In actual metadata records, the values used can be literals, codes,
> > > > > or identifiers (including URIs), as long as these refer to a
> > > > > specific concept in a value vocabulary."
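The quoted definition deliberately leaves open whether a record stores a literal, a code or a URI. A minimal Python sketch can show that all three forms work, provided each resolves to exactly one concept. It assumes a tiny hypothetical value vocabulary; the `example.org` URIs, the codes, the labels and the `resolve` helper are all invented for illustration.

```python
from typing import Optional

# Hypothetical value vocabulary (a closed list): concept URI -> code and labels.
VOCAB = {
    "http://example.org/vocab/philosophy": {
        "code": "PHI",
        "labels": {"Philosophy", "Philosophie"},
    },
    "http://example.org/vocab/ontology": {
        "code": "ONT",
        "labels": {"Ontology", "Ontologie"},
    },
}


def resolve(value: str) -> Optional[str]:
    """Return the URI of the concept a record value refers to, or None.

    The value may be an identifier (URI), a code, or a literal label.
    """
    if value in VOCAB:  # identifier (URI)
        return value
    for uri, concept in VOCAB.items():
        # code or literal label
        if value == concept["code"] or value in concept["labels"]:
            return uri
    return None  # not a value from this closed list


for value in ["http://example.org/vocab/ontology", "PHI", "Philosophie", "Metaphysics"]:
    print(value, "->", resolve(value))
```

Returning `None` for "Metaphysics" reflects the "closed list" character of a value vocabulary: a record value that refers to no concept in the list is not a value from that vocabulary at all.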
> > > > >
> > > > > I also moved your point re "closed list" up to the initial
> > > > > definition; this is indeed central to what a value vocab is.
> > > > >
> > > > > Mark.
> > > > >
> > > > >
> > > > > On 06/01/2011 16:34, Mark van Assem wrote:
> > > > >
> > > > >> Hi Jodi,
> > > > >>
> > > > >> X and Y would be two collections ("datasets") from two different
> > > > >> libraries. They could also be two subcollections within one
> > > > >> collection, but I think making them separate ones will make it
> > > > >> more illustrative.
> > > > >>
> > > > >> Do you have a suggestion on how to clarify X and Y, or to replace
> > > > >> them with specific existing collections/libraries as examples?
> > > > >>
> > > > >> Mark
> > > > >>
> > > > >>
> > > > >> On 06/01/2011 16:21, Jodi Schneider wrote:
> > > > >>
> > > > >>> Thanks for this, Mark! I especially like the 'confusions' area --
> > > > >>> that will make this quite useful.
> > > > >>>
> > > > >>> In this, it would be helpful if you'd explain what datasets X and
> > > > >>> Y might be. Particular collections? Subcollections of a larger
> > > > >>> whole?
> > > > >>> "in some cases records in a dataset are themselves used as values
> > > > >>> in other datasets. For example, Derrida wrote a book that comments
> > > > >>> on Heidegger's book "Sein und Zeit". A record for Derrida's book in
> > > > >>> dataset X can state this by relating it to a record for
> > > > >>> Heidegger's book in dataset Y. This statement in the Derrida record
> > > > >>> could consist of the Dublin Core Subject element with a reference
> > > > >>> to the Heidegger record as its value. In this case we would still
> > > > >>> term X and Y datasets, not value vocabularies."
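The Derrida/Heidegger example can be sketched as RDF-style triples in plain Python, with no RDF library assumed. The record URIs under `example.org`, the dataset variables and the `cross_references` helper are hypothetical; only the Dublin Core terms URIs for `subject` and `title` are real. The sketch shows that the subject value in X points at a record in Y, while nothing in the data itself turns Y into a value vocabulary.

```python
# Real Dublin Core terms property URIs.
DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"
DCTERMS_TITLE = "http://purl.org/dc/terms/title"

# Hypothetical record URIs in two library datasets.
HEIDEGGER = "http://example.org/datasetY/record/sein-und-zeit"
DERRIDA = "http://example.org/datasetX/record/derrida-book"

# Dataset Y: a description of Heidegger's book.
dataset_y = {
    (HEIDEGGER, DCTERMS_TITLE, "Sein und Zeit"),
}

# Dataset X: the subject value is a reference to a record in dataset Y,
# not a concept from a value vocabulary.
dataset_x = {
    (DERRIDA, DCTERMS_SUBJECT, HEIDEGGER),
}


def records_in(dataset):
    """The set of records (triple subjects) described in a dataset."""
    return {s for s, _, _ in dataset}


def cross_references(source, target):
    """Statements in `source` whose value is a record described in `target`."""
    targets = records_in(target)
    return sorted((s, p, o) for s, p, o in source if o in targets)


for s, p, o in cross_references(dataset_x, dataset_y):
    print(s, p, o)
```

The helper only detects the cross-reference; it does not relabel Y, since (as the thread notes) whether something counts as a value vocabulary depends on the role it plays rather than on its inherent characteristics.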
> > > > >>>
> > > > >>> -Jodi
> > > > >>>
> > > > >>> On 6 Jan 2011, at 08:00, Mark van Assem wrote:
> > > > >>>
> > > > >>>
> > > > >>>> Hi all,
> > > > >>>>
> > > > >>>> As per my action I have written some text [1] to explain the terms
> > > > >>>> "dataset, metadata element set, value vocabulary", with feedback
> > > > >>>> from Karen and Antoine, to address the things that don't fit very
> > > > >>>> nicely.
> > > > >>>>
> > > > >>>> Please let me know what you think; after I've had your input we'll
> > > > >>>> put it on the public list to get shot at.
> > > > >>>>
> > > > >>>> Mark.
> > > > >>>>
> > > > >>>> [1] http://www.w3.org/2001/sw/wiki/Library_terminology_informally_explained#Vocabularies.2C_Element_sets.2C_Datasets
> > > > >>>>
> > > > >>>>
> > > > >>>> On 28/12/2010 18:40, Karen Coyle wrote:
> > > > >>>>
> > > > >>>>> I have been organizing the vocabularies and technologies on the
> > > > >>>>> archives cluster page [1], and it was a very interesting exercise
> > > > >>>>> trying to determine what category some of the "things" fit into.
> > > > >>>>> This could turn out to be a starting place for our upcoming
> > > > >>>>> discussion of our definitions, since it has real examples. The
> > > > >>>>> hard part seems to be value vocabularies v. datasets, and I have a
> > > > >>>>> feeling that there will not be a clear line between them.
> > > > >>>>>
> > > > >>>>> kc
> > > > >>>>> [1] http://www.w3.org/2005/Incubator/lld/wiki/Cluster_Archives#Vocabularies_and_Technologies
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>
> > > > >>>
> > > > >>
> > > >
> > > >
> > > > --
> > > > =====
> > > > Emmanuelle Bermès - http://www.bnf.fr
> > > > Manue - http://www.figoblog.org
> > > >
> > >
> > >
> > >
> > > --
> > > Karen Coyle
> > > kcoyle@kcoyle.net http://kcoyle.net
> > > ph: 1-510-540-7596
> > > m: 1-510-435-8234
> > > skype: kcoylenet
> > >
> > >
> > >
> >
>
Received on Monday, 17 January 2011 14:38:56 UTC