- From: Mark van Assem <mark@cs.vu.nl>
- Date: Mon, 17 Jan 2011 15:38:21 +0100
- To: "ZENG, MARCIA" <mzeng@kent.edu>
- CC: "public-xg-lld@w3.org" <public-xg-lld@w3.org>
Hi Marcia,

> >mz: This is 70% correct, just needs to take off ‘metadata schema’.
> Every vocabulary (such as a thesaurus) has a schema that defines the
> attributes, some are universal (the thesauri world all followed
> ISO2788 for a long time, now ISO25964 and BS 8723 [ref][1]) and some
> are locally defined.

I don't understand the argument. There definitely are cases that define
their own metadata schema and do not reuse existing ones, so that would
be a good reason to leave it in, even though others reuse existing ones.
So I'd be in favor of leaving it in, maybe with less emphasis?

> >mz: [continue from above]: all value vocabularies have their own set
> of attributes. TGN and other Getty vocabularies have their
> standardized attributes and a number of controlled lists (e.g.,
<snip>

Hm... does this mean that you agree with my standpoint that MeSH and
Getty are value vocabs, or do you disagree?

> I've tried to cover this problem through the "Confusions" points. If
> they do not succeed in doing this, what would you add/remove in the text
> to fix this?
>
> >>mz: I am providing the following suggestions for “Value Vocabularies”
> Confusions part.

I've used your suggestions to draft a new text, with slightly different
emphasis, hope it is OK now!

Thanks!
Mark

> Before: a value vocabulary often also defines metadata elements. For
> example, GeoNames defines elements for coordinates, names and postal
> codes of places. These can be referred to as the GeoNames metadata
> elements. Similarly, VIAF defines elements to describe authorities
> (corporations, people).
>
> After: A value vocabulary often employs a schema that is derived from a
> model underlying its data structure. Some of the models are universal
> and have been defined in international and national standards, e.g., for
> thesauri [ref], while others are implementation-specific or yet to
> become widely-adopted. For example, GeoNames defines elements for
> coordinates, names and postal codes of places.
> Similarly, VIAF defines elements to describe corporate bodies and
> people.
>
> Confusion #2
> Before: We classify VIAF and GeoNames as value vocabularies instead of
> datasets because they are used (or are meant to be used) extensively as
> value vocabularies in record collections, while their metadata elements
> are not widely reused (as are DC elements). We acknowledge that this
> distinction is dependent on the role that the dataset/vocabulary plays
> instead of its inherent characteristics. Our viewpoint is indeed
> debatable, but sufficient for the purposes of our report.
>
> After: We classify VIAF and GeoNames as value vocabularies instead of
> datasets because they are used (or are meant to be used) extensively as
> value vocabularies in building other record collections datasets. This
> distinction is dependent on the role that the dataset/vocabulary plays
> instead of its inherent characteristics.
>
> [ref]
> ISO 2788 Documentation -- Guidelines for the establishment and
> development of monolingual thesauri. 1974, 1986.
> ISO 25964 Part 1 Thesauri and Interoperability with Other Vocabularies.
> Clause 15. Data model. 2010.
> BS 8723: Structured Vocabularies for Information Retrieval. Part 5.
> Exchange formats and protocols for interoperability. 2008
>
> [1] http://schemas.bs8723.org/Model.aspx
>
>
> If I still didn't get your point I apologize!
> Mark.
>
> Hope this helps. Thanks.
> Marcia
>
> On 7-1-2011 16:16, ZENG, MARCIA wrote:
> > Mark,
> > Re: your question
> >
> > >Re Marcia's point [["For example, in digital gazetteers not only the
> > >place names are controlled but also the place features, type,
> > >coordinates, and even maps are included."]]
> > >
> > >I'm not sure I get what you mean with the "also controlled",
> >
> > I am giving the following text to explain further [ref]:
> >
> > 1. Concept of a geographic place is fuzzy (e.g., Rocky Mountains) and
> > we use place names differently according to the circumstances (e.g.,
> > using “Santa Barbara” generally to mean the whole general area or
> > specifically to mean just the incorporated city area.)
> > 2. When locations are named, they can be in a gazetteer. A place can
> > have more than one name: name variants, name in different languages,
> > etc.
> > 3. In a geospatially referenced gazetteer, each entry has a “footprint”
> > consisting of latitude and longitude coordinates. This footprint can
> > be a point (most current gazetteer footprints are points)...
> > 4. Each entry in a digital gazetteer must also be categorized according
> > to a formal typing system (a controlled vocabulary of type
> > terminology).
> >
> > #2 is what most thesauri would do, to control the synonyms and
> > equivalents.
> > #3 is especially the approach used in a thesaurus to eliminate
> > ambiguities. But here they are not like a GPS which focuses on
> > coordinates and uses bounding boxes to provide a precise location.
> > These points in a gazetteer are more as a qualifier to provide context
> > of a place.
> > #4 is to provide a TYPE for each named place. This is similar to the
> > Medical Subject Headings where each concept is given a TYPE code
> > according to a formal typing system (see example [1]). In the Getty
> > Thesaurus of Geographic Names place types are also an important
> > component in each entry.
> > Those TYPE values are usually from a controlled vocabulary.[2] So
> > they could use other building blocks.
> > However, the general function and purpose of the digital gazetteer is
> > as a “spatial dictionary of named and typed places”.
> >
> > Quite a lot of projects have used ADL gazetteers as value
> > vocabularies, but the gazetteers are also used as references
> > themselves, e.g., [3].
> > Marcia
> >
> > [1] http://www.nlm.nih.gov/cgi/mesh/2011/MB_cgi?mode=&index=8264&view=expanded
> > [2] http://www.alexandria.ucsb.edu/~lhill/FeatureTypes/ver070302/index.htm
> > [3] http://clients.alexandria.ucsb.edu/globetrotter/ (try to find a
> > place then see the catalog record.)
> > [Ref] JCDL 2002 NKOS Workshop on Digital Gazetteers.
> > http://nkos.slis.kent.edu/DL02workshop.htm
> >
> >
> > On 1/7/11 5:05 AM, "Mark van Assem" <mark@cs.vu.nl> wrote:
> >
> > Thanks all for the feedback!
> >
> > I've tried to address all your points in the value vocab description:
> >
> > - "A dataset is a collection of structured metadata records"
> >
> > - added some more "similar terms", including KOS, gazetteer, authority
> > file, concept scheme
> >
> > - "They are "building blocks" with which metadata records can be
> > built."
> >
> > Re Marcia's point [["For example, in digital gazetteers not only the
> > place names are controlled but also the place features, type,
> > coordinates, and even maps are included."]]
> >
> > I'm not sure I get what you mean with the "also controlled", but I
> > think indeed that this is the same as the VIAF situation: the values
> > in a value vocabulary can be described with elements and values
> > themselves, which would make them "datasets" also.
> > However, we can still see VIAF as a value vocab and not a dataset,
> > as its main role is to be a building block for metadata records.
> >
> > Mark
> >
> >
> > On 6-1-2011 18:15, ZENG, MARCIA wrote:
> > > I like the way Karen used in terms of building block or not... Also
> > > agree with Jeff’s use of SKOS ‘concept scheme’ to define VIAF.
> > >
> > > * Regarding ‘data sets’: To me, the ‘data sets’ we are talking about
> > >   are structured data. Outside in other places ‘data sets’ could be
> > >   un-structured or semi-structured data (e.g., data.gov’s raw data
> > >   sets).
> > > * Regarding ‘value vocabularies’: In the conventional way we have
> > >   used “knowledge organization systems (KOS)” for concept schemes
> > >   (broader than “controlled vocabularies”). Most of the vocabulary
> > >   types are clear such as pick lists, taxonomies, thesauri, subject
> > >   headings. But there is a group of ‘metadata-like’ KOS such as
> > >   authority files and digital gazetteers. They are/can be
> > >   constructed as thesauri (e.g., The Getty Thesaurus of Geographic
> > >   Names (TGN) and Union List of Artist Names (ULAN)). Or, they can
> > >   be in other structures. It is the contents they include that made
> > >   them also be referred to as ‘data sets’. For example, in digital
> > >   gazetteers not only the place names are controlled but also the
> > >   place features, type, coordinates, and even maps are included.
> > >   Digital gazetteers can be used alone as data sets or be the value
> > >   vocabularies used in structured data sets. This might be like the
> > >   VIAF situation, depending on how it is constructed or on how it is
> > >   used.
> > >
> > > My 2 cents.
> > > Marcia
> > >
> > > On 1/6/11 11:37 AM, "Karen Coyle" <kcoyle@kcoyle.net> wrote:
> > >
> > > Quoting Emmanuelle Bermes <emmanuelle.bermes@bnf.fr>:
> > >
> > > > As for myself, I do have a few more comments :
> > > > - I think the emphasis on value vocabs is too important in the
> > > > current definition of dataset. It's actually creating confusion,
> > > > in my view.
> > > > - I'm wondering if we could use the term "instance" (a dataset is a
> > > > collection of instance descriptions) or is it too implementation
> > > > oriented ?
> > > >
> > >
> > > I'm not sure that the term "instance" will work -- even a value in a
> > > list could be considered an instance, no?
> > >
> > > Somehow, the concept for a dataset is that it consists of the
> > > descriptions of entities that you need for an application or
> > > function, rather than the building blocks for creating such a
> > > description. (Which gets back to Mark's statement about "A record
> > > for Derrida's book in dataset X ...")
> > >
> > > Essentially, one person's dataset could be another person's building
> > > block. But I think the key is that a dataset is complete for an
> > > application, while a value vocabulary needs to be combined with
> > > other data to be useful.
> > >
> > > No, I'm not satisfied with that explanation... I'll ruminate on this
> > > and see if I can find better words.
> > >
> > > kc
> > >
> > > > Emmanuelle
> > > >
> > > > On Thu, Jan 6, 2011 at 5:13 PM, Mark van Assem <mark@cs.vu.nl> wrote:
> > > >
> > > > > Hi Emma,
> > > > >
> > > > > I saw you had already followed up on our action to clarify "value
> > > > > vocabularies".
> > > > >
> > > > > I saw that you think we should clarify how value vocabularies
> > > > > actually appear in metadata records (as literals, codes,
> > > > > identifiers).
> > > > > While I kinda feel we should try to stay agnostic to that I kept
> > > > > it in, but rewrote it slightly:
> > > > >
> > > > > "In actual metadata records, the values used can be literals,
> > > > > codes, or identifiers (including URIs), as long as these refer to
> > > > > a specific concept in a value vocabulary. "
> > > > >
> > > > > I also moved your point re "closed list" up to the initial
> > > > > definition; this is indeed central to what a value vocab is.
> > > > >
> > > > > Mark.
> > > > >
> > > > >
> > > > > On 06/01/2011 16:34, Mark van Assem wrote:
> > > > >
> > > > >> Hi Jodi,
> > > > >>
> > > > >> X and Y would be two collections ("datasets") from two different
> > > > >> libraries. It could also be two subcollections or within one
> > > > >> collection, but I think making them separate ones will make it
> > > > >> more illustrative.
> > > > >>
> > > > >> Do you have a suggestion on how to clarify or replace X and Y
> > > > >> with specific existing collections/libraries as examples?
> > > > >>
> > > > >> Mark
> > > > >>
> > > > >>
> > > > >> On 06/01/2011 16:21, Jodi Schneider wrote:
> > > > >>
> > > > >>> Thanks for this, Mark! I especially like the 'confusions' area
> > > > >>> -- that will make this quite useful.
> > > > >>>
> > > > >>> In this, it would be helpful if you'd explain what datasets X
> > > > >>> and Y might be. Particular collections? Subcollections of a
> > > > >>> larger whole?
> > > > >>> "in some cases records in a dataset are themselves used as
> > > > >>> values in other datasets. For example, Derrida wrote a book that
> > > > >>> comments on Heidegger's book "Sein und Zeit". A record for
> > > > >>> Derrida's book in dataset X can state this by relating it to a
> > > > >>> record for Heidegger's book in dataset Y.
> > > > >>> This statement in the Derrida record could consist of the
> > > > >>> Dublin Core Subject with as value a reference to the Heidegger
> > > > >>> record. In this case we would still term X and Y datasets, not
> > > > >>> value vocabularies."
> > > > >>>
> > > > >>> -Jodi
> > > > >>>
> > > > >>> On 6 Jan 2011, at 08:00, Mark van Assem wrote:
> > > > >>>
> > > > >>>> Hi all,
> > > > >>>>
> > > > >>>> As per my action I have written some text [1] to explain the
> > > > >>>> terms "dataset, metadata element set, value vocabulary" with
> > > > >>>> feedback from Karen and Antoine to address the things that
> > > > >>>> don't fit very nicely.
> > > > >>>>
> > > > >>>> Please let me know what you think, after I've had your input
> > > > >>>> we'll put it on the public list to get shot at.
> > > > >>>>
> > > > >>>> Mark.
> > > > >>>>
> > > > >>>> [1]
> > > > >>>> http://www.w3.org/2001/sw/wiki/Library_terminology_informally_explained#Vocabularies.2C_Element_sets.2C_Datasets
> > > > >>>>
> > > > >>>>
> > > > >>>> On 28/12/2010 18:40, Karen Coyle wrote:
> > > > >>>>
> > > > >>>>> I have been organizing the vocabularies and technologies on
> > > > >>>>> the archives cluster page [1] and it was a very interesting
> > > > >>>>> exercise trying to determine what category some of the
> > > > >>>>> "things" fit into. This could turn out to be a starting place
> > > > >>>>> for our upcoming discussion of our definitions since it has
> > > > >>>>> real examples. The hard part seems to be value vocabularies v.
> > > > >>>>> datasets, and I have a feeling that there will not be a clear
> > > > >>>>> line between them.
> > > > >>>>>
> > > > >>>>> kc
> > > > >>>>> [1]
> > > > >>>>> http://www.w3.org/2005/Incubator/lld/wiki/Cluster_Archives#Vocabularies_and_Technologies
> > > > >>>>>
> > > >
> > > > --
> > > > =====
> > > > Emmanuelle Bermès - http://www.bnf.fr
> > > > Manue - http://www.figoblog.org
> > > >
> > >
> > > --
> > > Karen Coyle
> > > kcoyle@kcoyle.net http://kcoyle.net
> > > ph: 1-510-540-7596
> > > m: 1-510-435-8234
> > > skype: kcoylenet
> > >
Received on Monday, 17 January 2011 14:38:56 UTC