Re: vocabs, metadata set, datasets from Thomas Baker on 2011-01-17 (public-xg-lld@w3.org from January 2011)

From: Thomas Baker <tbaker@tbaker.de>
Date: Mon, 17 Jan 2011 12:03:34 -0500
To: Mark van Assem <mark@cs.vu.nl>
Cc: public-xg-lld@w3.org
Message-ID: <20110117170334.GA2928@octavius>
On Mon, Jan 17, 2011 at 04:56:23PM +0100, Mark van Assem wrote:
> >-- I'd like for us to say more explicitly, up-front,
> >    that we are referring to things by these three handles --
> 
> I added [[
> Note that many standards, such as LCSH, might be seen as belonging
> to several of the categories below. However, we will refer in our
> report to each standard as belonging to just one of the categories,
> based on their typical usage.]]
> 
> does that address your concern?

Instead of:

    Note that many standards, such as LCSH, might be seen as
    belonging to several of the categories below. However, we
    will refer in our report to each standard as belonging to
    just one of the categories, based on their typical usage.

...which does not say how it is that they can belong to several
categories, perhaps the text could put more emphasis on the
"context in which it is used":

    Note that many standards, such as the Library of Congress
    Subject Headings [link], could be seen as falling under
    several of the categories below depending on the context
    in which they are used. In this report, we assign standards
    to categories based on their "typical" usage.

> >-- I'm wondering if Dataset is simply a superset of Metadata
> 
> How does this help in our definitions? Do you imply a particular
> change in the definition of dataset that would make things more
> easily understandable?

Maybe move the entry for Dataset to follow the entries for
Element Sets and Vocabularies.

Under Confusions, then, start with a new point, something
like:

-- As "sets of structured metadata", Element Sets and
   Vocabularies could be seen as datasets.  This report
   makes a pragmatic, usage-based distinction between sets
   of structured metadata specifically about Elements or
   Concepts and sets of structured metadata about all other
   sorts of things in the world (here called Datasets).

> >-- I'm slightly bothered by the emphasis -- particularly (but not
> >    only) in the definition of Dataset -- on the notion of a
> >    "structured metadata record".  By this criterion, I'm
> >    guessing that many of the nodes in the Linked Open Data
> >    cloud would not qualify as Datasets simply because the
> >    data, while possibly derived from records, does not, when
> >    expressed as triples, consist explicitly of "records".
> 
> That's a feature of the semantic web, but not a feature of how most
> library information systems are organized, right?
> 
> Put differently: whom among our readership will get this problem
> (i'd guess not many?)
> 
> Moreover, the structured metadata record "works" in the explanation,
> while adding the triple perspective here would muddle things, unless
> you have a good suggestion which I can't see!

It works well enough for drawing an analogy, but I wouldn't
want to paper over the problem, especially the bit about
records being about "one entity (e.g. a book)" -- which is
in my opinion simply wrong because a typical catalog record,
for example, contains descriptive elements not just about a
book, but its author, publisher, etc.

Instead of:

    A dataset is a collection of structured metadata records,
    describing e.g. the books in a library. Each record is
    basically a collection of statements about that one entity
    (e.g. a book), where each statement consists of an element
    ("attribute" or "relationship") of the entity, and a
    "value" for that element. 

I'd suggest something like:

    A dataset is a collection of structured metadata --
    descriptions of things, such as books in a library.
    Library records consist of statements about things,
    where each statement consists of an element ("attribute"
    or "relationship") of a thing, and a "value" for that
    element.  Note that in the Linked Data context, Datasets do
    not necessarily consist of clearly identifiable "records"
    (see entry on Records).
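
To make the point concrete, here is a rough sketch of how one flat
catalog record turns out to hold statements about several things
(a book, its author, its publisher).  The record layout, the element
names, and the "ex:" identifiers are all invented for illustration and
are not taken from any real schema or from the wiki page:

```python
# Illustrative only: the record layout, element names, and "ex:"
# identifiers are assumptions made up for this sketch.

record = {
    "id": "ex:book1",
    "title": "Moby-Dick",
    "author_name": "Herman Melville",
    "author_birth_year": "1819",
    "publisher_name": "Harper & Brothers",
}


def explode(record):
    """Split one flat record into (subject, element, value) statements."""
    book = record["id"]
    # Hypothetical identifiers for the other entities the record describes.
    author = book + "/author"
    publisher = book + "/publisher"
    return [
        (book, "title", record["title"]),
        (book, "author", author),
        (author, "name", record["author_name"]),
        (author, "birthYear", record["author_birth_year"]),
        (book, "publisher", publisher),
        (publisher, "name", record["publisher_name"]),
    ]


statements = explode(record)

# Distinct things the single record turned out to be "about".
subjects = sorted({subject for subject, _, _ in statements})

for subject, element, value in statements:
    print(subject, element, value)
```

Once the record is split this way, nothing in the individual statements
says which record they came from, which is where the provenance
question comes in.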

> >-- I'm thinking that the Library Terminology page might
> >    therefore include an entry on records, citing some of the
> 
> <snip>
> 
> that sounds like a useful idea.

Gordon has some great presentations about calling the record
paradigm into question, even "exploding the record" (or words
to that effect).  One of those could perhaps provide a good
starting point for an entry on records.

Tom

> >    key definitions of "record" used in library science.  That
> >    entry could be the place where the notion that a record is
> >    "basically a collection of statements about ... one entity"
> >    is called into question (by pointing out that in practice, records
> >    typically include some description about several entities).
> >    It could also provide a place to discuss the notion that
> >    descriptive metadata, in a Linked Data context, is primarily
> >    about description at the statement level, which is indeed
> >    what lends it so well to linking and recombination.  That
> >    entry could acknowledge the role of records in traditional
> >    library science of providing a context for the provenance of
> >    metadata and perhaps flag this as a crucial issue for Linked
> >    Data (and RDF generally).
> >
> >Tom
> >
> >[1] http://www.w3.org/2001/sw/wiki/Library_terminology_informally_explained#Vocabularies.2C_Element_sets.2C_Datasets
> >[2] http://lists.w3.org/Archives/Public/public-lld/2010Dec/0023.html
> >

-- 
Tom Baker <tbaker@tbaker.de>
Received on Monday, 17 January 2011 17:04:14 UTC