Re: vocabs, metadata set, datasets from Thomas Baker on 2011-01-21 (public-xg-lld@w3.org from January 2011)

From: Thomas Baker <tbaker@tbaker.de>
Date: Fri, 21 Jan 2011 13:13:37 -0500
To: Karen Coyle <kcoyle@kcoyle.net>
Cc: "gordon@gordondunsire.com" <gordon@gordondunsire.com>, public-xg-lld@w3.org
Message-ID: <20110121181337.GA4912@octavius>

On Fri, Jan 21, 2011 at 07:57:25AM -0800, Karen Coyle wrote:
> First, the FRBR entities of Group 1 are modeled as separate records
> (unfortunately). That's something I see as problematic, but that's
> how it is. It is my impression that in each such record, all of the
> triples will have the same subject. Maybe we need to try out some
> examples and see if this is true.

Thank you for explaining that!  

I immediately wonder whether there are two (or more) ways of
understanding "record" -- one of the record as a serialized
blob held on hard disks and exchanged over the wire, and
another as something more conceptual, e.g., as a grouping of
information.  

If four records were grouped into a serialized blob, given
an identifier, and managed in a particular database as a whole,
would that blob also be considered a record?  I'm not looking
for an answer, just asking the question...

> Let me make it clear that I am NOT saying that this is the right way
> to do it. I'm trying to explain current thinking, as I read it, in
> library cataloging.

Understood!  I'm trying to understand differences in underlying
assumptions so that we can articulate and explain them more
clearly.

> In my mind, the DCAM represents a full data model, not a record. The
> library world also has a data model, with 3 entity types, the three
> FRBR groups (and all groups are actually multiple entities). But
> each entity is a separate record in the instance data.

I don't want to take this thread in the direction of DCAM,
but the general idea of DCAM was to provide an abstract
syntax for the contents of a "record", as in: "Description
sets are instantiated, for the purposes of exchange between
software applications, in the form of metadata records" [1].

To the extent DCAM provides a full data model, that model is
based largely on RDF -- with the addition of named-graph-like
constructs not in RDF per se, such as Description and
Description Set.  In that sense, I see DCAM as orthogonal
to, i.e., not really comparable with, FRBR as a data model.
And yes, I acknowledge that DCAM is confusing on these
points.

> Note that library records often
> contain administrative data about the record or the creation of the
> record, and this isn't distinguished from data about the primary
> entity. Other than that I do believe that each record has a single
> focus today.

I'm willing to believe that most records _do_ have a single
focus, but administrative data is a good example of an
exception.  I took a few minutes to look up some examples of
library records, and the first one I saw had information along
the lines of:

     Berlin: Springer Verlag, 1992.

...which I would be more inclined to translate into triples as:

    :X dct:date             "1992"
    :X dct:publisher        :Y
    :Y ex:name              "Springer Verlag"
    :Y ex:location          "Berlin"

...rather than as, say:

    :X dct:date             "1992"
    :X dct:publisher        "Springer Verlag"
    :X ex:publisherlocation "Berlin"

...where "Berlin" is directly an attribute of resource "X" --
which, among other things, would lose the relationship between
"Berlin" and "Springer Verlag".
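
To try this out concretely, here is a minimal sketch in Python that
holds the two sets of triples as plain tuples and checks how many
distinct subjects each one has.  No RDF library is assumed; the
helper names (subjects, describe) and the ex: properties are
made up purely for illustration.

```python
# Two hypothetical modelings of "Berlin: Springer Verlag, 1992."
# Triples are plain (subject, predicate, object) tuples; the ex:
# property names are illustrative, not from a real vocabulary.
structured = [
    (":X", "dct:date", "1992"),
    (":X", "dct:publisher", ":Y"),
    (":Y", "ex:name", "Springer Verlag"),
    (":Y", "ex:location", "Berlin"),
]

flat = [
    (":X", "dct:date", "1992"),
    (":X", "dct:publisher", "Springer Verlag"),
    (":X", "ex:publisherlocation", "Berlin"),
]


def subjects(triples):
    """Return the set of distinct subjects in a list of triples."""
    return {s for s, _, _ in triples}


def describe(triples, subject):
    """Return the property/value pairs asserted about one subject."""
    return {p: o for s, p, o in triples if s == subject}


# In the structured version the publisher is a resource of its own,
# so its name and location can be looked up together.
publisher = describe(structured, ":X")["dct:publisher"]
publisher_description = describe(structured, publisher)

print(sorted(subjects(structured)))
print(sorted(subjects(flat)))
print(publisher_description)
```

In the flat version every triple has the same subject, which matches
the "one record, one subject" impression, but nothing in the data
says that "Berlin" is the location of the publisher rather than of
the book.  The structured version keeps that link at the cost of a
second subject within the same record.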

Tom

[1] http://www.dublincore.org/documents/abstract-model/#sect-3

-- 
Tom Baker <tbaker@tbaker.de>

Received on Friday, 21 January 2011 18:14:16 UTC