Re: Question about MARCXML to Models transformation from Ross Singer on 2011-03-08 (public-lld@w3.org from March 2011)

From: Ross Singer <ross.singer@talis.com>
Date: Tue, 8 Mar 2011 10:06:01 -0500
To: "Tillett, Barbara" <btil@loc.gov>
Cc: "Young,Jeff (OR)" <jyoung@oclc.org>, Thomas Baker <tbaker@tbaker.de>, Karen Coyle <kcoyle@kcoyle.net>, "Diane I. Hillmann" <dih1@cornell.edu>, "public-lld@w3.org" <public-lld@w3.org>
Message-ID: <AANLkTimNtYgUPhdPf_TJOEP5KaAvH1Savt_cG+sEC7xE@mail.gmail.com>
On Tue, Mar 8, 2011 at 7:39 AM, Tillett, Barbara <btil@loc.gov> wrote:

> Baconian theory is a subdivision under Shakespeare in LCSH, and as such you
> should be able to get to that "facet" from id.loc.gov - I'll see what if
> anything is preventing you from getting that.
>

"Baconian theory" isn't an authorized heading (or subdivision).  So you'd
need to do something like what the MADS/RDF examples do for "Minnesota"
(which falls into the same pit as "Shakespeare, William..." until LC releases
the NAF as linked data):
http://www.loc.gov/standards/mads/rdf/examples/sh2007010620.rdf (see
/RDF/Authority/componentList).  There they use a blank node, which obviously
works and meets the requirements of an rdf list, but I don't see the value in
it, and it kicks the eventual reconciliation problem down the road.

If the list were simply changed to a container, there would be no need to
contrive a heading.  You would just add the linkages as they become
available.  Since the blank node would add very little (a label, at most,
probably), what's the point?  What value does it bring?
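A rough sketch of the difference, in Python with triples as plain tuples (no RDF library assumed; the example.org component identifiers are hypothetical placeholders, not real authority URIs). An rdf:List is a closed chain of rdf:first/rdf:rest nodes ending in rdf:nil, so every slot has to hold something, even a contrived blank node, before the list can be written down. An rdf:Seq container just states rdf:_1, rdf:_2 and so on, so a known component can be asserted now and the others linked when they exist.

```python
# Sketch only: triples are (subject, predicate, object) tuples, no RDF library.
# The example.org identifiers below are hypothetical placeholders.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def rdf_list(head, members):
    """Encode members as an rdf:List starting at node `head`.

    The chain is closed with rdf:nil, so every position needs some node,
    even a contrived blank node for a component we cannot identify yet.
    """
    if not members:
        raise ValueError("this sketch only encodes non-empty lists")
    triples = []
    node = head
    for i, member in enumerate(members):
        triples.append((node, RDF + "first", member))
        # Last cell points at rdf:nil; others point at a fresh blank node.
        nxt = RDF + "nil" if i == len(members) - 1 else f"{head}-{i + 1}"
        triples.append((node, RDF + "rest", nxt))
        node = nxt
    return triples


def rdf_seq(container, members_by_index):
    """Encode an rdf:Seq; only the positions we actually know are stated."""
    triples = [(container, RDF + "type", RDF + "Seq")]
    for index, member in sorted(members_by_index.items()):
        triples.append((container, f"{RDF}_{index}", member))
    return triples


def add_seq_member(triples, container, index, member):
    """Link a component later, once a real resource exists for it."""
    triples.append((container, f"{RDF}_{index}", member))


# List: the unidentified second component has to be invented as a blank node.
as_list = rdf_list(
    "_:components",
    ["http://example.org/name/shakespeare", "_:baconianTheory"],
)

# Container: state the known component now...
as_seq = rdf_seq("_:components", {1: "http://example.org/name/shakespeare"})
# ...and add position 2 only when an authorized resource exists.
add_seq_member(as_seq, "_:components", 2, "http://example.org/topic/baconian-theory")
```

Nothing in the container version has to be reconciled later: there is no stand-in node to find and replace, only a membership statement to add.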


> As for the flexibility of the model - I believe it is open to expansion and
> adaptation into different data models as needed for applications - FRBR is a
> theoretical conceptual model.... If we find applications where there is need
> for an expression to connect to more than one work, then that application
> should build that in, and the FRBR Review Board should be alerted to that
> fact so they can adjust the conceptual model.  Likewise, if we have a new
> term (or an old one that was overlooked) in a controlled vocabulary, there
> are already mechanisms to add and adjust terms to maintain the structures of
> that controlled vocabulary.  They aren't as cast in stone as you seem to
> imply. - Barbara
>

I'm actually talking about the opposite of this: not that the model cannot
meet the demands of our data, but that our data cannot meet the demands of
the model.  We've talked a bit about the difficulty we have had identifying
Expressions in our legacy data -- and, again, the relatively little value the
vast majority of them would actually give us: a language, maybe.  Again, I
point to the Open Library as a pretty good real world scenario here.  They
have the notion of Work (and I would argue it conforms, more or less, with
the FRBR definition of Work) and then they munge the notion of Expression,
Manifestation (and, occasionally, Item, I guess, for digitized things) into
something they call an "Edition".  For them (and, I suspect, for many others) the value
of a perfect FRBR conceptualization of their data is far lower than the cost
it would take to actually create it.

This is not to say that the FRBR model is wrong or even necessarily flawed.
I just think that applying it verbatim to RDF through OWL with an
application profile that is intended to enforce its rules is more likely a
barrier to adoption than it is insurance of semantic interoperability.

After Karen expressed her frustrations (on another list, I think -- maybe
ol-tech) about trying to model the WEM hierarchy in RDF for Open Library, I
made a series of properties on open.vocab.org as sort of a compromise:

http://open.vocab.org/terms/commonEndeavour
http://open.vocab.org/terms/commonWork
http://open.vocab.org/terms/commonExpression
http://open.vocab.org/terms/commonManifestation
http://open.vocab.org/terms/commonItem

the point being that not all data is going to be modeled as FRBR
(resources modeled as BIBO, for example, ignore the distinctions of the WEMI
hierarchy) even though the FRBR relationship model is still just as
applicable.  It also circumvents the need to contrive resources based on a
lack of data by *implying* their existence without the demands of explicitly
modeling them.  When (and if!) these FRBR entities are ever realized, they
can be linked in then without any need to go back and clean up the stubs we
had to create just to get off the ground.
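To make the "implying" concrete, here is a sketch in Python. It assumes, for illustration only, that a property like http://open.vocab.org/terms/commonWork links two resources that share a Work, and it treats that link as symmetric and transitive; the published definitions of those properties may say otherwise. The example.org edition identifiers are hypothetical. No Work resource is ever minted: the groups of resources that share one are derived from the links alone.

```python
# Sketch only: assumes commonWork means "shares a Work with" and treats it as
# symmetric and transitive. The example.org identifiers are hypothetical.
COMMON_WORK = "http://open.vocab.org/terms/commonWork"


def implied_works(triples):
    """Group resources connected by commonWork links (union-find).

    Each returned set stands for one implied Work that has not been
    modeled as a resource of its own.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == COMMON_WORK:
            parent[find(s)] = find(o)  # merge the two groups

    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


data = [
    ("http://example.org/edition/1", COMMON_WORK, "http://example.org/edition/2"),
    ("http://example.org/edition/2", COMMON_WORK, "http://example.org/edition/3"),
    ("http://example.org/edition/4", COMMON_WORK, "http://example.org/edition/5"),
]
works = implied_works(data)  # two implied Works: editions 1-3 and editions 4-5
```

If a Work resource does get minted later, it can simply be linked to each member of a group; none of the commonWork statements have to be cleaned up.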

All I'm saying is that we need to avoid the tail wagging the dog.  There is
no reason we cannot have rich, interlinked, authoritative data, but one of
the most valuable aspects of linked data is how well suited it is to
iterative data modeling.  Say what you know now and avoid unnecessary
statements required simply because the model is forcing you to.  Add things
as you learn them and link to them.

Having a rigid, well-defined model is fine.  It gives us something to
point to and provides an exemplar to work towards.  At the same time, it
needs to allow for a loose way to refer to it; otherwise nothing will ever
get done.

-Ross.

>
> -----Original Message-----
> From: public-lld-request@w3.org [mailto:public-lld-request@w3.org] On
> Behalf Of Ross Singer
> Sent: Tuesday, March 08, 2011 12:25 AM
> To: Young,Jeff (OR)
> Cc: Thomas Baker; Karen Coyle; Diane I. Hillmann; public-lld@w3.org
> Subject: Re: Question about MARCXML to Models transformation
>
> I would say the major problem I have with these models that set the
> expectation of rigidity (e.g. "an Expression must belong to one Work, a
> Manifestation must belong to one Expression, etc.") is that it implies the
> intersection of omniscience, perfection and comprehensiveness from the
> outset.
>
> The MADS/RDF's implementation of coordination also runs afoul of this (by
> using rdf lists).  The irony being that the subject authorities can't
> themselves be modeled this way without external dependencies
> (see: http://id.loc.gov/authorities/sh85120834#concept - not only does
> id.loc.gov not currently have name resources -- although, obviously they
> could -- there is no authorized heading for "Baconian theory").
>
> As Diane pointed out earlier about trying to model MARC records as "types",
> it's difficult to model the world and impossible to keep up with the changes
> that evolution brings while maintaining integrity with your backfile.
>
> While RDF's "you can only know what you're looking directly at"
> principle seems somewhat existential, it's also built on pragmatism.
> I can't help but think there's got to be some middle ground somewhere.
> If we can agree on this sweet spot, somewhere between dogma and abandon
> (which, really, isn't as big a gulf as it seems; it's just that they're
> fundamentally disjointed), and acknowledge both, we will dramatically
> lower the kinetic energy needed to start getting data modeled.
>
> Some of these may be fairly simple (changing MADS/RDF's coordination lists
> to rdf containers, for example); others, like abstracting away the
> strictness of FRBRer (such as implying parts of the WEMI stack, coupled with
> explicit parts elsewhere -- similar to what the Open Library does) while
> still representing a compatible data model, might be less trivial but would
> allow for the creation of much more content.
>
> At some point we (and by "we" I don't necessarily mean this group, but the
> library community as a whole) need to step back and ask what exactly we hope to
> accomplish and how that might realistically be done.
>
> -Ross.
>
> On Mon, Mar 7, 2011 at 10:12 AM, Young,Jeff (OR) <jyoung@oclc.org> wrote:
> >
> > I half agree. The guiding light for whether something is a WEM or I
> > isn't necessarily the class name or its definition, it's the
> > sensibility of properties. WEMI is what it is because the FRBR
> > designers put careful thought into the property names separating them:
> > "is realized through", "is embodied in", and "is exemplified by".
> >
> > For example, this statement "makes sense" to me and, I'm guessing,
> > everyone else (forget FRBR for a second):
> >
> > "A newspaper editorial is a realization of an opinion."
> >
> > Is this use of "is a realization of" merely a pun or is its meaning
> > the same as that found in the FRBR model? I would argue it's the same,
> > which means (through domain/range settings) that an "Opinion" is a
> > Work (presumably in the sub-class sense) and "Newspaper Editorial" is
> > an "Expression" (also in the subclass sense).
> >
> > These subclass assignments may not be obvious in isolation, but when
> > used in statements involving properties their nature becomes clearer.
> >
> > Jeff
> >
> > > -----Original Message-----
> > > From: public-lld-request@w3.org [mailto:public-lld-request@w3.org]
> > > On Behalf Of Thomas Baker
> > > Sent: Monday, March 07, 2011 9:14 AM
> > > To: Karen Coyle
> > > Cc: Diane I. Hillmann; public-lld@w3.org
> > > Subject: Re: Question about MARCXML to Models transformation
> > >
> > > On Sun, Mar 06, 2011 at 09:35:22AM -0800, Karen Coyle wrote:
> > > > I actually think that we should emphasize the "has a" rather than "is
> > > > a" aspects of the resources we describe, and let the "has a" allow us
> > > > to infer any number of "is a" qualities. This is the message that Jon
> > > > Phipps gave at the tutorial day at DC in Pittsburgh -- that we
> > > > describe things by their characteristics, and those
> > > > characteristics tell us what the thing *is*.
> > >
> > > Yes, that sounds right to me.  Emphasize Properties
> > > (relationships) over Classes. Verbs over nouns.  Describe things
> > > less through giving them a name -- i.e., writing a definition for a
> > > class of things to which they belong -- and more through enumerating
> > > their characteristics.
> > >
> > > --
> > > Tom Baker <tbaker@tbaker.de>
> > >
> >
>
>
Received on Tuesday, 8 March 2011 15:06:45 UTC