
RE: Question about MARCXML to Models transformation

From: Karen Coyle <kcoyle@kcoyle.net>
Date: Sun, 06 Mar 2011 16:56:26 -0800
Message-ID: <20110306165626.486054ukxj0vschm@kcoyle.net>
To: "Young,Jeff (OR)" <jyoung@oclc.org>
Cc: public-lld@w3.org

OK, now I think I get part of it!

I don't know that anyone has figured out which properties function for
the deduping of FRBR-ized descriptions, but I agree that it is
complicated by the WEM structure. The "fully relational database"
theory behind it (as seen in scenario 1) would have one instance of
each work (W), one instance of each expression of a work (W+E), and
one instance of each manifestation of a work (W+E+M). Getting there is
something else. It seems that we would have to use both the properties
and the relationships with WEM entities as well as relationships to at
least the Group 2 entities to complete the deduping. For example,
there will undoubtedly be many thousands of Expression entities that
consist of little more than:

hasLanguage / "English"

but these are not the same expression. The Expression is defined by  
its own properties AND the Work it is an expression of; as you said  
earlier, there is really no such thing as an Expression, just an  
Expression->Work unit. (I awkwardly tried to describe some of this in  
a blog post a while back. [1])
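
A minimal sketch of that idea, assuming each source record has first
been cut into a stand-alone W-E-M triad and that creator plus title is
enough to identify a Work (a big assumption; real matching would need
far more). The Triad fields, the norm() helper and the sample records
are all made up for illustration. The point is only that the
Expression key has to include the Work key, and the Manifestation key
the Expression key:

```python
from typing import NamedTuple


class Triad(NamedTuple):
    """One stand-alone W-E-M description, as cut from a single record."""
    creator: str       # relationship from the Work to a Group 2 entity
    work_title: str    # Work property
    language: str      # Expression property
    publisher: str     # Manifestation property
    year: int          # Manifestation property


def norm(value: str) -> str:
    """Crude normalization so case and spacing differences do not block a match."""
    return " ".join(value.casefold().split())


def work_key(t: Triad) -> tuple:
    return (norm(t.creator), norm(t.work_title))


def expression_key(t: Triad) -> tuple:
    # The Expression is identified by its own properties AND its Work.
    return work_key(t) + (norm(t.language),)


def manifestation_key(t: Triad) -> tuple:
    return expression_key(t) + (norm(t.publisher), t.year)


# Made-up sample records, for illustration only.
records = [
    Triad("Author X", "Work A", "English", "Publisher P", 1990),
    Triad("Author X", "Work A", "English", "Publisher Q", 2005),
    Triad("Author Y", "Work B", "English", "Publisher P", 1998),
    Triad("Author Y", "Work B", "French", "Publisher R", 2001),
]

works = {work_key(t) for t in records}
expressions = {expression_key(t) for t in records}
manifestations = {manifestation_key(t) for t in records}
naive = {norm(t.language) for t in records}  # hasLanguage alone

print(len(works), len(expressions), len(manifestations), len(naive))
# prints: 2 3 4 2
```

Keyed on hasLanguage alone, the four records collapse to two
"expressions", wrongly merging the English texts of two different
works; keyed on Work plus language they come out as three.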

The Open Library merges Works and "Editions", which are basically like
MARC records, so there isn't an Expression and the Work information is
partially replicated in the edition record. That is much easier than
also including the expression, since I think it will be very hard to
determine expressions algorithmically. (I'm not sure there will be
enough data to do so.)

Based on the talk he gave last year in Cologne (which I only partially  
understood because it was in German -- that is, I looked at the  
pictures), it seems that Alex may have done some thinking about this.  
True, Alex? (If you're reading this.)

kc
[1] http://kcoyle.blogspot.com/2010/05/frbr-and-sharability.html

Quoting "Young,Jeff (OR)" <jyoung@oclc.org>:

> Let's flip "splits" on its head and imagine "merges" instead
> (owl:sameAs). Imagine that I create stand-alone W-E-M triads for every
> single Manifestation and then I go back and try to "de-dup" them
> class-by-class. Which FRBR properties are useful for this purpose, and
> which ones are along for the ride?
>
> If that doesn't help, I can try again.
>
>> -----Original Message-----
>> From: Karen Coyle [mailto:kcoyle@kcoyle.net]
>> Sent: Sunday, March 06, 2011 6:21 PM
>> To: Young,Jeff (OR)
>> Cc: public-lld@w3.org
>> Subject: RE: Question about MARCXML to Models transformation
>>
>> Quoting "Young,Jeff (OR)" <jyoung@oclc.org>:
>>
>> >
>> > I'm still trying to make sense of WEMI, but treating "has publisher" and
>> > "place of publication" as literals implies they have no bearing on WEMI
>> > splits. If these properties aren't factors, it makes me wonder which if
>> > any are. It never occurred to me that WEMI entity production wouldn't
>> > leave traces in the properties. Maybe I've been looking under the wrong
>> > rocks?
>>
>> Jeff, you completely lost me on this, so I'm going to begin by asking
>> what you mean by "splits" -- then I probably will have other
>> questions. :-)
>>
>> kc
>>
>>
>> >
>> > Jeff
>> >
>> >> -----Original Message-----
>> >> From: Tillett, Barbara [mailto:btil@loc.gov]
>> >> Sent: Sunday, March 06, 2011 3:44 PM
>> >> To: Young,Jeff (OR); Karen Coyle; Thomas Baker
>> >> Cc: gordon@gordondunsire.com; public-lld@w3.org
>> >> Subject: RE: Question about MARCXML to Models transformation
>> >>
>> >> I basically agree, but want to point out that FRBR's WEMI are not
>> >> strictly hierarchical but rather a network graph (don't forget about
>> >> the many to many relationships for the WEMI - it's not just one to one
>> >> or one to many or many to one - there are also many to many).
>> >>
>> >> Also "relational database" does not mean it has relationships...it
>> >> means it's based on relational algebra with joins, unions,
>> >> intersections, etc., of tables (sets of data).  I'm really looking
>> >> forward to breaking away from relational database models to get to
>> >> something that handles the complex graph structures of the
>> >> bibliographic universe better.  It's probably because I'm rather fond
>> >> of topological spaces and non-Euclidean geometries and see a better fit
>> >> in that realm, but computer science isn't there yet.  I think the
>> >> Semantic Web has the potential to free us from the relational model,
>> >> while improving connections and links of relationships...but I still
>> >> see current iterations as not really "there" yet.  Gordon's work is a
>> >> brilliant step to demonstrating and documenting the logic relations
>> >> (transitive, equivalent, etc.), cardinalities, etc.  It really helps us
>> >> "see" the model and note where adjustments would make it even better.
>> >>
>> >> FRBR has declared certain attributes for the entities, and I completely
>> >> agree some of those could better evolve into relationships (like
>> >> corporate bodies with a relationship/role of "is publisher" to a
>> >> particular manifestation rather than leaving them as attributes of a
>> >> manifestation) - we started to do that with RDA, but stopped short as
>> >> being too drastic a change from FRBR for this first round...but I am
>> >> sure it will be revisited once we have more registries like VIAF and
>> >> the RDA registries that make linking and declaration of relationships
>> >> easier and more stable, and schemas and systems that can actually do
>> >> something with such structures. - Barbara
>> >> ________________________________________
>> >> From: public-lld-request@w3.org [public-lld-request@w3.org] On Behalf
>> >> Of Young,Jeff (OR) [jyoung@oclc.org]
>> >> Sent: Sunday, March 06, 2011 4:15 AM
>> >> To: Karen Coyle; Thomas Baker
>> >> Cc: gordon@gordondunsire.com; public-lld@w3.org
>> >> Subject: RE: Question about MARCXML to Models transformation
>> >>
>> >> I think Karen brings some nebulous issues into focus. Sorry if my
>> >> thoughts are cryptic. I can try to clarify them if needed.
>> >>
>> >> > It's rather clear that FRBR was not designed with the open world model
>> >> > in mind -- in fact, it was designed around a late 90's concept of
>> >> > relational databases.
>> >>
>> >> The Semantic Web is also "relational", so that aspect doesn't bother me.
>> >> I agree that "relational databases" impose closed world assumptions, but
>> >> I'm not sure this limitation affects how designers go about their
>> >> modeling. For example, reusable OWL can be rationalized from legacy
>> >> relational databases using D2RQ:
>> >>
>> >> http://www4.wiwiss.fu-berlin.de/bizer/d2rq/spec/
>> >>
>> >> > It is very top-down in that XML-ish way and most
>> >> > commonly it is assumed that each of the FRBR entities will be a
>> >> > record.
>> >>
>> >> FRBR in general is relational, but the WEMI classes specifically are
>> >> unquestionably hierarchical. I would agree that XML Schema warps our
>> >> thinking, but WEMI is starting to make sense to me as a hierarchy. My
>> >> complaint now is the lack of meaningful WEMI subclasses that could make
>> >> the model much easier to understand and deal with.
>> >>
>> >> > I say that latter because of the fact that the WEMI entities,
>> >> > while having inter-dependencies, also have specific relationships to
>> >> > other WEMI entities (as well as to the group 2 and 3 entities). So an
>> >> > expression will have a relationship to a work and to one or more
>> >> > manifestations -- that's what I think of as a *structural*
>> >> > relationship --
>> >>
>> >> I agree with this interpretation and provide these RDF examples for
>> >> illustration.
>> >> (Beware: my "frbr" namespace elements are ad hoc.)
>> >>
>> >> <expression-1> a frbr:Expression ;
>> >>         frbr:isARealizationOf <work-1> ;
>> >>         frbr:isEmbodiedIn <manifestation-1> ;
>> >>         frbr:isEmbodiedIn <manifestation-2> .
>> >> <work-1> a frbr:Work .
>> >> <manifestation-1> a frbr:Manifestation .
>> >> <manifestation-2> a frbr:Manifestation .
>> >>
>> >> > but it can also have bibliographic relationships to
>> >> > other expressions (like: one expression is the translation of another
>> >> > expression, or is an updated edition).
>> >>
>> >> Here's what the additional triples would look like:
>> >>
>> >> <expression-1>
>> >>         frbr:hasATranslation <expression-2> ;
>> >>         frbr:hasARevision <expression-3> .
>> >> <expression-2> a frbr:Expression .
>> >> <expression-3> a frbr:Expression .
>> >>
>> >> > The fact is that it will be very hard to have an expression without a
>> >> > work because of the way the properties are spread across the Group 1
>> >> > entities: an expression does not have a relationship to a primary
>> >> > creator (e.g. author), only a work does. Ditto subjects: only Work
>> >> > entities have the "has subject" property that links to topical
>> >> > entities.
>> >>
>> >> I'm willing to go so far as believing it is *impossible* to have an
>> >> Expression without a Work because *all* conceivable Expressions have
>> >> creator and subject relationships in theory: even the fictional ones. I
>> >> think we need to beware that FRBR doesn't strive to be a metadata
>> >> exchange format, it strives to be a model of common sense reality (more
>> >> or less).
>> >>
>> >> > A Manifestation doesn't have a language of text; that
>> >> > belongs to the Expression. The necessary elements to describe a
>> >> > resource
>> >>
>> >> Riddle: When is a resource not a resource?
>> >> Answer: When the modeler(s) declare it to be a property or set of
>> >> properties instead.
>> >>
>> >> Fortunately, no modeler in history ever had the last word. :-)
>> >>
>> >> > are spread across the 3 (WEM) group 1 entities, making it
>> >> > very difficult to treat them separately. To give you an idea of what
>> >> > each entity "means", here are some key attributes for each:
>> >> >
>> >> > Work
>> >> >   - work title
>> >> >   - key for a musical work
>> >> >   - coordinates for a cartographic work
>> >> >   - with relationships to
>> >> >      -- creator of the work
>> >> >      -- topics of the work (subject headings and classifications)
>> >>
>> >> The terms "musical work", "cartographic work", and various other
>> >> rationalized "foo work" qualifiers imply subclasses of FRBR Work. I
>> >> think it's worth attempting.
>> >>
>> >> >
>> >> > Expression
>> >> >   - language of the expression (if text)
>> >> >   - form of the expression (text, sound, image)
>> >>
>> >> Likewise, "text expression", "sound expression", "image expression", and
>> >> other qualifications all imply subclasses of FRBR Expression.
>> >>
>> >> > Manifestation
>> >> >   - title of the manifestation (may be different to the work title)
>> >> >   - edition
>> >> >   - publisher, date of publication
>> >> >   - physical format (size, units, other measurements)
>> >> >   - ISBN, ISSN, etc.
>> >>
>> >> My feeling is that some of these "attributes" (owl:DatatypeProperty)
>> >> SHOULD be modeled as relationships/associations instead
>> >> (owl:ObjectProperty). For example, I think "publishers" should be
>> >> modeled as a frbr:CorporateBody (or a subclass thereof) and "place of
>> >> publication" should be modeled as frbr:Place. Limiting the individuals
>> >> in the CorporateBody and Place classes to known subjects of a Work
>> >> doesn't make sense in an open world model. Most real world objects can
>> >> be dumbed-down to literals when necessary.
>> >>
>> >> >
>> >> > There are many more attributes, but these are the common ones and the
>> >> > ones that I think may help people understand the issue. The data
>> >> > record that libraries create today contains data elements from all of
>> >> > these entities, mixed together and usually not clearly identified as W
>> >> > or E or M. To create library data under FRBR it will be necessary to
>> >> > ALWAYS have Work+Expression+Manifestation entities. (I'm skipping Item
>> >> > in the interest of brevity, but we should assume that it is part of
>> >> > the picture.)
>> >>
>> >> For better or worse it's not that simple. As Tom Baker pointed out in
>> >> another thread, ontologies aren't exchange formats, they are models in
>> >> which some entities can be inferred.
>> >>
>> >> >
>> >> > Now, it would be great to investigate the inferences that one can make
>> >> > with FRBR. For example, if you say:
>> >> >
>> >> > resourceA / frbrer:hasSubject / http://id.loc.gov/authorities/sh85148177
>> >> >
>> >> > then the inference is that resourceA is a Work. (I believe the way to
>> >> > say this is that "hasSubject" has the domain "Work". Right, Gordon?)
>> >>
>> >> FRBRer coins separate "has as subject" properties for each range class,
>> >> but as you would expect the domain is always Work.
>> >>
>> >> > You cannot then say:
>> >> >
>> >> > resourceA / frbrer:hasPublisher / "Random House"
>> >> >
>> >> > because *that* statement would mean that resourceA is a Manifestation,
>> >> > and Manifestation and Work are disjoint.
>> >>
>> >> The FRBRer OWL doesn't currently declare Work and Expression to be
>> >> owl:disjointWith one another, but I think that was Gordon's plan. Here's
>> >> some support for your understanding:
>> >> http://www.w3.org/TR/owl2-primer/#Class_Disjointness.
>> >>
>> >> > So in a sense you are forced
>> >> > (whether OWL forces you or not is another question), but the FRBR
>> >> > logic forces you to create a new entity for the Manifestation
>> >> > *portion* of your description. In addition, to connect the
>> >> > Manifestation to the Work (since you need the creator and subjects to
>> >> > complete your description), you may need to create an entity for the
>> >> > Expression. (RDA allows Manifestations to "Manifest" Works, but I
>> >> > think FRBR in its present state still requires M -> E -> W.)
>> >>
>> >> I believe it's possible to create an inferred shortcut like this in OWL,
>> >> but it's just a convenience property.
>> >>
>> >> >
>> >> > This is, of course, unless I have totally missed something in the
>> >> > nature of FRBR, and if so I would love to hear that my worst fears
>> >> > about it do not come to bear.
>> >>
>> >> I think you've created a useful and accurate summary. :-)
>> >>
>> >> Jeff
>> >>
>> >> >
>> >> > kc
>> >> >
>> >> > >
>> >> > > It relates to Dan's point that schema designers in the new
>> >> > > idiom are not actually issuing "shipping orders" for data
>> >> > > integrity in the imperative style to which they are accustomed
>> >> > > -- even if, as I suspect, they may sometimes _believe_ that
>> >> > > this is the effect of declarations such as the above.
>> >> > >
>> >> > > As Jeff has pointed out, one might conceivably use the OWL to
>> >> > > construct syntactic validators to impose such data integrity,
>> >> > > but these are necessarily over and above whatever the OWL
>> >> > > itself actually says.
>> >> > >
>> >> > > Tom
>> >> > >
>> >> > >
>> >> > >
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Karen Coyle
>> >> > kcoyle@kcoyle.net http://kcoyle.net
>> >> > ph: 1-510-540-7596
>> >> > m: 1-510-435-8234
>> >> > skype: kcoylenet
>> >> >



-- 
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
Received on Monday, 7 March 2011 00:57:01 GMT
