RE: Question about MARCXML to Models transformation from Young,Jeff (OR) on 2011-03-06 (public-lld@w3.org from March 2011)

From: Young,Jeff (OR) <jyoung@oclc.org>
Date: Sun, 6 Mar 2011 04:15:51 -0500
To: "Karen Coyle" <kcoyle@kcoyle.net>, "Thomas Baker" <tbaker@tbaker.de>
Cc: <gordon@gordondunsire.com>, <public-lld@w3.org>
Message-ID: <52E301F960B30049ADEFBCCF1CCAEF590BB0247B@OAEXCH4SERVER.oa.oclc.org>

I think Karen brings some nebulous issues into focus. Sorry if my
thoughts are cryptic. I can try to clarify them if needed.

> It's rather clear that FRBR was not designed with the open world model
> in mind -- in fact, it was designed around a late 90's concept of
> relational databases.

The Semantic Web is also "relational", so that aspect doesn't bother me.
I agree that "relational databases" impose closed world assumptions, but
I'm not sure this limitation affects how designers go about their
modeling. For example, reusable OWL can be rationalized from legacy
relational databases using D2RQ:

http://www4.wiwiss.fu-berlin.de/bizer/d2rq/spec/
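
As a rough sketch of what such a mapping looks like: the "works" table,
its columns, the JDBC settings, and the frbr namespace IRI below are all
made up for illustration. Only the d2rq: terms come from the D2RQ
mapping language.

```turtle
@prefix d2rq:    <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix frbr:    <http://example.org/frbr#> .  # placeholder, ad hoc
@prefix map:     <#> .

# Hypothetical legacy database connection.
map:database a d2rq:Database ;
	d2rq:jdbcDSN "jdbc:mysql://localhost/library" ;
	d2rq:jdbcDriver "com.mysql.jdbc.Driver" .

# Each row of the (assumed) "works" table becomes a frbr:Work.
map:Work a d2rq:ClassMap ;
	d2rq:dataStorage map:database ;
	d2rq:uriPattern "work/@@works.id@@" ;
	d2rq:class frbr:Work .

# The (assumed) "works.title" column becomes a dcterms:title literal.
map:workTitle a d2rq:PropertyBridge ;
	d2rq:belongsToClassMap map:Work ;
	d2rq:property dcterms:title ;
	d2rq:column "works.title" .
```

The closed-world constraints stay behind in the database. What comes
out is ordinary RDF that others are free to extend.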

> It is very top-down in that XML-ish way and most
> commonly it is assumed that each of the FRBR entities will be a
> record.

FRBR in general is relational, but the WEMI classes specifically are
unquestionably hierarchical. I would agree that XML Schema warps our
thinking, but WEMI is starting to make sense to me as a hierarchy. My
complaint now is the lack of meaningful WEMI subclasses that could make
the model much easier to understand and deal with.

> I say that latter because of the fact that the WEMI entities,
> while having inter-dependencies, also have specific relationships to
> other WEMI entities (as well as to the group 2 and 3 entities). So an
> expression will have a relationship to a work and to one or more
> manifestations -- that's what I think of as a *structural*
> relationship -- 

I agree with this interpretation and provide these RDF examples for
illustration.
(Beware: my "frbr" namespace elements are ad hoc.)

@prefix frbr: <http://example.org/frbr#> .  # placeholder IRI, ad hoc

<expression-1> a frbr:Expression ;
	frbr:isARealizationOf <work-1> ;
	frbr:isEmbodiedIn <manifestation-1> ;
	frbr:isEmbodiedIn <manifestation-2> .
<work-1> a frbr:Work .
<manifestation-1> a frbr:Manifestation .
<manifestation-2> a frbr:Manifestation .

> but it can also have bibliographic relationships to
> other expressions (like: one expression is the translation of another
> expression, or is an updated edition).

Here's what the additional triples would look like:

<expression-1>
	frbr:hasATranslation <expression-2> ;
	frbr:hasARevision <expression-3> .
<expression-2> a frbr:Expression .
<expression-3> a frbr:Expression .

> The fact is that it will be very hard to have an expression without a
> work because of the way the properties are spread across the Group 1
> entities: an expression does not have relationship to a primary
> creator (e.g. author), only a work does. Ditto subjects: only Work
> entities have the "has subject" property that links to topical
> entities. 

I'd go so far as to say it is *impossible* to have an Expression
without a Work, because in theory *all* conceivable Expressions have
creator and subject relationships, even the fictional ones. We need to
keep in mind that FRBR doesn't strive to be a metadata exchange format;
it strives to be a model of common-sense reality (more or less).

> A Manifestation doesn't have a language of text; that
> belongs to the Expression. The necessary elements to describe a
> resource

Riddle: When is a resource not a resource?
Answer: When the modeler(s) declare it to be a property or set of
properties instead.

Fortunately, no modeler in history ever had the last word. :-)

> are spread across the 3 (WEM) group 1 entities, making it
> very difficult to treat them separately. To give you an idea of what
> each entity "means", here are some key attributes for each:
> 
> Work
>   - work title
>   - key for a musical work
>   - coordinates for a cartographic work
>   - with relationships to
>      -- creator of the work
>      -- topics of the work (subject headings and classifications)

The terms "musical work", "cartographic work", and various other
rationalized "foo work" qualifiers imply subclasses of FRBR Work. I
think it's worth attempting.
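
A minimal sketch of what I have in mind. All of the names here are
hypothetical, including the ex: subclasses and properties and the
placeholder frbr namespace. None of them come from a published
vocabulary.

```turtle
@base         <http://example.org/data/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix frbr: <http://example.org/frbr#> .  # placeholder, ad hoc
@prefix ex:   <http://example.org/lib#> .   # hypothetical extension

# The "foo work" qualifiers become subclasses of Work.
ex:MusicalWork      rdfs:subClassOf frbr:Work .
ex:CartographicWork rdfs:subClassOf frbr:Work .

# Type-specific attributes attach to the subclass, not to Work itself.
ex:key         rdfs:domain ex:MusicalWork .
ex:coordinates rdfs:domain ex:CartographicWork .

<work-2> a ex:MusicalWork ;
	ex:key "D minor" .

# RDFS entailment then gives: <work-2> a frbr:Work .
```

Nothing is lost for consumers that only understand frbr:Work, since the
subclass axioms let a reasoner recover the general type.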

> 
> Expression
>   - language of the expression (if text)
>   - form of the expression (text, sound, image)

Likewise, "text expression", "sound expression", "image expression", and
other qualifications all imply subclasses of FRBR Expression.

> Manifestation
>   - title of the manifestation (may be different to the work title)
>   - edition
>   - publisher, date of publication
>   - physical format (size, units, other measurements)
>   - ISBN, ISSN, etc.

My feeling is that some of these "attributes" (owl:DatatypeProperty)
SHOULD be modeled as relationships/associations instead
(owl:ObjectProperty). For example, I think "publishers" should be
modeled as a frbr:CorporateBody (or a subclass thereof) and "place of
publication" should be modeled as frbr:Place. Limiting the individuals
in the CorporateBody and Place classes to known subjects of a Work
doesn't make sense in an open-world model. Most real-world objects can
be dumbed down to literals when necessary.
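
Here is a sketch of the object-property style. The property names, the
frbr namespace IRI, and the individuals are all ad hoc and only meant
to show the shape.

```turtle
@base         <http://example.org/data/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix frbr: <http://example.org/frbr#> .  # placeholder, ad hoc

# Relationships to things, rather than attributes holding strings.
frbr:hasPublisher a owl:ObjectProperty ;
	rdfs:range frbr:CorporateBody .
frbr:hasPlaceOfPublication a owl:ObjectProperty ;
	rdfs:range frbr:Place .

<manifestation-1> a frbr:Manifestation ;
	frbr:hasPublisher <org/random-house> ;
	frbr:hasPlaceOfPublication <place/new-york> .

# The literal is still available when a plain string is all you need.
<org/random-house> a frbr:CorporateBody ;
	rdfs:label "Random House" .
<place/new-york> a frbr:Place ;
	rdfs:label "New York" .
```

Going the other way, from a bare string back to an identified
publisher, is the hard direction, which is why I would start with the
relationship.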

> 
> There are many more attributes, but these are the common ones and the
> ones that I think may help people understand the issue. The data
> record that libraries create today contains data elements from all of
> these entities, mixed together and usually not clearly identified as W
> or E or M. To create library data under FRBR it will be necessary to
> ALWAYS have Work+Expression+Manifestation entities. (I'm skipping Item
> in the interest of brevity, but we should assume that it is part of
> the picture.)

For better or worse, it's not that simple. As Tom Baker pointed out in
another thread, ontologies aren't exchange formats; they are models in
which some entities can be inferred.

> 
> Now, it would be great to investigate the inferences that one can make
> with FRBR. For example, if you say:
> 
> resourceA / frbrer:hasSubject /
> http://id.loc.gov/authorities/sh85148177
> 
> then the inference is that resourceA is a Work. (I believe the way to
> say this is that "hasSubject" has the domain "Work". Right, Gordon?)

FRBRer coins a separate "has as subject" property for each range class,
but, as you would expect, the domain is always Work.
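
In Turtle, the pattern looks roughly like this. The property and class
names are stand-ins in my ad hoc namespace, not the actual FRBRer
IRIs. The subject IRI is the one from Karen's example.

```turtle
@base         <http://example.org/data/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix frbr: <http://example.org/frbr#> .  # placeholder, ad hoc

# One "has as subject" property per range class, all with domain Work.
frbr:hasAsSubjectConcept rdfs:domain frbr:Work ;
	rdfs:range frbr:Concept .
frbr:hasAsSubjectPlace rdfs:domain frbr:Work ;
	rdfs:range frbr:Place .

<resourceA> frbr:hasAsSubjectConcept
	<http://id.loc.gov/authorities/sh85148177> .

# RDFS entailment from the domain and range declarations:
#   <resourceA> a frbr:Work .
#   <http://id.loc.gov/authorities/sh85148177> a frbr:Concept .
```

Note that the domain declaration does not reject anything. It only
licenses the inference that resourceA is a Work.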

> You cannot then say:
> 
> resourceA / frbrer:hasPublisher / "Random House"
> 
> because *that* statement would mean that resourceA is a Manifestation,
> and Manifestation and Work are disjoint.

The FRBRer OWL doesn't currently declare Work and Expression to be
owl:disjointWith one another, but I think that was Gordon's plan. The
OWL 2 Primer offers some support for your understanding:

http://www.w3.org/TR/owl2-primer/#Class_Disjointness
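
To make the clash concrete, here is a sketch using Karen's two
statements. The names are my ad hoc ones, and the disjointness axiom is
assumed for the sake of the example rather than taken from the
published FRBRer OWL.

```turtle
@base         <http://example.org/data/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix frbr: <http://example.org/frbr#> .  # placeholder, ad hoc

# Assumed axiom: nothing can be both a Work and a Manifestation.
frbr:Work owl:disjointWith frbr:Manifestation .

frbr:hasSubject   rdfs:domain frbr:Work .
frbr:hasPublisher rdfs:domain frbr:Manifestation .

<resourceA>
	frbr:hasSubject <http://id.loc.gov/authorities/sh85148177> ;
	frbr:hasPublisher "Random House" .

# The domains entail that <resourceA> is both a frbr:Work and a
# frbr:Manifestation, which contradicts the disjointness axiom. An OWL
# reasoner would report the graph as inconsistent.
```

Without the owl:disjointWith triple, the same data is perfectly
consistent, and resourceA is simply inferred to belong to both classes.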

> So in a sense you are forced
> (whether OWL forces you or not is another question), but the FRBR
> logic forces you to create a new entity for the Manifestation
> *portion* of your description. In addition, to connect the
> Manifestation to the Work (since you need the creator and subjects to
> complete your description), you may need to create an entity for the
> Expression. (RDA allows Manifestations to "Manifest" Works, but I
> think FRBR in its present state still requires M -> E -> W.)

I believe it's possible to create an inferred shortcut like this in OWL,
but it's just a convenience property.
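
One way to sketch it is with an OWL 2 property chain. The shortcut
property name is hypothetical, and the other properties are the ad hoc
ones from my examples above.

```turtle
@base         <http://example.org/data/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix frbr: <http://example.org/frbr#> .  # placeholder, ad hoc

# Shortcut M -> W, defined as the chain M -> E -> W.
# The first step walks frbr:isEmbodiedIn backwards (M to E).
frbr:isAManifestationOfWork a owl:ObjectProperty ;
	owl:propertyChainAxiom (
		[ owl:inverseOf frbr:isEmbodiedIn ]
		frbr:isARealizationOf
	) .

<expression-1> frbr:isARealizationOf <work-1> ;
	frbr:isEmbodiedIn <manifestation-1> .

# An OWL 2 reasoner can then infer:
#   <manifestation-1> frbr:isAManifestationOfWork <work-1> .
```

The chain only works in one direction. Asserting the shortcut by itself
does not conjure up the intermediate Expression.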

> 
> This is, of course, unless I have totally missed something in the
> nature of FRBR, and if so I would love to hear that my worst fears
> about it do not come to bear.

I think you've created a useful and accurate summary. :-)

Jeff

> 
> kc
> 
> >
> > It relates to Dan's point that schema designers in the new
> > idiom are not actually issuing "shipping orders" for data
> > integrity in the imperative style to which they are accustomed
> > -- even if, as I suspect, they may sometimes _believe_ that
> > this is the effect of declarations such as the above.
> >
> > As Jeff has pointed out, one might conceivably use the OWL to
> > construct syntactic validators to impose such data integrity,
> > but these are necessarily over and above whatever the OWL
> > itself actually says.
> >
> > Tom
> >
> >
> >
> 
> 
> 
> --
> Karen Coyle
> kcoyle@kcoyle.net http://kcoyle.net
> ph: 1-510-540-7596
> m: 1-510-435-8234
> skype: kcoylenet
> 
>
Received on Sunday, 6 March 2011 09:26:06 UTC