FRBR and CreativeWork

I just did a blog post responding to one of the suggestions for fitting 
FRBR-like entities into CreativeWork:

http://kcoyle.blogspot.com/2013/06/frbr-and-schemaorg.html

Short answer: I don't think it works on a practical level. Longer 
answer: I think that any division of bibliographic description into FRBR 
or BIBFRAME entities is going to be problematic.

*** text below for those who prefer it all in one place ***
*** but without links ***

The FRBR structure for what it calls the Group 1 entities (Work, 
Expression, Manifestation, and Item, hereafter written as WEMI) presents 
quite a few problems for data modeling. Among the many issues this 
brings up is the fact that this division is not universally recognized, 
not even in library data, and definitely is not recognized outside of 
libraries. This has particular impact for library data as part of the 
linked data space, where a primary goal is interlinking with data from 
diverse resources. It is unlikely that online bookstores or academic 
citations will begin to use the WEMI structure.

One area where library bibliographic data and bibliographic data from 
other sources may mingle is in schema.org markup in web pages. Schema 
already has a basic class that can be used for bibliographic data, 
called "CreativeWork." CreativeWork contains the common elements for 
this type of description, such as author, title, publisher, pages, and 
subject. Problems arise, therefore, when trying to express either WEMI or 
the simplified BIBFRAME Work and Instance (hereafter bf:Work, 
bf:Instance) in this model. CreativeWork is a unified model that 
includes all descriptive elements in a single set; BIBFRAME separates 
those elements into two entities, and each entity contains only a 
defined set of the descriptive elements. Thus, where CreativeWork will 
have author, title, publisher, pages, and subject together in one 
description, in BIBFRAME author and subject must be described in the 
bf:Work entity, and title, publisher, and pages in the bf:Instance 
entity. Between MARC, 
FRBR, BIBFRAME, and schema.org, a full bibliographic description may 
require one, two, or four separate entities.
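
As a rough sketch of that difference (the field names, the split 
function, and the sample values are illustrative assumptions, not 
taken from the BIBFRAME or schema.org vocabularies), the same flat 
description has to be pulled apart into two entities:

```python
# Hypothetical assignment of elements to entities, following the
# split described above: author and subject go to the Work, the
# rest to the Instance.
WORK_FIELDS = {"author", "subject"}
INSTANCE_FIELDS = {"title", "publisher", "pages"}


def split_description(creative_work):
    """Split one flat description into (work, instance) dicts."""
    work = {k: v for k, v in creative_work.items() if k in WORK_FIELDS}
    instance = {k: v for k, v in creative_work.items() if k in INSTANCE_FIELDS}
    return work, instance


# A single-entity description, as CreativeWork would carry it.
# All values are placeholders.
book = {
    "author": "Example Author",
    "title": "Example Title",
    "publisher": "Example Press",
    "pages": 200,
    "subject": "Example subject",
}

work, instance = split_description(book)
print(work)      # author and subject only
print(instance)  # title, publisher and pages only
```

Neither half is a usable description on its own, which is the 
difficulty when the data has to mix with single-entity sources.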

The OCLC report on BIBFRAME and schema.org proposes that one could use 
CreativeWork for different FRBR (or presumably BIBFRAME) entities, 
making the determination based on what fields are present:

     "In this scheme, it would be possible to say that when only titles, 
subjects, and creators are mentioned, the description for a 
Schema:CreativeWork refers to a FRBR Work; and when copyright dates and 
genres are present, the description is equivalent to a FRBR Expression." 
(p. 14)

While that makes sense from a pure logic point of view, and would 
probably work in a library database, it has problems within the web and 
linked data contexts of schema.org. I should note, before going on, that 
schema.org is metadata markup for any web site, and CreativeWork will be 
used for books, films, music, art, and other forms of creation by anyone 
and everyone on the web. This is not a library-specific standard.

First, there are many sites that have a search response page with 
limited information about the item, requiring the user to click through 
for details. A search results page for books on Amazon or eBay gives 
only the author and title, but does not represent the Work -- it merely 
leaves the full data off that page in order to fit more results onto 
it. Therefore, the absence of information on one web page does not mean 
that what is there is a complete description of a Work.
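
To make this concrete, here is a minimal sketch of the kind of 
presence-based rule the report describes, and how a trimmed search 
result defeats it. The function and the sample data are my own 
assumptions for illustration; only the property names (name, about, 
author, creator, copyrightYear, genre, publisher, numberOfPages) come 
from schema.org.

```python
# Elements the quoted rule associates with each FRBR level.
WORK_ONLY = {"name", "about", "author", "creator"}
EXPRESSION_MARKERS = {"copyrightYear", "genre"}


def infer_frbr_level(description):
    """Naive rule: guess the FRBR level from which fields are present."""
    fields = set(description)
    if fields & EXPRESSION_MARKERS:
        return "Expression"
    if fields and fields <= WORK_ONLY:
        return "Work"
    return "unknown"


# The detail page for one item on a bookseller's site (placeholder values).
detail_page = {
    "name": "Example Title",
    "author": "Example Author",
    "publisher": "Example Press",
    "numberOfPages": 200,
}

# The search results page shows the same item with fewer elements.
search_result = {k: detail_page[k] for k in ("name", "author")}

print(infer_frbr_level(search_result))  # Work
print(infer_frbr_level(detail_page))    # unknown
```

The same item for sale is classified differently depending only on 
how much of it the page happens to show.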

Second, there is no "record" in schema.org, merely a number of coded 
statements with values within a web page. Any web page can contain 
information about any number of "things" and information about those 
things may be placed anywhere on the page, possibly far from each other 
and not coded as a single unit. It may not be possible to know how 
complete a description is.

Third, web site owners can opt to mark up only part of their data. In 
schema.org markup that I have encountered on commercial sites, markup 
reflects the owner's view. For example, Google (one of the originators 
of schema.org) does not mark up the bibliographic data in its Books 
pages, but instead emphasizes user ratings, images, and subjects (as 
can be seen by running a Books page through the Google rich snippet 
testing tool). In comparison, the extracted schema.org elements for an 
IMDb page are much more detailed, an indication that IMDb considers 
itself an information site more than a sales site.

Finally, although this is somewhat beyond schema.org, should the data in 
web pages be incorporated into the linked data space, it will go there 
as individual triples that are part of a huge graph of data. That graph 
is theoretically limitless and makes use of a principle called the "open 
world assumption." In an open world it is not possible to base your 
assumptions on what is missing from the graph. The open world does not 
have a concept of completeness because there is always the possibility 
that there is more information than what you are seeing at any given 
moment in time.
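
A small sketch of why absence is not evidence in an open world. The 
graph here is just a Python set of (subject, predicate, object) 
tuples, and the identifiers and values are made up for illustration:

```python
def has_property(graph, subject, predicate):
    """True if the graph holds at least one triple for subject/predicate."""
    return any(s == subject and p == predicate for s, p, _ in graph)


# Triples harvested from one web page (placeholder data).
graph = {
    ("ex:book1", "schema:name", "Example Title"),
    ("ex:book1", "schema:author", "Example Author"),
}

# Closed-world reading: "no publisher statement, so this is a Work".
looks_like_work_before = not has_property(graph, "ex:book1", "schema:publisher")

# Another source later contributes a statement about the same thing.
graph.add(("ex:book1", "schema:publisher", "Example Press"))

looks_like_work_after = not has_property(graph, "ex:book1", "schema:publisher")

print(looks_like_work_before)  # True
print(looks_like_work_after)   # False
```

Any conclusion drawn from the missing statement is undone as soon as 
more of the graph comes into view.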

These may not be the only arguments against the use of CreativeWork for 
different FRBR or BIBFRAME entities, but in my mind they are sufficient 
to make the case that, if it is desirable to encode FRBR or BIBFRAME 
entities in schema.org, they must be represented by different 
schema.org classes and cannot be inferred from data elements in 
CreativeWork.

Before I end, let me make clear that I do not favor an imposition of 
FRBR-like separations of bibliographic data on the linked data world. 
Even the BIBFRAME two-part bibliographic description will have problems 
interacting with the one-entity model that is used outside of libraries. 
I do think that we can find a way to talk virtually about works without 
stripping such key elements as authors and subjects from the description 
of the package that carries the content. That package is, after all, 
what I hold in my hand when I read something, and it is a whole, with 
author, title, subjects, pages, binding, publisher, etc. That is, 
however, a topic for another post.

-- 
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet

Received on Saturday, 29 June 2013 20:03:30 UTC