- From: Karen Coyle <kcoyle@kcoyle.net>
- Date: Sat, 29 Jun 2013 13:03:00 -0700
- To: public-schemabibex@w3.org
- CC: "Godby,Jean" <godby@oclc.org>
I just did a blog post responding to one of the suggestions for fitting
FRBR-like entities into CreativeWork:
http://kcoyle.blogspot.com/2013/06/frbr-and-schemaorg.html
Short answer: I don't think it works on a practical level. Longer
answer: I think that any division of bibliographic description into FRBR
or BIBFRAME entities is going to be problematic.
*** text below for those who prefer it all in one place ***
*** but without links ***
The FRBR structure for what it calls the Group 1 entities (Work,
Expression, Manifestation, and Item, hereafter written as WEMI) presents
quite a few problems for data modeling. Among the many issues this
brings up is the fact that this division is not universally recognized,
not even in library data, and definitely is not recognized outside of
libraries. This has particular impact for library data as part of the
linked data space, where a primary goal is interlinking with data from
diverse resources. It is unlikely that online bookstores or academic
citations will begin to use the WEMI structure.
One area where library bibliographic data and bibliographic data from
other sources may mingle is in schema.org markup in web pages. Schema
already has a basic class that can be used for bibliographic data,
called "CreativeWork." CreativeWork contains the common elements for
this type of description, like author, title, publisher, pages, subject,
etc. Problems arise, therefore, when trying to express either WEMI or
the simplified BIBFRAME Work and Instance (hereafter bf:Work,
bf:Instance) in this model. CreativeWork is a unified model that
includes all descriptive elements in a single set; BIBFRAME separates
those elements into two entities, and each entity contains only a
defined set of the descriptive elements. Thus, where CreativeWork will
have information for author, title, publisher, pages, subject, in
BIBFRAME author and subject must be described in the bf:Work entity, and
title, publisher and pages in the bf:Instance entity. Between MARC,
FRBR, BIBFRAME, and schema.org, a full bibliographic description may
require one, two, or four separate entities.
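
To make the difference concrete, here is a minimal sketch in Python. It
assumes the allocation of elements described above (author and subject
on bf:Work; title, publisher and pages on bf:Instance). The field names
and values are illustrative placeholders, not actual schema.org or
BIBFRAME property names.

```python
# Hypothetical flat description in the style of a single CreativeWork.
# Keys and values are placeholders chosen for illustration only.
creative_work = {
    "author": "Example Author",
    "title": "Example Title",
    "publisher": "Example Press",
    "pages": 320,
    "subject": "Example Subject",
}

# Assumed allocation of elements, following the split described in the text.
WORK_FIELDS = {"author", "subject"}
INSTANCE_FIELDS = {"title", "publisher", "pages"}


def split_description(flat):
    """Split one flat description into a (work, instance) pair of dicts."""
    work = {k: v for k, v in flat.items() if k in WORK_FIELDS}
    instance = {k: v for k, v in flat.items() if k in INSTANCE_FIELDS}
    return work, instance


work, instance = split_description(creative_work)
print(work)      # the elements that would move to bf:Work
print(instance)  # the elements that would move to bf:Instance
```

The one-entity description becomes two partial ones, and neither half
by itself carries both the author and the title.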
The OCLC report on BIBFRAME and schema.org proposes that one could use
CreativeWork for different FRBR (or presumably BIBFRAME) entities,
making the determination based on what fields are present:
"In this scheme, it would be possible to say that when only titles,
subjects, and creators are mentioned, the description for a
Schema:CreativeWork refers to a FRBR Work; and when copyright dates and
genres are present, the description is equivalent to a FRBR Expression."
(p. 14)
While that makes sense from a pure logic point of view, and would
probably work in a library database, it has problems within the web and
linked data contexts of schema.org. I should note, before going on, that
schema.org is metadata markup for any web site, and CreativeWork will be
used for books, films, music, art, and other forms of creation by anyone
and everyone on the web. This is not a library-specific standard.
First, there are many sites that have a search response page with
limited information about the item, requiring the user to click through
for details. A search results page for books on Amazon or eBay gives
only the author and title, but it does not represent the Work -- it
merely omits the full data in order to fit more results onto the page.
Therefore, the absence of some fields on a web page does not mean that
the description there is complete.
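
A rough sketch of the field-presence rule quoted above, in Python, shows
how this goes wrong. The function and the key names are my own
assumptions for illustration; the OCLC report does not give an
algorithm, and the keys are not meant as exact schema.org properties.

```python
# Hypothetical field-presence heuristic, loosely following the quoted rule.
WORK_ONLY_FIELDS = {"title", "subject", "creator"}
EXPRESSION_MARKERS = {"copyrightYear", "genre"}


def infer_frbr_entity(description):
    """Guess a FRBR entity from which fields happen to be present."""
    fields = set(description)
    if fields & EXPRESSION_MARKERS:
        return "Expression"
    if fields and fields <= WORK_ONLY_FIELDS:
        return "Work"
    return "undetermined"


# The same item as shown on a search results page and on its detail page.
search_result = {"title": "Example Title", "creator": "Example Author"}
detail_page = {
    "title": "Example Title",
    "creator": "Example Author",
    "copyrightYear": 2013,
    "genre": "Fiction",
}

print(infer_frbr_entity(search_result))  # Work
print(infer_frbr_entity(detail_page))    # Expression
```

Both descriptions refer to the same thing being offered for sale, yet
the rule assigns them to different entities purely because of how much
the site chose to display.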
Second, there is no "record" in schema.org, merely a number of coded
statements with values within a web page. Any web page can contain
information about any number of "things" and information about those
things may be placed anywhere on the page, possibly far from each other
and not coded as a single unit. It may not be possible to know how
complete a description is.
Third, web site owners can opt to mark up only part of their data. In
schema.org markup that I have encountered on commercial sites, markup
reflects the owner's view. For example, Google (one of the originators
of schema.org) does not mark up the bibliographic data in its Books
pages, but instead emphasizes user ratings, images, and subjects. (The
markup can be viewed with the Google rich snippet testing tool.) In
comparison, the extracted schema.org elements for an IMDB page are much
more detailed, an indication that IMDB considers itself an information
site more than a sales site.
Finally, although this is somewhat beyond schema.org, should the data in
web pages be incorporated into the linked data space, it will go there
as individual triples that are part of a huge graph of data. That graph
is theoretically limitless and makes use of a principle called the "open
world assumption." In an open world it is not possible to base your
assumptions on what is missing from the graph. The open world does not
have a concept of completeness because there is always the possibility
that there is more information than what you are seeing at any given
moment in time.
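
As a small illustration of why absence is unreliable, the following
Python sketch treats triples as plain tuples. The identifiers,
predicates and the "looks like a Work" test are all made up for this
example; no real RDF library or vocabulary is assumed.

```python
# Triples as (subject, predicate, object) tuples; all names are hypothetical.
page_a = {
    ("ex:item1", "title", "Example Title"),
    ("ex:item1", "creator", "Example Author"),
}
page_b = {
    ("ex:item1", "publisher", "Example Press"),
    ("ex:item1", "pages", "320"),
}


def predicates_for(graph, subject):
    """Return the set of predicates asserted about a subject in a graph."""
    return {p for s, p, o in graph if s == subject}


def looks_like_work_only(graph, subject):
    """Closed-world style test: true when no publication-level data is seen."""
    return not (predicates_for(graph, subject) & {"publisher", "pages"})


print(looks_like_work_only(page_a, "ex:item1"))  # True

# More triples about the same subject arrive from another source.
merged = page_a | page_b
print(looks_like_work_only(merged, "ex:item1"))  # False
```

The conclusion drawn from the first graph is overturned as soon as more
statements are merged in, which is exactly the situation the open world
assumption tells us to expect.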
These may not be the only arguments against the use of CreativeWork for
different FRBR or BIBFRAME entities, but in my mind they are sufficient
to make the case that, if it is desirable to encode FRBR or BIBFRAME
entities in schema.org, they must be represented by different
schema.org classes and cannot be inferred from data elements in
CreativeWork.
Before I end, let me make clear that I do not favor an imposition of
FRBR-like separations of bibliographic data on the linked data world.
Even the BIBFRAME two-part bibliographic description will have problems
interacting with the one-entity model that is used outside of libraries.
I do think that we can find a way to talk virtually about works without
stripping such key elements as authors and subjects from the description
of the package that carries the content. That package is, after all,
what I hold in my hand when I read something, and it is a whole, with
author, title, subjects, pages, binding, publisher, etc. That is,
however, a topic for another post.
--
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
Received on Saturday, 29 June 2013 20:03:30 UTC