Re: Audiobook proposal - discussion

On Tue, Jul 16, 2013 at 10:54 AM, Karen Coyle <kcoyle@kcoyle.net> wrote:
> Dan, briefly before meeting: content size sounds great. I missed that. Let's
> not worry about CreativeWork extent just yet.

Karen, thank you for pointing me back to the thread at
http://lists.w3.org/Archives/Public/public-schemabibex/2013Feb/0164.html.
My apologies to all, again, for not being able to actually be active
in the group during the Spring and therefore wasting some of your time
while I'm catching up & covering familiar turf, but so it goes.

I'm generally in agreement with the position that was expressed at
various points through the thread that "if you can state True or False
for a Boolean property, great; otherwise offer up a Text value and
schema.org processors will do the best they can with it". That would
lead to a possible recommendation for markup practices like the
following (assuming that
http://www.w3.org/2011/webschema/track/issues/14 takes the logical
step of aligning itself with RDF):

====
For abridged works, if you know the true/false value, you can set the
"abridged" Boolean property:

<div>Edition: <meta property="abridged" content="true" />Abridged</div>

Conversely, if you only have MARC21 data to work with but you do have
a 250 $a "edition statement", map that to the "bookEdition" Text
property:

<div>Edition: <span property="bookEdition">Abridged</span></div>

The "abridged" nature of a work may be reflected in its title:

<div>Title: <span property="name">Theodicy, abridged</span></div>

Or general notes (such as MARC21 500 fields), which you can surface as
"description" properties:

<div>Description: <span property="description">Revised and abridged</span></div>
====

I _think_ people could work with that. (Add in ONIX and other
equivalents, of course!)

To get some data, I crunched through the 2.5 million MARC records in
our university consortium library system to find out where the string
"abridged" lives, and it matches your findings back in February that
it typically appears in the "general notes" section.

 tag | subfield | count
-----+----------+-------
 500 | a        |  2190
 245 | c        |   580
 245 | b        |   492
 250 | a        |   437
 245 | a        |   306
 510 | a        |   193
 505 | a        |    28
 246 | a        |    27
 520 | a        |    20
 509 | a        |    15
 503 | a        |    15
 250 | b        |    15
 740 | a        |    14

For the 250 $a, arguably the canonical place to record whether an
edition is abridged or not, the results in our database were woefully
inconsistent:

 abridged ed  |   129
 abridged version |    26
 unabridged |    20
 [abridged ed |    14
 abridged |     9
 unabridged ed |     8
 abridged edition |     8
 [unaltered and unabridged ed |     6
 complete and unabridged |     5
 rev. and abridged ed |     5

(Yes, this is just one data point, from a primarily academic library
system, so the data could all be skewed... I accept that!)

For a few minutes, I had some hopes that the ISTC code, if encoded in
the 024, would enable you to resolve some additional metadata (the
"abridgedness" of a given work is explicitly encoded by the ISTC), but
either the ISTC search engine appears to be defunct, or the example
ISTC codes in the LoC MARC21 docs are invalid.

So for MARC21-based library systems, I think we can pretty much lower
our expectations. Most systems will not be able to definitively set a
schema.org property for "abridged". They could use a hack like
checking for the existence of the string "abridged" in a handful of
fields, but it would very clearly be a hack.
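
A minimal sketch of that hack, assuming records have already been parsed
into a dict keyed by (tag, subfield). The function name, the record shape,
and the choice and order of fields are all hypothetical, picked from the
counts above. Note that a naive substring test for "abridged" would also
match "unabridged", so the sketch distinguishes the two:

```python
import re

# Fields to scan, most trusted first. This list and its order are an
# assumption based on the counts above, not a recommendation.
CANDIDATE_FIELDS = [("250", "a"), ("245", "a"), ("245", "b"),
                    ("245", "c"), ("500", "a")]

# Match "abridged" or "unabridged" as whole words, case-insensitively.
ABRIDGED_RE = re.compile(r"\b(un)?abridged\b", re.IGNORECASE)


def guess_abridged(record):
    """Return True, False, or None (unknown) for a parsed record.

    `record` is a hypothetical dict mapping (tag, subfield) to a list of
    string values; a real system would get these from its MARC parser.
    """
    for key in CANDIDATE_FIELDS:
        for value in record.get(key, []):
            match = ABRIDGED_RE.search(value)
            if match:
                # Group 1 is "un" when the text says "unabridged".
                return match.group(1) is None
    return None
```

Returning None rather than False when nothing matches keeps "we don't know"
distinct from "unabridged", which is the whole problem with the MARC21 data.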

That said, given that ONIX has very clear encoding for the Abridged
property, let's roll with it. There's no reason to hobble the
usefulness of the schema.org Audiobook class just because one subset
of the bibliographic world hasn't figured out how to reliably
represent some useful metadata! I noted with interest that the OCLC
mapping of ONIX 3.0 to MARC21 available from
http://www.editeur.org/96/ONIX-and-MARC21/ just bails on most of the
EditionType mappings, including Abridged / Unabridged; that acts as
confirmation for me (and looks like a really valuable resource for
future efforts).

I'm somewhat tempted to say, "hey, let's map the ONIX 3.0 edition
types in our schemabibex.org extension vocabulary as a more strongly
typed Book.bookEdition property; that is, if the value is set to (say)
http://schemabibex.org/editionType/ABR then we know authoritatively
that it is abridged, otherwise we'll fall back to the base schema.org
behaviour of taking whatever Text value we get". (This also makes me
think that _somebody_ must have already published a vocabulary based
on ONIX?) The "let's map ONIX edition types to a schemabibex.org
extension (or an existing ONIX vocab)" approach would also offer a way
forward for expressing all of the other values that ONIX has defined.
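
On the consuming side, that first temptation might look roughly like the
sketch below. The http://schemabibex.org/editionType/ namespace is only the
"say" example from the paragraph above, the function name is hypothetical,
and UBR is assumed to be the ONIX code for unabridged; check it against the
ONIX code list before relying on it.

```python
# Hypothetical namespace, taken from the example URI in the text above.
EDITION_TYPE_BASE = "http://schemabibex.org/editionType/"

# Edition-type code -> whether it means "abridged". ABR comes from the
# example above; UBR is an assumed code for "unabridged".
KNOWN_EDITION_TYPES = {"ABR": True, "UBR": False}


def interpret_book_edition(value):
    """Return (abridged, text) for a bookEdition value.

    A recognised edition-type URI gives an authoritative True/False and no
    text. Anything else falls back to the base schema.org behaviour: the
    value is kept as plain Text and abridgedness is unknown (None).
    """
    if value.startswith(EDITION_TYPE_BASE):
        code = value[len(EDITION_TYPE_BASE):]
        if code in KNOWN_EDITION_TYPES:
            return KNOWN_EDITION_TYPES[code], None
    return None, value
```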

My other temptation is to suggest "hey, Abridgement and Festschrift
show up as http://www.productontology.org/id/Abridgement and
http://www.productontology.org/id/Festschrift, so we _could_ tell people
to use additionalType="http://www.productontology.org/id/Abridgement"
if they have a way of knowing that they are offering an abridged
version, otherwise fall back to the Book.bookEdition field for (in the
MARC21 world) the contents of the 250 field, or fall even further back
to the general Thing.description field for (in the MARC21 world) 500
fields". The advantage of the productontology approach is that it is
already supported by schema.org, and it hits a subset of the
interesting ONIX edition types (however, "Teacher's edition" doesn't
show up on a quick search, for example). And given that
http://www.productontology.org/id/Audiobook exists, we could use that
as an additionalType too.
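
The fallback chain in that second temptation could be sketched as follows
on the publishing side. The function and parameter names are hypothetical,
and expressing additionalType as a <link> element with an href is my
assumption about how the RDFa would be written, not something settled in
this thread.

```python
from html import escape

ABRIDGEMENT_TYPE = "http://www.productontology.org/id/Abridgement"


def edition_markup(known_abridged=False, edition_statement=None,
                   general_note=None):
    """Emit the most specific markup the available data supports.

    Order of preference: an authoritative additionalType, then the
    edition statement (MARC21 250) as bookEdition, then a general note
    (MARC21 500) as description. Returns "" if nothing is available.
    """
    if known_abridged:
        return '<link property="additionalType" href="%s" />' % ABRIDGEMENT_TYPE
    if edition_statement:
        return '<span property="bookEdition">%s</span>' % escape(edition_statement)
    if general_note:
        return '<span property="description">%s</span>' % escape(general_note)
    return ""
```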

Yielding to either of these temptations would avoid having to define a
special property just for "abridged", and would leave just "readBy" as
a new attribute. So I'm tempted by both of these approaches...

(On "readBy": wouldn't it be nice if "contributor" pointed at a new
class, "Contributor", that would derive from Person but explicitly
capture the nature of the contribution to this particular work? This
would seem to be applicable not just to readBy, but any of the other
long list of credits that you can imagine scrolling past you for
minutes at the end of a movie as you wait patiently hoping for a last
bonus blast of content for those who stuck it out to the very end).

Received on Wednesday, 17 July 2013 17:48:03 UTC