Using dc:isFormatOf to unify instanceOf and commonEndeavour with existing practice

Hi all,

I'm sure we can find a way of reconciling our current, varied positions
regarding instanceOf and commonEndeavour. The differences seem to relate to
various MARC and FRBR experiences, or at least reactions to the different
rigors, or indeed lack thereof, that have come out of these. I also think
we can avoid inventing too much new thinking on these matters, since there
is lots to reuse from the domain of resource descriptions.

Model-wise, I think many of us (perhaps all) agree that the restrictive
abstraction principles of WEMI separation, and the simpler but still
divided Work/Instance counterpart evolving in BIBFRAME, are too rigid for
schema.org (and possibly for cataloguing in general). However, many also
see a great benefit in relating specific forms of works to common,
generalized notions of those works, at least when a work is available in a
multitude of formats. This notion is very useful both for facilitating cataloguers'
workflow and for building services upon the data. As I mentioned a while
ago [1], this is not so much about *abstraction* but *generalization* (also
elaborated on in [2]). So let us neither prescribe nor prohibit.

We should carefully consider what has been done in the wild, and especially
practices stemming from, but not limited to, libraries. A good example of
this is the Dublin Core Terms vocabulary [3], which has been heavily used
for many years now, in lots of linked data scenarios. It is used in and
recommended by many W3C specs (e.g. SKOS, VoID and PROV) and is the base
for many community vocabularies, such as BIBO. Its terms are used in lots,
if not most, of the datasets in the Linked Open Data cloud. If there is any
stable core in the plethora of bibliographic vocabularies, I'd say DC terms
is it. And it gets by with (probably because of) quite a minimal
specification (just like schema.org).

I therefore suggest that we consider 'isFormatOf', explicitly based on
'dcterms:isFormatOf' [4], as a replacement for, or indeed a unification of,
both the 'instanceOf' and 'commonEndeavour' proposals (and possibly
content/carrier).

Dublin Core defines 'isFormatOf' as:

    A related resource that is substantially the same as the described
    resource, but in another format.

Things that exist in specific formats are representations, manifestations
or instances. This property can be used both to relate between different
formats (similar to 'commonEndeavour'), and for linking from a specific
format to a generalized notion, such as an expression or work (similar to
'instanceOf'). The latter use of 'dcterms:isFormatOf' is quite common,
using the pattern of linking different representations (e.g. in HTML or PDF
for digital representations, or hardcover or paperback for physical books)
to a general resource which they represent. Examples of this can be seen
in legislation.gov.uk [5]. (For specifying the kind of format,
schema:bookFormat is applicable, as is the more general dcterms:format
property.)
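
As a rough illustration of the two uses described above, here is a minimal
sketch in Python that holds a few statements as plain (subject, property,
value) tuples. The example.org URIs and the formats_of helper are made up
for this illustration; only the dcterms:isFormatOf and schema:bookFormat
property URIs come from the vocabularies themselves.

```python
DCT = "http://purl.org/dc/terms/"
SCHEMA = "http://schema.org/"
EX = "http://example.org/book/1"  # hypothetical URI for the generalized book

# Each statement is a (subject, property, value) tuple.
triples = [
    # Specific formats linked to the general resource (similar to 'instanceOf').
    (EX + "/paperback", DCT + "isFormatOf", EX),
    (EX + "/hardcover", DCT + "isFormatOf", EX),
    # One specific format linked directly to another (similar to 'commonEndeavour').
    (EX + "/ebook", DCT + "isFormatOf", EX + "/paperback"),
    # The kind of format is stated separately.
    (EX + "/paperback", SCHEMA + "bookFormat", SCHEMA + "Paperback"),
    (EX + "/hardcover", SCHEMA + "bookFormat", SCHEMA + "Hardcover"),
    (EX + "/ebook", SCHEMA + "bookFormat", SCHEMA + "EBook"),
]


def formats_of(target, triples):
    """Return, sorted, the resources stated to be a format of `target`."""
    return sorted(
        s for s, p, o in triples
        if p == DCT + "isFormatOf" and o == target
    )


print(formats_of(EX, triples))
print(formats_of(EX + "/paperback", triples))
```

The same property serves both cases; the only difference is whether the
value is a generalized resource or another specific format.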

As for the actual name, we could include 'isFormatOf' as is (and possibly
its inverse, 'hasFormat'). Or we could relabel it somehow. The name is
important, but only instrumentally so. The most important thing is to find
a common meaning, and to do so we should base it on existing usage.

(I'd also like to note that the solid proposal we do have on the table,
'hasPart'/'isPartOf', corresponds closely to the existing Dublin Core
properties of the same names (as has also been discussed). I do think we
should mention that in the wiki page. I can address that unless anyone
objects, following the pattern of the Datasets proposal [6]. In fact, if we
can find a common ground in (at least parts of) the Dublin Core terms, we
can also continue to import some other terms, such as 'isVersionOf',
'references' and 'source', if needed.)

Regarding the necessity of an abstract class, I don't think it is a strict
requirement for this pattern. The notion of variable generalization is
already present in the fact that we don't describe one single item/copy
even at the specific format level. That is, even a "manifestation" has the
extent of a group, and thus we can relate that to a broader group
representing the union of manifestations (i.e. the "expression" level),
without needing to separate the classes. This notion seems very much
present in the Product type as well, where it's up to the user of the
vocabulary to determine the level of specificity for the subject described.
Granted, there are additional specializations in IndividualProduct and
ProductModel, but both derive from the general Product class. Thus, there
is no divide in principle. (In fact, if a case were made that there is, that
would seem to be an argument for the applicability of WEMI in schema.org.)

(The choice of which other properties (e.g. author, illustrator, subjects,
publisher) should be specific to a certain format/representation is then up
to the data publisher (cataloguer and/or library system) to determine. You
might do selective copying of properties, or link to a prototypical uniform
work. Consumers/services wishing to index data for each specific format in
their entirety can also copy in the general properties from the general
form. Others may choose to traverse graphs, or even create unions of
related formats altogether into a mixed form described with all properties.)
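
One of the consumer strategies above, copying in the general properties
from the general form, could be sketched roughly as follows. This is only
an illustration under simplifying assumptions: descriptions are plain
Python dicts, the "ex:" identifiers, property values and the flatten helper
are hypothetical, and a property stated on the specific format is assumed
to take precedence over the general one.

```python
# Hypothetical descriptions, keyed by identifier.
records = {
    "ex:work": {
        "name": "Example Title",
        "author": "Example Author",
    },
    "ex:paperback": {
        "isFormatOf": "ex:work",
        "bookFormat": "Paperback",
        "publisher": "Example Press",
    },
}


def flatten(record_id, records):
    """Return a description of `record_id` that includes the properties of
    the general form it is a format of (if any). Properties stated on the
    specific format win over those copied from the general form."""
    specific = records[record_id]
    general = records.get(specific.get("isFormatOf"), {})
    merged = dict(general)   # start from the general properties
    merged.update(specific)  # then overlay the format-specific ones
    return merged


print(flatten("ex:paperback", records))
```

A service that prefers to traverse the graph instead would simply keep the
records as they are and follow the isFormatOf link when needed.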

Cheers,
Niklas

[1]: http://lists.w3.org/Archives/Public/public-schemabibex/2013Mar/0077.html
[2]: http://grammar.ccc.commnet.edu/grammar/composition/abstract.htm
[3]: http://purl.org/dc/terms/
[4]: http://purl.org/dc/terms/isFormatOf
[5]: http://www.legislation.gov.uk/developer/formats/rdf
[6]: http://www.w3.org/wiki/WebSchemas/Datasets

--
Niklas Lindström
National Library of Sweden (KB)

Received on Tuesday, 2 July 2013 21:55:07 UTC