Re: Provenance in RDF

[freed from spam filter -rrs]

Date: Thu, 7 Mar 2002 18:20:46 -0500 (EST)
From: martin <martin@ics.forth.gr>
Message-ID: <3C87F582.9ABF65AF@ics.forth.gr>
To: Dave Reynolds <der@HPLB.HPL.HP.COM>
CC: Carl Lagoze <lagoze@cs.cornell.edu>,
        "RDF Interest (E-mail)" <www-rdf-interest@w3.org>,
        "Jane Hunter (E-mail)" <jane@dstc.edu.au>,
        "Martin Doerr (E-mail)" <martin@csi.forth.gr>, crm-sig@ics.forth.gr

Dear Dave,

In the CIDOC CRM we model the source of information via a document, which
may document any entity. This, however, serves only to describe the
"historical document", i.e. cases where the source of information is
manifested in an identifiable document.
Provenance of a museum object should be analysed into one of the following:
1) place of creation
2) place of use
3) place of finding.

For museums we recommend not to conflate these three, but to model the
appropriate case explicitly with CIDOC CRM event constructs, for example as
sketched below.
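
A very rough N3 sketch of this event-based style (the class and property
names here only indicate the flavour of the CRM vocabulary and are not exact
CRM identifiers):

    @prefix crm: <http://example.org/cidoc-crm#> .   # illustrative namespace
    @prefix :    <http://example.org/museum#> .      # illustrative namespace

    # The places are attached to the events (production, finding), not to
    # the object itself; names only approximate the CRM.
    :vase_42       a  crm:Man_Made_Object .

    :production_7  a  crm:Production_Event ;
                   crm:has_produced   :vase_42 ;
                   crm:took_place_at  :Athens .

    :finding_3     a  crm:Finding_Event ;
                   crm:found_object   :vase_42 ;
                   crm:took_place_at  :Knossos .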

We regard the internal scientific dialogue about data or metadata, i.e.
who contributes which bit to the knowledge of a museum or organisation,
who contradicts, who confirms etc., as a problem that pertains equally to
all parts of a metadata format and can be solved in multiple ways without
interfering with the rest of the model.
Therefore we decided not to take this problem into the scope of the CIDOC
CRM.

One solution is to use RDF reification. I was frustrated to hear that
reification may be withdrawn from RDF.
In the sense of Toulmin's microarguments, one can analyse scholarly claims
"atomically" into claims about relationships and claims about the existence
of class instances. The first could be implemented by properties of
properties if there are many parties in the dialogue, or by a set of
superproperties, one for each scholar, if they are few. The second can be
implemented by a simple property. See also:

M. Doerr, "Reference Information Acquisition and Coordination", in:
"ASIS'97 -Digital Collections: Implications for Users, Funders, Developers
and Maintainers", Proceedings of the 60th Annual Meeting of the American
Society for Information Sciences, " November 1-6 '97, Washington,
Vol.34.,pp295-312, Information Today Inc.: Medford, New Jersey, 1997. ISBN
1-57387-048-X.
(http://www.ics.forth.gr/isl , publications).
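
As a rough N3 sketch of these options (the pv: vocabulary and the individual
names are invented for illustration):

    @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix pv:   <http://example.org/provenance#> .    # illustrative
    @prefix :     <http://example.org/data#> .          # illustrative

    # (a) A claim about a relationship, attributed to a scholar by reification.
    :claim_1  a             rdf:Statement ;
              rdf:subject   :vase_42 ;
              rdf:predicate :produced_at ;
              rdf:object    :Athens ;
              pv:claimed_by :scholar_A ;
              pv:date       "1997-11-06" .

    # (b) One property per scholar, usable when the parties are few; it can
    #     be related to the generic :produced_at via rdfs:subPropertyOf in
    #     whichever direction the intended semantics require.
    :vase_42  :produced_at_according_to_scholar_A  :Athens .

    # (c) A claim about the existence of a class instance, as a simple property.
    :scholar_A  pv:asserts_existence_of  :vase_42 .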

If you have enough control over your implementation, special indices on
relations and individuals could be devised.

If a document structure is preserved, connecting an editor with a version is
most efficient.

In any case, there is a difficult balance between granularity,
functionality and cost.

best regards,

Martin


Dave Reynolds wrote:

> Carl,
>
> Thanks very much for these pointers. I was aware of the merged Indecs/DC/DOI
> model but not of CIDOC and hadn't followed the more recent ABC work.
>
> I like the "extensional actuality" approach that some facets of entities are
> only valid in a given situation and then explicitly modeling those
situations
> (and the actors and events involved). As you mention in the paper the
> cost/benefit tradeoffs are complicated - richer metadata may mean less
metadata
> in a zero sum game. For our situation we are concerned about lightweight
> mechanisms for individual users managing semi-structured data rather than
> institutions creating long lived metadata resources so we should err
towards the
> simple end of the spectrum.
>
> I can certainly see the value of the ABC approach for directly modeling the
> relations between different manifestations of concepts. That does come up
> in our application domain (e.g. different versions and renditions of a
> given conference paper) and I can see us using it there.
>
> I'm less clear on using it for low-level provenance information. We have a
> situation where users will individually attach metadata such as
> classifications, ratings and comments to items, but that metadata can be
> aggregated across a community. A full application of the ABC model would
> mean modeling the "speech act" of a user attaching that metadata as an
> explicit event leading to an "is_annotated" situation. When the act
> involved is a rich one, such as a formal classification of a museum piece
> or a run of a complex software validation suite, that feels appropriate
> because rich event information will need to be attached. When it is a very
> informal one, such as a user just dropping an item into an appropriate
> classification bucket, the explicit event/situation model seems overkill.
> But that's just an early reaction - I will have to think about it more.
>
> Thanks for your help.
> Dave
>
> Carl Lagoze wrote:
> >
> > Dave,
> >
> > You might consider how this problem relates to a familiar problem in
> > the museum community.  There, for example, a physical artifact is
> > "discovered" and then over time it may be classified, re-classified,
> > controversially classified, etc.  In effect, there is a constant artifact
> > with different sets of properties associated with it by different parties
> > over time intervals (in fact, some of the property assertions have a fixed
> > time context; e.g., "Muhammad Ali" had the name property "Cassius Clay"
> > before 1966).
> >
> > A number of us (Jane Hunter, Martin Doerr, etc.) have been working on
> > how to cleanly model the mixed notions of objects changing over time and
> > attribution/characterization of objects over time - Martin coming at this
> > from his more museum-oriented perspective, and Jane and I coming at this
> > from a more digital library/resource perspective.
> >
> > Martin's very complete work is described at http://cidoc.ics.forth.gr/.
> > Jane and I have written up our thoughts on this in the context of our ABC
> > modeling work - see our paper in JODI at
> > http://jodi.ecs.soton.ac.uk/Articles/v02/i02/Lagoze/.
> >
> > The group of us is involved in a DELOS workshop series trying to come up
> > with canonical thinking about all this.
> >
> > Carl
> >
> > Department of Computer Science
> > Cornell University
> > Ithaca, NY 14853 USA
> > Voice: +1-607-255-6046
> > FAX: +1-607-255-4428
> > EMail: lagoze@cs.cornell.edu
> > WWW: http://www.cs.cornell.edu/lagoze
> >
> > > -----Original Message-----
> > > From: Dave Reynolds [mailto:der@hplb.hpl.hp.com]
> > > Sent: Wednesday, February 27, 2002 7:14 AM
> > > To: RDF Interest (E-mail)
> > > Subject: Provenance in RDF
> > >
> > >
> > > We are working on a semantic web related application that
> > > needs some provenance
> > > support. We have various routes for doing this but would be
> > > interested in
> > > hearing of others' experiences. Are there any groups out
> > > there that have
> > > developed applications supporting provenance within RDF that
> > > would be willing to
> > > share their experiences on what worked well or badly?
> > >
> > > To explain a little.
> > >
> > > We are developing a semantic web application for shared
> > > information management.
> > > In this application users are able to attach personal
> > > metadata to items and are
> > > able to view the "soup" of metadata created by many users.
> > > For example the same
> > > item might have many different dc:title fields created by
> > > different users and
> > > the UI should be able to view this data and give responses
> > > like 'most users call
> > > this "foo" but one user prefers to call it "bar"'. To support
> > > these we want fine-grained
> > > tracking of where the multiple metadata values came
> > > from, down to the
> > > level of individual RDF assertions. The tracking data could
> > > include items like
> > > creator, date and digital-signature; these terms would be
> > > defined in a separate
> > > provenance schema/ontology.
> > >
> > > We are exploring three approaches to doing this - application
> > > level, reification
> > > and out-of-band. Each of these has pros and cons.
> > >
> > > ** Application level
> > > Treat provenance as a data modeling problem at the
> > > application level and
> > > introduce bNodes to which the provenance can be attached.
> > > Thus instead of:
> > >    subj --pred--> obj
> > > for any provenanced (is that a word? :-) values use:
> > >    subj --pred--> <> --rdf:value--> obj
> > >                      --pv:creator--> "Dave"
> > >                      --pv:date--> "27/2/02"
> > > This has the advantage of flexibility and means we can query
> > > provenance data
> > > conveniently using existing RDF query languages (RDQL in our
> > > case). However, as
> > > far as we know this is not a standard idiom and that might
> > > make it harder to
> > > interoperate with other RDF metadata sources.
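> > >
> > > Written out in N3, the same idiom looks roughly like this (prefixes
> > > omitted, pv:* names illustrative as above):
> > >
> > >    :item  dc:title  [ rdf:value  "foo" ;
> > >                       pv:creator "Dave" ;
> > >                       pv:date    "27/2/02" ] .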
> > >
> > > ** Reification
> > > Clearly the official RDF mechanism for representing
> > > provenance is to use
> > > reification and attach the same "pv:*" assertions to a node
> > > denoting the reified
> > > statement.
> > > This has the advantage of being the standard idiom at
> > > present, however the
> > > uncertain status of reification with the RDFCore WG leaves us
> > > nervous. We can
> > > still query provenance data, though the query would now look
> > > rather more ugly
> > > and verbose than if we take the application level approach.
> > > The sheer number of
> > > triples needed is high but (a) it is too early to optimize for
> > > performance and (b)
> > > we can in any case hide overhead by implementing a triple
> > > store which pretends
> > > to reify but in fact uses a more compact representation.
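> > >
> > > For comparison, the reified form of the same dc:title assertion would
> > > look roughly like this (prefixes omitted, pv:* illustrative) - four
> > > bookkeeping triples per statement before any provenance is attached:
> > >
> > >    _:stmt1  a             rdf:Statement ;
> > >             rdf:subject   :item ;
> > >             rdf:predicate dc:title ;
> > >             rdf:object    "foo" ;
> > >             pv:creator    "Dave" ;
> > >             pv:date       "27/2/02" .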
> > >
> > > ** Out of band
> > > In this option we simply make provenance support a property
> > > of the API. We don't
> > > change the RDF assertions in the main fact base at all.
> > > Instead we provide API
> > > calls to attach and retrieve annotations from any RDF
> > > assertion. This is related
> > > to the "quad" notion discussed on this list some time ago and
> > > the N3 approach
> > > that every statement has an internal context attribute. This
> > > has the advantage
> > > that it hides the mechanics of provenance allowing us to keep
> > > the application
> > > code stable even if the implementation idiom changes. It has
> > > the disadvantage
> > > that we'd need to extend our query support to access this
> > > additional API layer
> > > and is at best unhelpful for integrating with other RDF data sources.
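> > >
> > > Pictured as quads, the context becomes a fourth field that the store
> > > manages internally (notation purely illustrative):
> > >
> > >    :item  dc:title  "foo"  :ctx1 .
> > >    # annotations on :ctx1 (pv:creator, pv:date, ...) live in the
> > >    # store's annotation layer and are reached via the extended API,
> > >    # not as triples in the main fact base.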
> > >
> > > For our current purposes we will simply pick one and work
> > > with it but if anyone
> > > else has already trodden this path and has experiences to
> > > share then we'd love
> > > to hear from them.
> > >
> > > Dave

--

--------------------------------------------------------------
Dr. Martin Doerr              |  Vox:+30(810)391625         |
Principal Researcher          |  Fax:+30(810)391609         |
Project Leader SIS            |  Email: martin@ics.forth.gr |
                                                             |
               Information Systems Laboratory                |
                Institute of Computer Science                |
   Foundation for Research and Technology - Hellas (FORTH)   |
                                                             |
Vassilika Vouton,P.O.Box1385,GR71110 Heraklion,Crete,Greece |
                                                             |
         Web-site: http://www.ics.forth.gr/isl               |
--------------------------------------------------------------

Received on Friday, 8 March 2002 07:40:40 UTC