Re: QUERY, MISC: Definition of Provenance?

Hi Leo,

I don't have a good *definition* to hand, but I can tell you how some
people have been using provenance in our RDF implementations.
In the RDF world I think the term is used to represent some
indication of the source of information found on the web. So if
you had an RDF database that collected information from many different
sources, it is often useful to be able to keep track of the source so
that you can delete all information from that source, or update it as
necessary.

Currently many databases which store RDF have some form of provenance
storage *outside* of the RDF model. So the fact that information came
from a particular source is not represented as RDF triples but as an
extra field in the database. I think you could represent this kind of
information within the reification model, but keeping the source of
triples outside the RDF model is useful for database management, and it
avoids getting into issues of whether such statements are asserted or
not, or whether a join on such data is a join on the reified versions of
the data or on the asserted versions. (As an aside, it would be useful
to treat reified data as asserted sometimes, while also retaining the
information that it is reified...)
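
As a rough sketch of the "extra field" approach (in Python; the class
and method names are made up for illustration and are not taken from
any particular RDF API or database), each triple is filed under the
source it came from, so a whole source can be dropped or refreshed
without touching the triples themselves:

```python
from collections import defaultdict


class SourceTrackingStore:
    """Toy triple store; the source is an extra field, not an RDF triple."""

    def __init__(self):
        # Maps a source identifier to the set of (s, p, o) triples found there.
        self._by_source = defaultdict(set)

    def add(self, s, p, o, source):
        """Record a triple together with the source it was read from."""
        self._by_source[source].add((s, p, o))

    def triples(self):
        """All triples currently held, with the source field dropped."""
        result = set()
        for found in self._by_source.values():
            result |= found
        return result

    def sources_of(self, triple):
        """Every source that contributed the given triple."""
        return {src for src, found in self._by_source.items() if triple in found}

    def remove_source(self, source):
        """Delete everything that came from one source."""
        self._by_source.pop(source, None)

    def replace_source(self, source, new_triples):
        """Update a source by swapping in a freshly fetched set of triples."""
        self._by_source[source] = set(new_triples)
```

Note that a triple found in two sources survives the removal of one of
them, which is the behaviour you usually want when aggregating.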

So an identifier for the source could have all sorts of information
associated with it, including information about when it was found or
updated, and RDF properties indicating whether the information is
signed, for example. I'm not totally happy with moving in and out of the
RDF model like this, especially since many RDF APIs and databases don't
yet support provenance information, but being able to trace where each
triple came from, and when, and associate other information with the
triple is necessary, I think.
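
To illustrate, here is a minimal sketch of a record kept against a
source identifier and then described in RDF-style triples. Everything
named here is an assumption for the example: SourceInfo, as_triples and
the http://example.org/provenance# property URIs are hypothetical, not
an existing vocabulary.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical namespace for the example properties below.
EX = "http://example.org/provenance#"


@dataclass
class SourceInfo:
    uri: str                            # identifier for the source
    retrieved: datetime                 # when the source was found/fetched
    updated: Optional[datetime] = None  # when it was last refreshed, if ever
    signed: bool = False                # whether the information is signed


def as_triples(info):
    """Describe the source itself as (subject, predicate, object) tuples."""
    triples = [
        (info.uri, EX + "retrieved", info.retrieved.isoformat()),
        (info.uri, EX + "signed", "true" if info.signed else "false"),
    ]
    if info.updated is not None:
        triples.append((info.uri, EX + "updated", info.updated.isoformat()))
    return triples
```

The triples in the store still point at the source only through the
extra database field; it is the source identifier that carries the
dates and the signed flag.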

It's worth asking this question on the RDF Interest Group list: I'm sure
there are many different views and implementations.

Libby


On Sat, 2 Mar 2002, Leo Obrst wrote:

> Excuse my ignorance: can someone give me a definition of "Provenance"?
> I've seen it in the RDF discussions and recently in WOW-G/OWL.
>
> The definition (as far as I can determine) is something along the lines
> of (from http://www.mtholyoke.edu/offices/library/arch/def.htm):
>
> "Provenance:
>
> The place of origin of an object or document(s). In archival terms, this
> refers to the administrative office of origin of a given record, group
> of records, or files. In the case of manuscript collections, provenance
> refers to the person, family, firm or other source from which the
> materials were obtained. Provenance can also refer to information about
> the successive transfers of ownership and custody of a particular, book,
> object, or document."
>
> This smacks to me of the legal and museum worlds (not that there's
> anything wrong with that).
>
> Do they really mean by this what I know as "product metadata" or "data
> lineage" from the database world? Or do they mean ontological/semantic
> properties associated with an instance/individual that change over time?
> Big difference.
>
> Is this the same as "claims", per, e.g., SHOE? It does always seem to be
> related to indirect discourse or modal issues, as far as I can tell:
> "John believed <blah>." So, reification?
>
> I am sorry for my confusion; perhaps this is common parlance in the
> Semantic Web?
>
> And, by the way, what is the model semantics of this notion of
> "provenance"? Metadata tag or time-dependent property?
>
> Leo
>
> --
> _____________________________________________
> Dr. Leo Obrst		The MITRE Corporation
> mailto:lobrst@mitre.org Intelligent Information Management/Exploitation
> Voice: 703-883-6770	7515 Colshire Drive, M/S W640
> Fax: 703-883-1379       McLean, VA 22102-7508, USA
>
>

Received on Sunday, 3 March 2002 07:38:37 UTC