Re: Provenance (was RE: REX evaluation) from Jim Hendler on 2004-06-10 (public-rdf-dawg@w3.org from April to June 2004)

From: Jim Hendler <hendler@cs.umd.edu>
Date: Wed, 9 Jun 2004 21:54:54 -0400
To: "Rob Shearer" <Rob.Shearer@networkinference.com>, <kendall@monkeyfist.com>
Cc: "RDF Data Access Working Group" <public-rdf-dawg@w3.org>
Message-Id: <p06110498bced6e2ca301@[10.0.1.2]>

At 18:40 -0700 6/9/04, Rob Shearer wrote:
>>  > This is because "provenance" isn't well-defined for me.
>>
>>  Yes; as I said, it's very overloaded. But, contra what you say below,
>>  there are lots of core use cases where *who* made assertion A is as
>>  important as the content of assertion A. When they made it and in what
>>  context -- say, in which web resource identified by a URI -- are
>>  equally crucial. Many of our use cases involving the Intelligence
>>  Community are *very* provenance-centric apps.
>>
>>  I don't know how or whether to really build support for this use case
>>  into the design of the query language and/or protocol; I do know,
>>  however, that this is a *common* use case (consider, for example,
>>  writing an RDF spider where the original source of the assertion is
>>  vital...)
>>
>>  > I see an RDF
>>  > repository as an RDF repository; I thought the whole point of RDF was
>>  > that it made absolutely no difference who said something or where that
>>  > information was stored so long as somebody said it somewhere,
>>
>>  Hmm, no, with all due respect, Rob, I think that's wrong. :>
>
>I think we're using "RDF" to refer to different things here.
>I recognize that provenance can be very useful in many RDF applications,
>but I don't see provenance within the RDF spec. If it is actually
>realized as triples (and there are ways to do that with provenance
>information), then it's RDF, but if it's meta-information sitting
>outside the RDF data model then I think it's somewhat out of scope.
>
>I'm glad we've finally gotten down the list to talk about some of these
>objectives that have never really been addressed.


FWIW, many RDF triple stores actually create "quads" rather than
triples -- the reason is that since each triple comes from somewhere
(usually a document), one can keep and manage a "provenance" URI very
easily. RDFlib does this, as does Kowari, and I think a number of
others do too (those are the two we use).  We make heavy use of this,
and it would seem foolish if the query language didn't give access to
it.  On the other hand, other triple stores don't have these context
URIs, and thus mandating them would be equally foolish.  I can think
of a couple of pretty easy designs, depending on whether we expect
most stores to have these fields (in which case we make the context a
recognized part of the query results, and stores that don't have it
return *Unknown*) or we want to make it an optional feature (in which
case we can simply let it be an optional part of a query that stores
that don't provide contexts can ignore).

Seems to me that ignoring a feature that is already implemented in a 
lot of triple stores might not be the best way to go -- especially as 
we, and at least 5-6 other groups I know, make heavy use of these 
context fields and would hate not to be able to share them.
  -JH

-- 
Professor James Hendler			  http://www.cs.umd.edu/users/hendler
Director, Semantic Web and Agent Technologies	  301-405-2696
Maryland Information and Network Dynamics Lab.	  301-405-6707 (Fax)
Univ of Maryland, College Park, MD 20742	  240-277-3388 (Cell)

Received on Wednesday, 9 June 2004 21:55:02 UTC