Re: Provenance for section 3 in technologies.tex from Kevin Smathers on 2003-06-26 (www-rdf-dspace@w3.org from June 2003)

From: Kevin Smathers <kevin.smathers@hp.com>
Date: Thu, 26 Jun 2003 09:41:49 -0700
To: Alberto Reggiori <alberto@asemantics.com>
Cc: www-rdf-dspace@w3.org, staff@asemantics.com
Message-ID: <3EFB224D.8060604@hp.com>
Hi Alberto,

I'm not sure whether what you have described would work or not.  
Conceptually a statement plus a context is a stating, but if the 
statement is general, and the context is also general, then you can't 
actually determine the instance from the context and statement pair.
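
As a rough illustration of that point (a sketch only; the Quad tuple and 
the ex: names are made up for this example and are not RDFStore's or 
Jena's API), two separate acts of stating the same general triple in the 
same general context collapse into one entry in a value-keyed store:

```python
from collections import namedtuple

# Hypothetical quad: a triple plus a fourth "context" component.
Quad = namedtuple("Quad", ["subject", "predicate", "obj", "context"])

# Two separate statings of the same triple in the same context.
first = Quad("ex:list", "ex:next", "ex:item", "ex:news")
second = Quad("ex:list", "ex:next", "ex:item", "ex:news")

# A store keyed on the (statement, context) pair cannot tell them apart.
store = {first, second}
distinct_statings = len(store)
```

Nothing in the pair itself records that there were two instances.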

Consider a graph such as 'new news' that is a linked list of pointers to 
specific news items, where a news item is another subgraph.  If you use 
statements for that subgraph, then confusion among the instances will 
make it impossible to determine which of many paths you should follow to 
expand the subgraph.  So you introduce context and establish the link 
between the context of the statement and the statement using a new arc.  
Now you can expand the subgraph by only following arcs that are in the 
same context as the beginning query; problem solved?
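
A minimal sketch of that context-restricted expansion, assuming a plain 
in-memory list of quads (the identifiers ex:newNews, ctx:monday and the 
expand function are invented for illustration):

```python
from collections import namedtuple

Quad = namedtuple("Quad", ["subject", "predicate", "obj", "context"])

# Hypothetical data: ex:item1 is described in two different contexts.
quads = [
    Quad("ex:newNews", "ex:first", "ex:item1", "ctx:monday"),
    Quad("ex:item1", "ex:title", "Monday headline", "ctx:monday"),
    Quad("ex:item1", "ex:title", "Tuesday headline", "ctx:tuesday"),
    Quad("ex:item1", "ex:next", "ex:item2", "ctx:monday"),
    Quad("ex:item2", "ex:title", "Second story", "ctx:monday"),
]

def expand(start, context, quads):
    """Collect quads reachable from start, following only arcs in context."""
    seen, frontier, result = {start}, [start], []
    while frontier:
        node = frontier.pop()
        for q in quads:
            if q.subject == node and q.context == context:
                result.append(q)
                if q.obj not in seen:
                    seen.add(q.obj)
                    frontier.append(q.obj)
    return result

monday = expand("ex:newNews", "ctx:monday", quads)
```

The Tuesday title for ex:item1 is never followed, because its arc is not 
in the context the query started from.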

Really the problem is only partially solved.  You now have created the 
programming equivalent of namespaces, and are using those namespaces to 
resolve ambiguity among similar links, but there are two problems.  The 
first is that there can only be one instance per namespace or ambiguity 
returns.  The second problem is that any external arcs pointing at a 
subgraph must link both to the subgraph and to the context of the 
subgraph, so arcs become non-atomic. 

To solve these two problems you would have to create a new context 
instance per statement within an abstract context.  While this is 
feasible, the result would be to have an end system that looked just 
like statings, but used contexts as the unique element rather than statings.
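
Roughly, and only as a sketch under my own assumptions (the state 
function, the "#n" naming of context instances and the membership map 
are all hypothetical), that scheme would look like this:

```python
import itertools
from collections import namedtuple

Quad = namedtuple("Quad", ["subject", "predicate", "obj", "context"])
_ids = itertools.count(1)

def state(subject, predicate, obj, abstract_context, store, membership):
    """Record one stating under a fresh context instance of abstract_context."""
    ctx = f"{abstract_context}#{next(_ids)}"   # unique per statement
    store.append(Quad(subject, predicate, obj, ctx))
    membership[ctx] = abstract_context         # link instance to its abstract context
    return ctx

store, membership = [], {}
a = state("ex:list", "ex:next", "ex:item", "ctx:news", store, membership)
b = state("ex:list", "ex:next", "ex:item", "ctx:news", store, membership)
```

Each context instance now identifies exactly one statement, which is the 
role a stating would otherwise play.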

Cheers,
-kls

Alberto Reggiori wrote:

>
>
> On Thursday, June 26, 2003, at 02:11  PM, Butler, Mark wrote:
>
>>
>> Hi Dave
>>
>>> Non-standard extensions would be best avoided if you want
>>> SIMILE to be a full
>>> participant in the semantic web.
>>
>>
>> But to take this back to my original suggestion does this apply to 
>> quads? My
>> understanding from Andy is that they are used by RDFStore and a 
>> number of
>> RSS processors, and from Jeremy that although Jena 2 does not have a 
>> quads
>> API it does actually use a quad data structure "under the hood". So 
>> although
>> they are non-standard at the moment, people are using them, so should we
>> really rule them out?
>
>
> hi Mark,
>
> in our interpretation of provenance/contexts in RDFStore we assumed 
> that a statement represents a fact that is asserted as true in a 
> certain context. This circumstance (e.g. space/temporal, situation or 
> scope) where the statement has been stated represents “contextual” 
> information about the statement [1][2]. For example, when triples are 
> being added to a graph it is often useful to be able to track back 
> where they came from (e.g. Internet source Web site or domain), how 
> they were added, by whom, why, when (e.g. date), when they will expire 
> (e.g. Time-To-Live) and so on. Such context (or provenance 
> information) can be thought of as an additional and orthogonal 
> dimension to the other 3 components. This concept is not part of the 
> current RDF data model [3] and is referred to as “statement reification”. 
> From the application developer point of view there is a clear need for 
> such primitive constructs to layer different levels of semantics on 
> top of RDF which can not be represented in the RDF triples space. 
> Applications normally need to build meta-levels of abstraction over 
> triples to reduce complexity and provide an incremental and scaleable 
> access to information. For example, if a Web robot is processing and 
> syndicating news coming from various on-line newspapers, there will be 
> overlap. An application may decide to filter the news based not only 
> on a timeline or some other property, but perhaps select sources 
> providing only certain information with unique characteristics. This 
> requires the flagging of triples as belonging to different contexts 
> and then describing in the RDF itself the relationships between the 
> contexts. At query time such information can then be used by the 
> application to define a search scope to filter the results. Another 
> common example of the usage of provenance and contextual information 
> is the digital signing of RDF triples to provide a basic level of trust 
> over the Semantic Web. In that case triples could be flagged, for example, 
> with a PGP key to uniquely identify the source and its properties. 
> There have been several attempts [4][5][6][7] trying to formalize and 
> use contexts and provenance information in RDF but there is not yet a 
> common agreement how to do it. It is also not completely clear how an 
> application would benefit from this information. Jena2 also seems to be 
> taking some steps in that direction.
> Our approach to model contexts and provenance has been simpler and 
> motivated by real-world RDF applications we have developed [8][9]. We 
> found that an additional dimension to the RDF triple can be useful or 
> even essential. Given that the usage of full-blown RDF reification 
> can be cumbersome due to its verbosity and inefficiency, we developed 
> a different modeling technique that flags or marks a given statement as 
> belonging to one or more specific contexts.
>
> On the practical side, our Perl/C API allows one to add, remove and 
> search triples in specific "spaces" or contexts and serialize them 
> back as Quads (a simple extension to the N-Triples syntax). At the 
> moment we are about to implement a serialization of contexts back to 
> RDF/XML (as Jan also suggested) via the rdf:ID reification mechanism; 
> at parse time we will just flag those triples (predicates) as 
> "special" or asserted in a different context. In the past we used 
> rdf:bagID to hack this functionality, but it has recently been dropped 
> from the specs, as you probably noticed. At the RDQL query level we 
> allow a 4-th component, a URI (resource), on triple-patterns to 
> specify/select the context. The nice part is that subsequent 
> triple-patterns can refine and select the vars from that 4-th 
> component to "unify" descriptions at different levels.
>
> As an example, as presented at the WWW2003 devday, we have some demo 
> queries using contexts available
>
> http://demo.asemantics.com/rdfstore/www2003/
>
> The example database contains news scraped from most Italian 
> newspapers, where each channel and news item is put into a specific 
> source context - this allows us to filter results by date and by source, 
> avoiding overlaps and clashes of URLs (e.g. some newspapers recycle 
> the same URL every day but with different HTML content). In particular 
> look at the last two queries (number 9 and 10) using contextual 
> information at the RDQL level - the very last one is pretty cool to 
> me, since it allows one to describe the 4-th context component with a 
> dc:date and then join it into the other triple space.
>
> BTW: while at WWW2003 I had a chat with Matt Biddulph about his RSS 
> codepiction code/demo, and he seems to have similar problems and 
> solutions, using Jena with reification to mimic contextual information 
> - to me that means this aspect is going to be fundamental for the 
> success of the whole Semantic Web and of RDF systems
>
> but yes, all this is not "standard" :-)
>
> hope this helps
>
> all the best
>
> Alberto
>
> [1] Graham Klyne, 13-Mar-2002 “Circumstance, provenance and partial 
> knowledge - Limiting the scope of RDF assertions” 
> http://www.ninebynine.org/RDFNotes/UsingContextsWithRDF.html
> [2] John F. Sowa, “Knowledge Representation: Logical, Philosophical, 
> and Computational Foundations”, Brooks Cole Publishing Co., ISBN 
> 0-534-94965-7
> [3] Patrick Hayes “RDF Semantics” (W3C Working Draft 23 January 2003) 
> http://www.w3.org/TR/rdf-mt/
> [4] Graham Klyne, 18 October 2000 “Contexts for RDF Information 
> Modelling” http://public.research.mimesweeper.com/RDF/RDFContexts.html
> [5] Seth Russell, 7 August 2002 “Quads” 
> http://robustai.net/sailor/grammar/Quads.html
> [6] T. Berners-Lee, Dan Connolly “Notation 3” 
> http://www.w3.org/2000/10/swap/doc/Overview.html
> [7] Dave Beckett, “Contexts Thoughts” 
> http://www.redland.opensource.ac.uk/notes/contexts.html
> [8] http://demo.asemantics.com/biz/isc/
> [9] http://demo.asemantics.com/biz/lmn/
>
>
>
>
>>
>> I'd be interested in feedback here from Eric Miller and David Karger 
>> also?
>>
>> thanks
>>
>> Mark
>
>


-- 
========================================================
   Kevin Smathers                kevin.smathers@hp.com    
   Hewlett-Packard               kevin@ank.com            
   Palo Alto Research Lab                                 
   1501 Page Mill Rd.            650-857-4477 work        
   M/S 1135                      650-852-8186 fax         
   Palo Alto, CA 94304           510-247-1031 home        
========================================================
use "Standard::Disclaimer";
carp("This message was printed on 100% recycled bits.");
Received on Thursday, 26 June 2003 12:43:07 UTC