- From: John S. Erickson <john.erickson@hp.com>
- Date: Fri, 27 Jun 2003 11:56:05 -0400
- To: <www-rdf-dspace@w3.org>
Alberto made a huge point:

> in our interpretation of provenance/contexts in RDFStore we assumed
> that a statement represents a fact that is asserted as true in a
> certain context. This circumstance (e.g. space/temporal, situation or
> scope) where the statement has been stated represents “contextual”
> information about the statement [1][2]. For example, when triples are
> being added to a graph it is often useful to be able to track back
> where they came from (e.g. Internet source Web site or domain), how
> they were added, by whom, why, when (e.g. date), when they will expire
> (e.g. Time-To-Live) and so on. Such context (or provenance information)
> can be thought of as an additional and orthogonal dimension to the
> other 3 components. This concept is not part of the current RDF data
> model [3] and is referred to as “statement reification”. From the
> application developer point of view there is a clear need for such
> primitive constructs to layer different levels of semantics on top of
> RDF which can not be represented in the RDF triples space....

JSE: The notion of preserving the context of a statement WITHOUT
TRANSFORMING THAT STATEMENT is critical for RDF application developers,
and I believe it is being overlooked. I believe RDF's current approach,
which reifies the statement, is artificially invasive and complex. In the
real world, statements will be conceptually contained, aggregated and
nested; it seems crazy that in order to deal with them in such a way, we
must artificially blow them apart.
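
To make the "blow them apart" complaint concrete, here is a minimal Python sketch of what the standard RDF reification vocabulary does to a single statement when one provenance fact is attached to it. The rdf:Statement, rdf:subject, rdf:predicate and rdf:object terms are the standard vocabulary; the reify() helper, the ex: names and the tuple representation are assumptions made up for illustration, not any particular toolkit's API.

```python
# Illustrative only: triples are plain tuples and reify() is a hypothetical helper.

def reify(stmt_id, triple):
    """Return the four triples of the standard RDF reification vocabulary
    that describe `triple` under the identifier `stmt_id`."""
    s, p, o = triple
    return [
        (stmt_id, "rdf:type", "rdf:Statement"),
        (stmt_id, "rdf:subject", s),
        (stmt_id, "rdf:predicate", p),
        (stmt_id, "rdf:object", o),
    ]

# A made-up statement and a made-up provenance fact about it.
original = ("ex:article", "dc:creator", "ex:alice")
graph = reify("ex:stmt1", original)
graph.append(("ex:stmt1", "dc:source", "ex:newspaper"))

for t in graph:
    print(t)

# The statement now exists only as scattered pieces: the triple itself
# is not in the graph unless it is also asserted separately.
print(original in graph)  # prints False
```

One statement plus one provenance fact becomes five triples, and the original triple no longer appears as a unit.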

For another argument about the need to easily nest (and how reification
and RDF/XML introduce unnecessary complexity) see:

  http://purl.oclc.org/NET/RDF_M_S_Revisited (PDF)

Given a statement like:

  [s,p,o]

...provenance simply means we want the ability to make a statement ABOUT
that statement without changing that statement, as in:

  [s1,p1,[s,p,o]]

...while preserving the intention of this second statement, which is for
the first triple to be the object of a second triple. If we have a quad
store, this looks like:

  [i1,s,p,o]
  [i2,s1,p1,i1]

...in which we are using the "4th element" as a statement identifier. We
have the ability to *explicitly* define context membership in the
following way:

  [i1,s1,p1,o1]
  [i2,s2,p2,o2]
  [i3,c1,p3,i1]
  [i4,c1,p3,i2]

...in which subject c1 is the context identifier and p3 is a "contains"
predicate (jse:contains). We can also define context membership
*implicitly* as follows:

  [c1,s1,p1,o1]
  [c1,s2,p2,o2]
  [c2,s3,p3,o3]
  [c2,s4,p4,c1]

There are two contexts shown, each containing two arbitrary statements.
The first context c1 contains triples [s1,p1,o1] and [s2,p2,o2]. The
context c2 contains [s3,p3,o3] and [s4,p4,c1]; this second statement
happens to have as its object c1, thus illustrating nesting.

This example shows two different ways of constructing application-level
abstractions for containment, one explicit and one implicit, both
leveraging quads and neither one artificially trashing the contained
statements...

John

> ...Applications
> normally need to build meta-levels of abstraction over triples to
> reduce complexity and provide an incremental and scalable access to
> information. For example, if a Web robot is processing and syndicating
> news coming from various on-line newspapers, there will be overlap. An
> application may decide to filter the news based not only on a timeline
> or some other property, but perhaps select sources providing only
> certain information with unique characteristics.
> This requires the
> flagging of triples as belonging to different contexts and then
> describing in the RDF itself the relationships between the contexts. At
> query time such information can then be used by the application to
> define a search scope to filter the results. Another common example of
> the usage of provenance and contextual information is about digitally
> signing RDF triples to provide a basic level of trust over the
> Semantic Web. In that case triples could be flagged for example with a
> PGP key to uniquely identify the source and its properties. There have
> been several attempts [4][5][6][7] trying to formalize and use contexts
> and provenance information in RDF but there is not yet a common
> agreement on how to do it. It is also not completely clear how an
> application would benefit from this information. Jena2 also seems to be
> taking some steps in that direction.
> Our approach to model contexts and provenance has been simpler and
> motivated by real-world RDF applications we have developed [8][9]. We
> found that an additional dimension to the RDF triple can be useful or
> even essential. Given that the usage of full-blown RDF reification can
> be cumbersome due to its verbosity and inefficiency, we developed a
> different modeling technique that flags or marks a given statement as
> belonging to one or more specific contexts.
>
> On the practical side, our Perl/C API allows one to add/remove and
> search triples in specific "spaces" or contexts and serialize them back
> as Quads (simple extension to N-Triples syntax) - at the moment we are
> about to implement a serialization of context back to RDF/XML (also as
> Jan suggested) via the rdf:ID reification stuff and at parse time will
> just flag those triples (predicates) as "special" or asserted in a
> different context - in the past we used rdf:bagID to hack this
> functionality but it has been recently dropped from the specs as you
> probably noticed.
> At the RDQL query level we allow a 4-th component as
> URI (resource) on triple-patterns to specify/select the context - the
> nice part of it is that subsequent triple-patterns can refine and
> select the vars from that 4-th component to "unify" descriptions of
> different levels.
>
> As an example, as presented at the WWW2003 devday, we have some demo
> queries using contexts available
>
> http://demo.asemantics.com/rdfstore/www2003/
>
> The example database contains scraped news from most Italian
> newspapers, where each channel and news item is put into a specific
> source context - this allows us to filter results by date, by source
> avoiding overlaps and clashing of URLs (e.g. some newspapers recycling
> the same URL every day but with different HTML content). In particular
> look at the last two queries (number 9 and 10) using contextual
> information at the RDQL level - the very last one is pretty cool to me,
> which allows one to describe the 4-th context component with a dc:date
> and then join it into the other triple space.
>
> BTW: while at www2003 I had a chat with Matt Biddulph about his RSS
> codepiction code/demo and he seems to have similar problems and
> solutions using Jena with reification to mimic contextual information -
> that means that this aspect is going to be fundamental for the success
> of the whole Semantic Web and RDF systems to me
>
> but yes, all this is not "standard" :-)
>
> hope this helps
>
> all the best
>
> Alberto
>
> [1] Graham Klyne, 13-Mar-2002 “Circumstance, provenance and partial
> knowledge - Limiting the scope of RDF assertions”
> http://www.ninebynine.org/RDFNotes/UsingContextsWithRDF.html
> [2] John F. Sowa, “Knowledge Representation: Logical, Philosophical,
> and Computational Foundations”, Brooks Cole Publishing Co., ISBN
> 0-534-94965-7
> [3] Patrick Hayes “RDF Semantics” (W3C Working Draft 23 January 2003)
> http://www.w3.org/TR/rdf-mt/
> [4] Graham Klyne, 18 October 2000 “Contexts for RDF Information
> Modelling” http://public.research.mimesweeper.com/RDF/RDFContexts.html
> [5] Seth Russell, 7 August 2002 “Quads”
> http://robustai.net/sailor/grammar/Quads.html
> [6] T. Berners-Lee, Dan Connolly “Notation 3”
> http://www.w3.org/2000/10/swap/doc/Overview.html
> [7] Dave Beckett, “Contexts Thoughts”
> http://www.redland.opensource.ac.uk/notes/contexts.html
> [8] http://demo.asemantics.com/biz/isc/
> [9] http://demo.asemantics.com/biz/lmn/
>
> >
> > I'd be interested in feedback here from Eric Miller and David Karger
> > also?
> >
> > thanks
> >
> > Mark
> >
>
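
The two containment schemes in John's reply (explicit membership through "contains" statements, and implicit membership through a shared context identifier) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: quads are plain tuples, the helper function names are hypothetical, and jse:contains stands in for the p3 predicate of the explicit example. It is not how RDFStore, Jena or any other quad store implements contexts.

```python
# Illustrative only: quads are plain tuples. "jse:contains" is the predicate
# name from the message; all other identifiers are placeholders.

CONTAINS = "jse:contains"

# Explicit membership: quads are (statement_id, s, p, o), and separate
# "contains" statements link a context to the statement ids it holds.
explicit = [
    ("i1", "s1", "p1", "o1"),
    ("i2", "s2", "p2", "o2"),
    ("i3", "c1", CONTAINS, "i1"),
    ("i4", "c1", CONTAINS, "i2"),
]

def members_explicit(quads, context):
    """Triples whose statement id is the object of a contains-statement on `context`."""
    ids = {o for (_, s, p, o) in quads if s == context and p == CONTAINS}
    return [(s, p, o) for (i, s, p, o) in quads if i in ids]

# Implicit membership: quads are (context_id, s, p, o), so sharing the
# first element is what places statements in the same context.
implicit = [
    ("c1", "s1", "p1", "o1"),
    ("c1", "s2", "p2", "o2"),
    ("c2", "s3", "p3", "o3"),
    ("c2", "s4", "p4", "c1"),  # object is context c1: nesting
]

def members_implicit(quads, context):
    """Triples stored under `context`."""
    return [(s, p, o) for (c, s, p, o) in quads if c == context]

def nested_contexts(quads, context):
    """Contexts that appear as the object of a statement inside `context`."""
    known = {c for (c, _, _, _) in quads}
    return {o for (_, _, o) in members_implicit(quads, context) if o in known}

print(members_explicit(explicit, "c1"))
print(members_implicit(implicit, "c1"))
print(nested_contexts(implicit, "c2"))
```

Both membership functions return the same two triples for c1, and in neither scheme are the contained statements rewritten; only the extra tuple element, or an extra statement about it, carries the containment.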
Received on Friday, 27 June 2003 12:04:31 UTC