- From: Graham Klyne <GK@dial.pipex.com>
- Date: Wed, 24 May 2000 15:30:23 +0100
- To: RDF interest group <www-rdf-interest@w3.org>, CC/PP WG list <w3c-ccpp-wg@w3.org>
A STRAWMAN PROPOSAL

1. The problem
--------------

It is easy to attach trust and other context to a document containing serialized RDF: simply sign the document, or attach context information to the file or resource. But when the RDF model data is abstracted from its serialization, all we have is a set of statements (and sets of the component parts of statements) [RDFM&S, section 5]. Any information about containment of the serialized form is lost, and there is no defined means to identify any given statement (other than by its entire value).

Manipulation and analysis of RDF model data are most usefully performed on the abstract graph (I assert). But to include trust and other contextual information in any such manipulations, they too must be represented in the graph. The currently defined mechanism to do this is reification -- representation of each statement (property arc) in a graph by a new resource itself having four new property arcs -- incurring a 500% overhead in a simple-minded implementation.

A second problem with using reification is that it is not an obvious technique for implementers: this in turn is likely to lead to incorrect implementations, or even non-adoption of the framework.

For many applications, including CC/PP, capturing contextual information does seem to be very important.

One way to address this problem might be to extend the RDF model; e.g. to define a new subset of resources called "Contexts", and to define each RDF statement to be a quadruple consisting of:

   {Predicate, Subject, Object, Context}

I understand that some implementations of RDF do this. But, for the time being, the official RDF model consists of triples.

2. An approach to representing context in the RDF model
-------------------------------------------------------

Sergey Melnik has suggested [http://lists.w3.org/Archives/Public/www-rdf-interest/2000Apr/0113.html] an approach of constructing a digest of each statement or subgraph, and having a context resource refer to these digest values in lieu of the reified statement(s):

>The recipient gets:
>
>T --rdf:type--> SignedStatements
>T --principal--> Alice
>T --algorithm--> RSA
>T --statement--> <hash1>
>...
>T --statement--> <hashN>
><statement 1>
>...
><statement N>
>
>Given that there exist an algorithm for computing hashes of statements,
>the recipient can iterate through the statements in the message and
>match their hashes against those explicitly listed. In the same loop,
>the hash of the model is computed.

One problem with this is that neither finding the statement(s) corresponding to a context nor finding the context associated with a statement is obviously easy (each involving a search and value computation over the statement database), and scaling may prove problematic.

A related approach is one in which the digest values of individual statements are used to construct resource identifiers that can stand for the reified statement resources, and which have property arcs indicating the applicable context(s) for the corresponding statement. The above example then becomes:

>T --rdf:type--> SignedDocument
>T --principal--> Alice
>T --algorithm--> RSA
>digest:<hash1> --context--> T
>...
>digest:<hashN> --context--> T
><statement 1>
>...
><statement N>

This approach seems to have the following advantages over the previous suggestion:

- given a statement, it is easy to find any applicable contexts (assuming access by RDF resource ID is a primitive function of the RDF handling engine used);
- this model very naturally extends to full RDF reification -- simply add the appropriate rdf:predicate, rdf:subject, rdf:object and rdf:type properties to the digest:<hashX> resource;
- this approach more closely matches the context extension to the RDF model suggested above.

There is still the problem of finding all statements to which a given context applies, but, if necessary, this might be overcome by having back-links from the context:

>T --rdf:type--> SignedDocument
>T --principal--> Alice
>T --algorithm--> RSA
>T --statement--> digest:<hash1>
>...
>T --statement--> digest:<hashN>
>digest:<hash1> --context--> T
>...
>digest:<hashN> --context--> T
><statement 1>
>...
><statement N>

which starts to look a bit like Sergey's scheme again.

3. Putting into practice
------------------------

If this is deemed to be a reasonable way of handling contextual information in RDF, then for interoperability between applications processing this information (and possibly exporting/importing it in an RDF graph serialization) I think it would be necessary to specify:

(a) a digest algorithm for digesting RDF statements
(b) a URI scheme for embedding digest values in a resource ID
(c) an RDF class (xxx:Context) of which any context class would be a subclass
(d) an RDF property corresponding to --context--> (domain: rdf:Statement, range: xxx:Context)
(e) an RDF property corresponding to --statement--> (domain: xxx:Context, range: rdf:Statement)

Applications would choose, according to their needs, whether or not to include additional information ("partial reification"?) in the graph that they construct, so simply defining a standard form for these mechanisms would not impose any overhead on applications that do not need to process contextual information.
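To make the digest-as-resource-ID idea concrete, here is a minimal Python sketch of items (a), (b), (d) and (e). Everything specific in it is an assumption for illustration only: the canonical form (parts joined by a separator character), the choice of SHA-1, the "digest:" prefix, and the property names xxx:context / xxx:statement are hypothetical placeholders, since the proposal deliberately leaves them to be specified. It shows three points. Statement-to-context is a single lookup on the digest resource. Context-to-statement only works through the back-links. Full reification just hangs the four standard arcs off the same digest resource.

```python
import hashlib
from collections import defaultdict

# A statement is a (subject, predicate, object) triple of plain strings.
Triple = tuple[str, str, str]

# Hypothetical names for the --context--> and --statement--> properties;
# the proposal leaves the namespace ("xxx:") open.
CONTEXT = "xxx:context"
STATEMENT = "xxx:statement"


def statement_digest(s: Triple) -> str:
    """Item (a): digest one statement.

    Assumed canonical form: the three parts joined by a unit-separator
    character, hashed with SHA-1. A real scheme would also need to
    distinguish literals from resources and normalize URIs.
    """
    canonical = "\x1f".join(s).encode("utf-8")
    return hashlib.sha1(canonical).hexdigest()


def digest_uri(s: Triple) -> str:
    """Item (b): embed the digest in a resource ID ('digest:' is hypothetical)."""
    return "digest:" + statement_digest(s)


class ContextGraph:
    """A toy triple store with a subject index, standing in for an RDF
    engine where access by resource ID is a primitive operation."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()
        self.by_subject: dict[str, set[Triple]] = defaultdict(set)
        self.by_digest: dict[str, Triple] = {}  # digest URI -> original statement

    def add(self, s: Triple) -> None:
        self.triples.add(s)
        self.by_subject[s[0]].add(s)

    def add_in_context(self, s: Triple, context: str, back_link: bool = True) -> None:
        """Assert s, plus digest:<hash> --context--> context and,
        optionally, the context --statement--> digest:<hash> back-link."""
        self.add(s)
        d = digest_uri(s)
        self.by_digest[d] = s
        self.add((d, CONTEXT, context))
        if back_link:
            self.add((context, STATEMENT, d))

    def reify(self, s: Triple) -> None:
        """Extend to full reification by hanging the four standard
        arcs off the same digest:<hash> resource."""
        d = digest_uri(s)
        self.add((d, "rdf:type", "rdf:Statement"))
        self.add((d, "rdf:subject", s[0]))
        self.add((d, "rdf:predicate", s[1]))
        self.add((d, "rdf:object", s[2]))

    def contexts_of(self, s: Triple) -> set[str]:
        """Statement -> contexts: one lookup on the digest resource."""
        return {o for (_, p, o) in self.by_subject[digest_uri(s)] if p == CONTEXT}

    def statements_in(self, context: str) -> set[Triple]:
        """Context -> statements: only possible via the back-links."""
        wanted = {o for (_, p, o) in self.by_subject[context] if p == STATEMENT}
        return {self.by_digest[d] for d in wanted if d in self.by_digest}


g = ContextGraph()
stmt = ("http://example.org/doc", "dc:creator", "Graham")
g.add(("_:T", "rdf:type", "SignedDocument"))
g.add(("_:T", "principal", "Alice"))
g.add_in_context(stmt, "_:T")
print(g.contexts_of(stmt))     # {'_:T'}
print(g.statements_in("_:T"))  # {('http://example.org/doc', 'dc:creator', 'Graham')}
```

Calling add_in_context with back_link=False gives the "context arcs only" variant. contexts_of still works, but statements_in returns an empty set. This is the remaining problem the back-links are meant to solve. The by_digest index is an extra assumption of this sketch. Without it, mapping a digest back to its statement would require the scan over all statements that Sergey's scheme describes.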
--

So, the question: Is this worthy of pursuing?

#g
------------
Graham Klyne
(GK@ACM.ORG)
Received on Wednesday, 24 May 2000 11:41:31 UTC