- From: Gabe Beged-Dov <begeddov@jfinity.com>
- Date: Mon, 11 Jun 2001 21:56:42 -0700
- To: "www-rdf-interest@w3.org" <www-rdf-interest@w3.org>
Many moons ago, there was a discussion [1] on a topic that is dear to my heart. The closest I can come to an issue on this topic is [2]. Meanwhile, I've been rethinking how tracing of statement origin can be handled. My thoughts aren't that far along, but I thought it's better to share them than to be overtaken by events.

A primary use-case from my perspective is the often-described ability of RDF to allow aggregation of distinct RDF sources. Two different examples of this are Mozilla, which keeps the various sources that make up the aggregate store separate, and most of the other RDF implementations, which either keep the individual "models" totally separate or slurp all of the sources into a single model. For example, I would like to be able to slurp up a large set of rss1 channels into a single RDF db and then interact with it as a single dataset while still being able to distinguish the original document context from which the various statements originated.

The tack I was pursuing last year was based on the premise that the M&S should be taken literally as to reification everywhere. If you accepted that premise (which is under discussion in rdfcore [3]), then you would generate a lot more triples (although with many possible optimizations), which would potentially allow you to trace statements back to both the document and the rdf:Description element that they occurred in.

Lately, I've been thinking of a different approach. As before, it focuses on the document-centric aspects of RDF rather than the model-centric ones. In the same way that you can distinguish between the docu-heads and the data-heads in the XML world, I think you can distinguish between the docu/data-heads and the model/logic-heads in the RDF world. In both cases, this is a gross generalization. Having made it, I see myself falling into the former camp. On to the approach.
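To make the cost of the reification tack concrete, here is a minimal Python sketch, assuming triples are plain (subject, predicate, object) string tuples. It keeps each parsed triple and describes it with the M&S reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object) plus one provenance triple. The sourceDocument property, the #stmtN naming scheme, and binding rss1: to the RSS 1.0 namespace are assumptions for illustration only, and the sketch traces statements back to the document, not to the rdf:Description element.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical provenance property: not part of RDF or any standard vocabulary.
SOURCE = "http://example.org/provenance#sourceDocument"


def reify(stmt: Triple, stmt_id: str, source_doc: str) -> List[Triple]:
    """Describe one statement with the M&S reification vocabulary,
    plus one (hypothetical) triple naming its source document."""
    s, p, o = stmt
    return [
        (stmt_id, RDF + "type", RDF + "Statement"),
        (stmt_id, RDF + "subject", s),
        (stmt_id, RDF + "predicate", p),
        (stmt_id, RDF + "object", o),
        (stmt_id, SOURCE, source_doc),
    ]


def reify_document(doc_uri: str, triples: List[Triple]) -> List[Triple]:
    """Keep every triple parsed from one document and add its reification.
    Statement IDs are minted as #stmtN fragments of the document URI
    (an assumed naming scheme)."""
    out: List[Triple] = []
    for n, t in enumerate(triples, start=1):
        out.append(t)
        out.extend(reify(t, f"{doc_uri}#stmt{n}", doc_uri))
    return out


doc = "http://controversy.com/channelA/2001/06/12.rdf"
item = "http://www.nytimes.com/2001/06/12/national/12MCVE.html"
# Assumes rss1: is bound to the RSS 1.0 namespace; the literal is a plain string here.
parsed = [(item, "http://purl.org/rss/1.0/description", "It was a sad day...")]
reified = reify_document(doc, parsed)
print(len(parsed), "->", len(reified))  # prints: 1 -> 6
```

Even before any optimizations, each parsed triple turns into six stored ones in this sketch, which is the kind of blow-up the approach below tries to avoid.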
The desired result of this or any similar exercise, in my mind, is to be able to "join" the triples generated from a set of documents and still be able to distinguish which document each triple came from. This especially includes triples that have the same [s,p,o].

Let's say I have two RSS channels that I aggregate. Both have an item about today's article on the McVeigh execution at nytimes.com (http://www.nytimes.com/2001/06/12/national/12MCVE.html). The two channels have completely different viewpoints on the execution. Let's call them channelA and channelB. Here are examples of their rss1 document fragments:

===================
http://controversy.com/channelA/2001/06/12.rdf

<rdf:RDF ns-decls xmlns:contro="http://www.controversy.com/controVocab/">
  ...
  <rss1:item rdf:about="http://www.nytimes.com/2001/06/12/national/12MCVE.html">
    <rss1:description>It was a sad day...</rss1:description>
    <contro:theRefs>
      <rdf:Bag>
        <rdf:li rdf:resource="really sad url" />
      </rdf:Bag>
    </contro:theRefs>
  </rss1:item>
===================
http://controversy.com/channelB/2001/06/12.rdf

<rdf:RDF ns-decls xmlns:contro="http://www.controversy.com/controVocab/">
  ...
  <rss1:item rdf:about="http://www.nytimes.com/2001/06/12/national/12MCVE.html">
    <rss1:description>It was a happy day...</rss1:description>
    <contro:theRefs>
      <rdf:Bag>
        <rdf:li rdf:resource="really happy url" />
      </rdf:Bag>
    </contro:theRefs>
  </rss1:item>
===================

If these are directly slurped into a single RDF store, we would get something like the following when we serialize back to RDF/XML:

===================
<rdf:RDF ns-decls xmlns:contro="http://www.controversy.com/controVocab/">
  ...
  <rss1:item rdf:about="http://www.nytimes.com/2001/06/12/national/12MCVE.html">
    <rss1:description>It was a sad day...</rss1:description>
    <contro:theRefs>
      <rdf:Bag rdf:about="http://controversy.com/channelA/2001/06/12.rdf#gen10">
        <rdf:li rdf:resource="really sad url" />
      </rdf:Bag>
    </contro:theRefs>
    <rss1:description>It was a happy day...</rss1:description>
    <contro:theRefs>
      <rdf:Bag rdf:about="http://controversy.com/channelB/2001/06/12.rdf#gen11">
        <rdf:li rdf:resource="really happy url" />
      </rdf:Bag>
    </contro:theRefs>
  </rss1:item>
===================

Unfortunately, we can't tell one rss1:description from the other in the joined result. Still, notice that the rdf:Bag elements that were unlabeled in the source documents have been labeled by the processor. AFAIK, this is universal behavior on the part of RDF processors when they encounter RDF/XML resources that are either labeled with an ID or are unlabeled; i.e., they are labeled with a URIref whose URI is that of the source RDF/XML document and whose fragment identifier is implementation-dependent.

The Approach
============

If you assume that this behavior is correct, then you are already a major part of the way to having your triples traceable to the source document without generating any more triples (note that Sirpac doesn't seem to label anonymous nodes with the URI of the document; instead, it uses "_"). What's left are the classic RDF/XML resources that are labeled with an rdf:about. What I am proposing is that these be considered a shorthand for an anonymous resource that has a new property whose value is the rdf:about URI. In some ways, this is similar to the approach that was taken by Henrik Frystyk Nielsen at WWW9 [4]. Rather than rdf:about being treated as a magical attribute, it becomes a property of an RDF/XML resource. This potentially allows a distinction between RDF/XML resources and Web resources, although I'll leave that as a separate exercise.
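The difference between slurping the two channels as-is and the proposed reading of rdf:about can be sketched in Python over a toy in-memory model. Assumptions: Node is a hypothetical stand-in for a parsed RDF/XML element, the #genNN labels only mimic a processor's implementation-dependent fragment IDs, prefixed names such as rss1:description stay plain strings instead of being expanded to URIs, and the new property is spelled rdf:aboutURI, the name proposed for it further on.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple, Union

Triple = Tuple[str, str, str]
ABOUT_URI = "rdf:aboutURI"  # property proposed in the post, not a standard RDF term


@dataclass
class Node:
    """Hypothetical stand-in for one parsed RDF/XML resource element."""
    about: Optional[str] = None  # rdf:about value, if the element had one
    props: List[Tuple[str, Union[str, "Node"]]] = field(default_factory=list)


class Labeler:
    """Mints <document URI>#genNN labels for one source document.
    The genNN pattern is an assumption; real processors pick their own."""

    def __init__(self, doc_uri: str) -> None:
        self.doc_uri = doc_uri
        self.count = 0

    def mint(self) -> str:
        self.count += 1
        return f"{self.doc_uri}#gen{self.count:02d}"


def to_triples(node: Node, labeler: Labeler, about_as_property: bool,
               out: Set[Triple]) -> str:
    """Add the node's triples to `out` and return the node's subject."""
    if node.about is not None and not about_as_property:
        subject = node.about  # classic reading: rdf:about names the subject
    else:
        subject = labeler.mint()  # anonymous node, labeled with the doc URI
        if node.about is not None:
            # proposed reading: rdf:about becomes an ordinary property
            out.add((subject, ABOUT_URI, node.about))
    for pred, value in node.props:
        obj = (to_triples(value, labeler, about_as_property, out)
               if isinstance(value, Node) else value)
        out.add((subject, pred, obj))
    return subject


ITEM = "http://www.nytimes.com/2001/06/12/national/12MCVE.html"
DOC_A = "http://controversy.com/channelA/2001/06/12.rdf"
DOC_B = "http://controversy.com/channelB/2001/06/12.rdf"
CHANNELS = [(DOC_A, "It was a sad day...", "really sad url"),
            (DOC_B, "It was a happy day...", "really happy url")]


def merge(about_as_property: bool) -> Set[Triple]:
    """Slurp both channel fragments into one set of triples."""
    store: Set[Triple] = set()
    for doc, text, ref in CHANNELS:
        # rdf:li inside a Bag becomes the membership property rdf:_1
        bag = Node(props=[("rdf:type", "rdf:Bag"), ("rdf:_1", ref)])
        item = Node(ITEM, [("rss1:description", text), ("contro:theRefs", bag)])
        to_triples(item, Labeler(doc), about_as_property, store)
    return store


def source_of(subject: str) -> Optional[str]:
    """Recover the source document from a minted label, if it is one."""
    return subject.split("#gen")[0] if "#gen" in subject else None


classic = merge(about_as_property=False)
proposed = merge(about_as_property=True)
for name, store in [("classic", classic), ("proposed", proposed)]:
    for s, p, o in sorted(store):
        if p == "rss1:description":
            print(name, source_of(s), o)
```

Under the classic reading, both descriptions hang off the nytimes.com URI and nothing records where they came from. Under the proposed reading, each description's subject carries its document URI, at the price of one extra rdf:aboutURI triple per rdf:about resource.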
While we're hacking at the syntax, it would be really interesting to unify several other aspects of RDF/XML, including the distinction between string-valued properties and XML-valued properties and between resources and literals. I hint at that with the naming of the new property that holds the value formerly known as the rdf:about attribute. The property would be named rdf:aboutURI. The hint is that you could also have an rdf:aboutLiteral. So here's the example above recast using rdf:aboutURI:

===================
http://controversy.com/channelA/2001/06/12.rdf

<rdf:RDF ns-decls xmlns:contro="http://www.controversy.com/controVocab/">
  ...
  <rss1:item>
    <rdf:aboutURI rdf:resource="http:.../12MCVE.html"/>
    <rss1:description>It was a sad day...</rss1:description>
    <contro:theRefs>
      <rdf:Bag>
        <rdf:li rdf:resource="really sad url" />
      </rdf:Bag>
    </contro:theRefs>
  </rss1:item>
===================
http://controversy.com/channelB/2001/06/12.rdf

<rdf:RDF ns-decls xmlns:contro="http://www.controversy.com/controVocab/">
  ...
  <rss1:item>
    <rdf:aboutURI rdf:resource="http:.../12MCVE.html"/>
    <rss1:description>It was a happy day...</rss1:description>
    <contro:theRefs>
      <rdf:Bag>
        <rdf:li rdf:resource="really happy url" />
      </rdf:Bag>
    </contro:theRefs>
  </rss1:item>
===================

In the serialization below, the processor has generated URIrefs using the base URI of each source document for all the anonymous nodes (which now include the nodes formerly labeled with an explicit rdf:about).

===================
<rdf:RDF ns-decls xmlns:contro="http://www.controversy.com/controVocab/">
  ...
  <rss1:item rdf:about="http:.../channelA/2001/06/12.rdf#gen01">
    <rdf:aboutURI rdf:resource="http:.../12MCVE.html"/>
    <rss1:description>It was a sad day...</rss1:description>
    <contro:theRefs>
      <rdf:Bag rdf:about="http:.../channelA/2001/06/12.rdf#gen02">
        <rdf:li rdf:resource="really sad url" />
      </rdf:Bag>
    </contro:theRefs>
  </rss1:item>
  <rss1:item rdf:about="http:.../channelB/2001/06/12.rdf#gen01">
    <rdf:aboutURI rdf:resource="http:.../12MCVE.html"/>
    <rss1:description>It was a happy day...</rss1:description>
    <contro:theRefs>
      <rdf:Bag rdf:about="http:.../channelB/2001/06/12.rdf#gen02">
        <rdf:li rdf:resource="really happy url" />
      </rdf:Bag>
    </contro:theRefs>
  </rss1:item>
===================

There are some issues with this approach. One is that you are generating another triple for each resource labeled using an rdf:about in the source document. Another is that systems that join directly on the subject of triples won't work. OTOH, they have a very straightforward workaround: do the join on a property rather than on the subject. In XSLT terms, it would be something like:

  <xsl:variable name="joinlist" select="*/rdf:aboutURI[@rdf:resource=$joinURI]/.." />

rather than:

  <xsl:variable name="joinlist" select="*[@rdf:about=$joinURI]" />

[1] http://lists.w3.org/Archives/Public/www-rdf-logic/2000Nov/0112.html
[2] http://www.w3.org/2000/03/rdf-tracking/#rdfms-contexts
[3] http://www.w3.org/2000/03/rdf-tracking/#rdfms-reification-required
[4] http://www.ilrt.bris.ac.uk/discovery/2000/08/www9-slides/henrik/
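The two XSLT selections in the join workaround can be mirrored with Python's standard xml.etree.ElementTree, whose limited XPath support includes [@attr='value'] predicates. This is a sketch under assumptions: the rss1: prefix is bound to the RSS 1.0 namespace, rdf:aboutURI is the proposed (non-standard) property, and MERGED is a cut-down copy of the recast serialization holding only the two channel items and their descriptions. Rather than stepping back up with "..", the property join tests each child of rdf:RDF for a matching rdf:aboutURI, which selects the same elements.

```python
import xml.etree.ElementTree as ET
from typing import List

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss1": "http://purl.org/rss/1.0/",  # assumed binding for the rss1: prefix
}
JOIN_URI = "http://www.nytimes.com/2001/06/12/national/12MCVE.html"

# Cut-down merged serialization using the proposed rdf:aboutURI property.
MERGED = f"""<rdf:RDF xmlns:rdf="{NS['rdf']}" xmlns:rss1="{NS['rss1']}">
  <rss1:item rdf:about="http://controversy.com/channelA/2001/06/12.rdf#gen01">
    <rdf:aboutURI rdf:resource="{JOIN_URI}"/>
    <rss1:description>It was a sad day...</rss1:description>
  </rss1:item>
  <rss1:item rdf:about="http://controversy.com/channelB/2001/06/12.rdf#gen01">
    <rdf:aboutURI rdf:resource="{JOIN_URI}"/>
    <rss1:description>It was a happy day...</rss1:description>
  </rss1:item>
</rdf:RDF>"""


def join_on_subject(root: ET.Element, uri: str) -> List[ET.Element]:
    """Analogue of select="*[@rdf:about=$joinURI]"."""
    return root.findall(f"*[@rdf:about='{uri}']", NS)


def join_on_about_uri(root: ET.Element, uri: str) -> List[ET.Element]:
    """Analogue of select="*/rdf:aboutURI[@rdf:resource=$joinURI]/..":
    keep each child of rdf:RDF that has a matching rdf:aboutURI property."""
    return [
        child for child in root
        if child.find(f"rdf:aboutURI[@rdf:resource='{uri}']", NS) is not None
    ]


root = ET.fromstring(MERGED)
print(len(join_on_subject(root, JOIN_URI)))  # prints: 0
for item in join_on_about_uri(root, JOIN_URI):
    # The source document is the part of the generated label before '#'.
    source = item.get(f"{{{NS['rdf']}}}about").split("#")[0]
    print(source, item.findtext("rss1:description", namespaces=NS))
```

Joining on the subject finds nothing, because the subjects are now document-scoped labels, while joining on rdf:aboutURI returns one item per channel, and each item's source document can be read off its generated label.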
Received on Tuesday, 12 June 2001 01:04:28 UTC