- From: Reto Bachmann-Gmür <reto@gmuer.ch>
- Date: Mon, 21 Nov 2005 16:31:04 +0100
- To: Joshua Tauberer <tauberer@for.net>
- CC: Andreas Andreakis <andreas.andreakis@gmx.de>, semantic-web at W3C <semantic-web@w3c.org>
Joshua Tauberer schrieb: > It might be useful at this point to formalize what the problem is. > Reto said earlier a very general version of the problem: find for a > graph g "the smallest graph g1, so that g entails g1 and g1 entails g." > > I think this might be too general unless we add that g1 is a subset of > g. (Otherwise, consider that I can write a single statement and then > define the semantics of that statement outside of RDF to entail any > graph.) sound reasonable, still quite insecure about terminology, would this be the same to say "... g rdf-entails g1 and g1 rdf-entails g"? > So the general problem then is for graph g, find non-empty graphs a > and b such that a union b = g and a entails b. Then, b can be > removed. But splitting a graph into two parts is tricky because of > blank nodes -- each blank node can appear in either a or b, but not > both. For every b, for every blank node in b, every statement in g > mentioning that node must be in b. (Someone named something like this > a 'minimal self-contained graph', MSG.) I have been looking at the concept of "rdf molecules" [1] which seems to be very close to MSG. It distinguishs between "terminal molecules", which are graphs in which every node is grounded (i.e. has an unambiguous cross-model identity either because it's a literal value or a named resource or because it is "functionally grounded" by an (i)fp) and "contextual molecules" which contain anonymous nodes with an identity arising from the context of the node, in a lossless decomposition of a graph the contextual molecules contain all triples containing their ungrounded nodes. In fact I was working on a jena-implementation of this "rdf molecules" concept when I come across the problem of keeping the contextual nodes lean, a bigger graph could then be made lean by - splitting it into molecules - replacing contextual molecules by equivalent lean graphs - removing duplicate molecules - merge the molecules Of course, this doesn't solve the problem, but its solution would have to be applied to smaller graphs. > Ignoring any type of inferencing and since a and b don't have any > statements in common, a can only entail b if b is a subgraph of a but > with one or more nodes in each statement replaced by other blank nodes > from g -- b is just a weaker version of some part of a. And, b must > contain blank nodes. (Okay, so you knew this already.) > > Finally, a crude algorithm would be to find all subgraphs b of g such > that 1) every statement in b has a blank node, 2) b contains all > statements in g mentioning blank nodes in b (so there is at most one b > for every blank node), 3) (g-b) entails b, which is to say that there > is a mapping M from blank nodes in b to nodes not in b such that (g-b) > contains M(b). The idea sounds good, it seems to reduce the problem to subgraph-isomorphism (check if b is a subgraph of (g-b)) for which I guess an algorithm similar to the one described by Jeremy Carroll for graph-isomorphism [2] could be used. However I don't see how your algorithm would successfully be applied to g: [ foaf:knows [ foaf:name "Jo]; foaf:knows [ foaf:name "Jo] ]. To comply with condition 2 b would have to contain all triples of g, or am I too tired? > That's all off the top of my head... Could be wrong. And as for an > efficient algorithm, I have no idea... > Thanks for sharing - currently I have no algorithm at all not even an inefficient one :-(. The terribly inefficient algorithm I was playing with had the bug that it wrongly reduced the graph _:a foaf:knows _:b _:b foaf:knows _:a to _:a foaf:knows _:a reto [1] http://www.ksl.stanford.edu/people/pp/papers/Ding_ISWC_2005.pdf [2] /www.hpl.hp.com/techreports/2001/HPL-2001-293.html/
Received on Monday, 21 November 2005 15:32:04 UTC