- From: Reto Bachmann-Gmür <reto@gmuer.ch>
- Date: Mon, 21 Nov 2005 19:56:30 +0100
- To: Joshua Tauberer <tauberer@for.net>
- CC: Andreas Andreakis <andreas.andreakis@gmx.de>, semantic-web at W3C <semantic-web@w3c.org>
Joshua Tauberer schrieb: >> The idea sounds good, it seems to reduce the problem to >> subgraph-isomorphism (check if b is a subgraph of (g-b)) for which I >> guess an algorithm similar to the one described by Jeremy Carroll for >> graph-isomorphism [2] could be used. > > > From a quick skim, it looks that that's only applicable for > (graph-graph) isomorphism rather than entailment (in this case whether > there is a subgraph isomorphic to another graph). I was thinking at the possibility to assign up to four hashes per triple in the source graph (one based on both named/literal subject and object, two making on the two anonymous and another one with both anonymized), reduce the source graph to statements available in the target graph, and then refine hashes till the size of the source graph is near the size of the target graph. > You could use any SPARQL implementation, really. The blank nodes are > the variables. really? I though real-world sparql implementations only support tree-style queries, if not there's indeed existing and presumably optimized code for subgraph isomorphism. >> However I don't see how your algorithm would successfully be applied to >> g: >> [ >> foaf:knows [ foaf:name "Jo]; >> foaf:knows [ foaf:name "Jo] >> ]. > > > Right, I was wrong to only allow b to be a MSG. So it's more > complicated. Here's a second try: Find all subgraphs b of g such that > (g-b) entails b, which is to say that there is a mapping M from blank > nodes *only* in b to nodes in (g-b) such that (g-b) contains M(b). > The difference is that if b contains a blank node that is also in > (g-b), then it's effectively a constant when comparing b to (g-b). > > Hopefully that's right. b doesn't have to be a MSG, but some other > constraints fall out of that. b still has to have all of the > statements about some blank node so that M does some mapping. And > every statement in b must contain a blank node only in b. (A > statement can't be in both b and (g-b), so M has to do some mapping > for each statement so that M(b) might possibly be a subset of (g-b).) > > Before there were at most N b's to look at (at most one MSG per blank > node, some sharing MSGs), and now there are possibly 2^N (each b > contains all of the statements, or not all, of each blank node). > Also the only b's worth considering are connected graphs (every node > is reachable from every other node), since otherwise b is just the > union of two smaller graphs which could be considered on their own. I > think that this means that all of the b's worth considering are > subgraphs of MSGs, and further only need to be entailed by the MSG > containing b (and not all of (g-b)), which narrows down things a lot. Realize, that the approach I presented to use molecules to reduce the size of the graphs that have to be made lean doesn't work, as it doesn't take into account anonymous resources which are "duplicate" of named or literal resources. I guess the same problem applies for MSG, as they only include statements wit blank nodes, so they are the same as "maximum contextual molecules", except MSGs don't treat functionally grounded nodes particularly. > So a new algorithm would be to find all MSGs b, make b lean by > checking all 2^N relevant subgraphs (except the N here is just the > blank nodes in b; or something more efficient), and then checking if > this new b is entailed by (g-b). If this is right. OK, so you're checking the lean b against the whole (g-b), we could check again the (connected subgraph of g containing b) - g, but clearly the connections may not be limited to those through anonymous nodes - this may not be a big advantage, as probably most graphs are connected (e.g. by the rdf:type properties to a narrow set of classes). How do I check the 2^N subgraphs when making b lean? reto reto
Received on Monday, 21 November 2005 18:56:51 UTC