- From: Florian Kleedorfer <florian.kleedorfer@austria.fm>
- Date: Tue, 21 Nov 2017 16:48:32 +0100
- To: Martynas Jusevičius <martynas@atomgraph.com>
- Cc: Paul Tyson <phtyson@sbcglobal.net>, Semantic Web <semantic-web@w3.org>
Hi,

On 2017-11-21 16:02, Martynas Jusevičius wrote:
> Hi again,
>
> if you only need to "blend" 2 graphs at a time, with one typed resource in each, I guess you can join on type, again even with SPARQL. Can there be other commonality than type? What if you have multiple resources with the same type?

The individual graphs could be more complex, for example describing a goods transport including packaging, content, pickup time/location and delivery time/location.

The setting you are referring to is what we would call matching. We assume for the blending of goal graphs that matching has already occurred: the two agents have already identified each other and now want to describe their transaction. There aren't millions of graphs, there are only a handful (all the goals specified by both agents, in most cases probably one on each side).

> This seems like a very limited use case to me, which does not scale. I was thinking more about a triplestore with thousands or millions of graphs where any number of rides might match drivers to passengers. In that case I don't think joining only on type can work.
>
> I'm also quite sure (optimized) SPARQL will be faster than any rules/inference, and probably also custom algorithms, unless you use some specialized storage beyond triples.

I suspect the same, except that I am not (yet) convinced that I want to use SPARQL in the browser - which may be one platform we'd like to use.

> On Tue, Nov 21, 2017 at 3:48 PM, Florian Kleedorfer <florian.kleedorfer@austria.fm> wrote:
>
>> Hi Martynas,
>>
>> I am not sure if I made the use case seem too narrow here. The taxi case is meant as an example, but the two graphs describing the agents' goals could contain any RDF. The commonality of all use cases is that a goal by itself is always incomplete and needs another agent's mutually compatible goal to be completed.
>>
>> Is my problem really that I have nothing to join on?
>> In the example, there are TBox resources that are shared (e.g., taxi:Ride) that could be used for joining. I think in the case of the example below, the CONSTRUCT approach is possible, as suggested and shown by Axel Polleres off-list a few days ago as a pattern-based solution to this problem. I would classify it as a knowledge-based 'blending' approach, because the target structure as well as the structure of the two graphs we want to 'blend' need to be known beforehand, which allows for substantial optimizations.
>>
>> A downside of the pattern-based approach, I suspect, is that it can only handle expected cases. Or do you see a way to generate such a SPARQL statement from the data ad hoc?
>>
>> As a side note, I am not so sure I would refer to SPARQL as 'simple'. It's easy to write a simple SPARQL query, but it requires resources to parse and execute it, and it may be more efficient to reach the same result with a different algorithm, more tailored to the problem.
>>
>> On 2017-11-21 14:34, Martynas Jusevičius wrote:
>>
>>> If I understand your use case right, your problem is that you have nothing to join on, i.e. no common nodes, between the 2 rides.
>>>
>>> If there was something in common, I think it could be solved with a simple SPARQL CONSTRUCT. Now what that should be depends on your use case.
>>>
>>> Maybe a geohash of the location? https://en.wikipedia.org/wiki/Geohash [1]
>>>
>>> In any case, I don't think it requires some kind of new graph "blending" algorithm.
>>
>> I would be very relieved if I could take something off the shelf here.
>>
>> Best regards,
>> Florian
>>
>> On Tue, Nov 21, 2017 at 12:40 PM, Florian Kleedorfer <florian.kleedorfer@austria.fm> wrote:
>>
>> Hi Paul,
>>
>> I received a similar hint yesterday off-list. I am really grateful for these suggestions, as I had not seen the possibility at all. It seems we should really look into it.
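[Editor's note: the ad-hoc generation asked about above could, for the simple case of joining on a shared rdf:type, be sketched as follows. This is a minimal illustration in plain Python, not Axel Polleres's actual pattern: triples are (s, p, o) tuples with prefixed names, and the fixed IRI <urn:blend:joined> merely stands in for a freshly minted merge node.]

```python
# Sketch: derive a SPARQL CONSTRUCT string that joins two goal graphs
# on a shared rdf:type, generated from the data instead of hand-written.
# Prefixed names are used as-is for readability; a real implementation
# would emit proper PREFIX declarations.

RDF_TYPE = "rdf:type"

def shared_types(g1, g2):
    """rdf:type objects that occur in both graphs."""
    t1 = {o for _, p, o in g1 if p == RDF_TYPE}
    t2 = {o for _, p, o in g2 if p == RDF_TYPE}
    return t1 & t2

def construct_query(g1, g2):
    """Build a CONSTRUCT that merges the descriptions of one typed
    resource from each graph onto a single placeholder node, or
    return None if the graphs share no type to join on."""
    types = sorted(shared_types(g1, g2))
    if not types:
        return None
    t = types[0]
    return (
        "CONSTRUCT { <urn:blend:joined> ?p1 ?o1 . <urn:blend:joined> ?p2 ?o2 . }\n"
        "WHERE {\n"
        f"  GRAPH ex1:p1g-data {{ ?r1 rdf:type {t} ; ?p1 ?o1 . }}\n"
        f"  GRAPH ex2:p2g-data {{ ?r2 rdf:type {t} ; ?p2 ?o2 . }}\n"
        "}"
    )

g1 = [("ex1:ride1", "rdf:type", "taxi:Ride"),
      ("ex1:ride1", "taxi:hasDriver", "ex1:p1")]
g2 = [("ex2:myRide", "rdf:type", "taxi:Ride"),
      ("ex2:myRide", "taxi:hasClient", "ex2:p2")]

print(construct_query(g1, g2))
```

As the thread notes, this only handles the expected case of exactly one shared type per graph pair; with multiple typed resources per graph the join would produce unwanted cross products.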
>> You asked why I had not framed the problem more in terms of description logics. The main reason is: I had not even considered it, really. Here are some possible causes for that:
>>
>> One thing to say about description logics is that it is one more thing to learn, which makes it that much more unlikely that a programmer will use description logics over something else that gets the job done. The graph blending approach is simpler in terms of logics, but you have to learn SHACL before you can use it - I am not sure about the net trade-off.
>>
>> Also, I am a little bit afraid of OWL for the application I have in mind, as it does not, in my (limited) experience, do well in the presence of messy data. It is easy to formulate inconsistent data, and to infer really unexpected things. The application - or rather, the protocol - is completely open (anyone can say anything, including incomplete or inconsistent things), so I would prefer to remain on a less expressive level as much as possible, ideally doing no reasoning at all.
>>
>> I've seen SHACL work quite nicely in the browser, but I am not sure about reasoners for OWL or OWL DL. Because of the reservations stated earlier, I must admit, I haven't checked. Doing the blending in the browser may not be a necessity, but I would not want to close that door just yet.
>>
>> Finally, I have to admit that for the DL-based approach I would not really know where to start. I'll be thankful for any help here, and I'll be happy to cooperate if someone else wants to work on it!
>>
>> I am also thinking about the implications of the blending approach vs. the DL-based approach... The goal of blending was to connect two graphs that each describe one half of the solution, like puzzle pieces. The example I had in mind was: A says, "A is the passenger of the taxi ride." B says, "B is the driver of the taxi ride."
>> Join the two structures by blending the "taxi ride" node and you get "A is the passenger of the taxi ride that B is the driver of". It seems to me that one can arrive at this result without special reasoning rules that were designed for exactly this use case. A simple RDFS vocabulary, some SHACL constraints, and the variable/constant definition would be enough. All of these could be different in every interaction, allowing for evolution of 'reasoning' as well as evolution of the use of vocabularies. The DL approach may be very powerful for premeditated scenarios, but I am not sure how well it would work if agents want to interact in ways that were not foreseen by the ontology design.
>>
>> The approach we have outlined so far is more of a linked-data-flavored approach than a semantic web one. For many simple cases, it may be sufficient and work just fine. It seems more robust to me (less prone to fail with messy data) than the DL-based approach, and it might be a good thing to have as a baseline. However, I can imagine that both approaches will be supported at some point and that they can live next to each other: some agents may prefer to express their goals in line with RDF blending, some with DL class intersections, some may use both. In all cases, the net result should be an RDF graph that the partners can agree to.
>>
>> Best
>> Florian
>>
>> On 2017-11-21 03:54, Paul Tyson wrote:
>>
>> Hi Florian, very interesting problem. See my comments below.
>>
>> On Mon, 2017-11-20 at 19:09 +0100, Florian Kleedorfer wrote:
>>
>> (Note: I changed the subject of this thread following Pat Hayes' suggestion [1])
>>
>> After some on-list and off-list discussions, the problem is becoming clearer. I'll try to state it again, along with some ideas for a solution.
>> Any comments, criticism, and suggestions to collaborate are highly welcome ;-)
>>
>> # Blending
>> ## Definition
>> Given two RDF graphs g1 and g2, blending refers to the pairwise renaming of resources (blank nodes or URI nodes) r1 in g1 and r2 in g2 such that one resource name (which may be r1, r2, or any other URI) is used instead of r1 and r2. The nodes r1 and r2 are said to be blended (ideas for a better term, anyone?). The resulting graphs are merged (avoiding accidental blank node identification, see RDF 1.1 Semantics [2]). The blending function b maps the input graphs to the set of all possible output graphs.
>>
>> I wonder why you do not frame the problem more in terms of description logics. "Blending" as you have described it seems to be a much looser (more flexible?) way of finding the intersection of 2 classes. If the provider has defined the classes of goods/services they can provide, and the consumer defines the class(es) he wishes to buy, then before they can do business they must find an instance of a good or service that falls in the intersection of those classes.
>>
>> Suppose the provider defines class P, the consumer class C. Then any transaction between them would be in the class defined by intersectionOf(P C). If this class is unsatisfiable (cannot have any members, for instance if P has :price = 20 and C has :price < 15), you want an algorithm to define and rank ways in which the provider's and consumer's class definitions could be modified to make the intersection satisfiable. These "negotiating rules" (or "distance measurements") are the interesting part of the problem.
>>
>> I do not know if there are implementable solutions to these problems in the description logics literature, but it might be worth a little searching before building the graph-manipulation approach you have outlined here.
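[Editor's note: the blending definition above can be sketched in a few lines of Python. This is a minimal sketch with triples represented as (s, p, o) tuples of strings; blank-node standardizing-apart, which the definition requires for a proper merge, is omitted for brevity.]

```python
# Sketch of the blending operation: rename r1 in g1 and r2 in g2 to a
# single name, then merge the graphs (here: plain set union of triples).

def rename(graph, old, new):
    """Return a copy of the graph with every occurrence of the node
    `old` (in subject or object position) replaced by `new`."""
    return {(new if s == old else s, p, new if o == old else o)
            for s, p, o in graph}

def blend(g1, g2, r1, r2, name):
    """Blend r1 (in g1) with r2 (in g2) under the given name and merge."""
    return rename(g1, r1, name) | rename(g2, r2, name)

g1 = {("ex1:ride1", "rdf:type", "taxi:Ride"),
      ("ex1:ride1", "taxi:hasDriver", "ex1:p1")}
g2 = {("ex2:myRide", "rdf:type", "taxi:Ride"),
      ("ex2:myRide", "taxi:hasClient", "ex2:p2")}

blended = blend(g1, g2, "ex1:ride1", "ex2:myRide", "ex:mergedNode1")
```

Note how the duplicate rdf:type triple collapses in the union, which is exactly the "removal of duplicate triples" step in the worked taxi example later in the thread.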
>> Regards,
>> --Paul
>>
>> ## Discussion
>> There are many ways to blend two graphs, and none of these ways is deemed correct or incorrect. We require this operation to combine two data structures held by mutually independent agents that have certain expectations about the blending solution. These expectations divide the set of possible outcomes at least into acceptable and unacceptable ones, possibly even into a preference ranking. I'll try to formulate that as well in the following.
>>
>> # Node blendability
>> ## Rationale
>> One way of constraining the result is to restrict which nodes can be blended. So far, we have identified three types: variable nodes, constant nodes, and unblendable nodes. Each node in an RDF graph has exactly one of the three types. A variable node may be blended and its name may be changed. A constant node may also be blended, but its name is not allowed to change. An unblendable node must not be blended.
>>
>> One may note that blending g1 and g2 is equivalent to merging them if no variable nodes are present in either graph.
>>
>> It is currently unclear how the constant/variable/unblendable property should be expressed. My intuition is that in most practical use cases, most nodes need to be unblendable (ontology/TBox resources, properties in general), so it would be more economical to assume nodes to be unblendable by default. Then, variable/constant nodes have to be annotated as such explicitly. Maybe it would be an option to make definitions by URI prefix, so we can define a complete vocabulary as unblendable with one statement. (Opinions on that?)
>>
>> ## Definition
>> Given two graphs g1 and g2, a blendability-aware blending function is a blending function for g1 and g2 that only replaces resources that have not been declared as constant with respect to blending.
>> The latter is done by specifying a triple "<graph> bl:containsConstant <resource> ." or "<graph> bl:containsVariable <resource> ." in an additional graph passed to the blendability-aware blending function. Such a graph is called a 'blendability graph'.
>>
>> # Constraints on the blending solution
>> ## Rationale
>> As explained earlier, there may be multiple parties that have expectations for the blending solution. So far, we have talked about the originators of the data. There may also be one or more intended consumers of the data. Nothing prevents the expectations from being mutually incompatible (meaning that there exists no graph that satisfies them all) or even unsatisfiable by themselves. It would of course be beneficial if such cases could be detected automatically, but for now, let's just leave it at noting the possibility. Now, if the information the agents provided, when considered in total, contains spurious or conflicting information, or if information is missing (taking their respective expectations as a definition of what is acceptable), the blending solution cannot meet the expectations. Nevertheless, it has to be computed, in order to be able to take the next step, "repair" (better word, please) one or both graphs, and try again.
>>
>> It would be nice to use constraints for the following goals during the calculation of the blending solution:
>> 1. pruning wholly unacceptable solutions
>> 2. ranking unacceptable solutions by badness
>> 3. ranking acceptable-looking, but incomplete solutions ("so far so good")
>>
>> Constraints should allow us to identify an acceptable result, or to select the best result out of a number of unacceptable ones, and report them to users so that they can repair them.
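[Editor's note: the blendability rules above could be checked as follows. A sketch only: the bl: property names are taken from the definition, nodes are unblendable by default as the rationale suggests, and the rule that at least one side of a blended pair must be a variable (since a constant's name may not change) is an interpretation of the three node types.]

```python
# Sketch: enumerate which node pairs (n1 from g1, n2 from g2) may be
# blended, given blendability graphs declaring variables and constants.
# Undeclared nodes are unblendable by default.

def declared(blendability, prop):
    """Resources declared via the given bl: property in a blendability graph."""
    return {o for s, p, o in blendability if p == prop}

def blendable_pairs(nodes1, nodes2, bl1, bl2):
    """Pairs (n1, n2) that may be blended: at least one side must be a
    variable (its name may change); the other side may be a variable
    or a constant, but never an unblendable (undeclared) node."""
    var1 = declared(bl1, "bl:containsVariable")
    var2 = declared(bl2, "bl:containsVariable")
    con1 = declared(bl1, "bl:containsConstant")
    con2 = declared(bl2, "bl:containsConstant")
    return {(n1, n2)
            for n1 in nodes1 for n2 in nodes2
            if (n1 in var1 and n2 in var2 | con2)
            or (n2 in var2 and n1 in var1 | con1)}

bl1 = {("ex1:p1g-data", "bl:containsVariable", "ex1:ride1")}
bl2 = {("ex2:p2g-data", "bl:containsConstant", "ex2:myRide")}
pairs = blendable_pairs({"ex1:ride1", "ex1:p1"},
                        {"ex2:myRide", "ex2:p2"}, bl1, bl2)
```

In this example only the ride nodes may be blended: ex1:ride1 is a variable, ex2:myRide a constant (so the pair keeps the name ex2:myRide), and the undeclared agent nodes stay untouched.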
>> However, if the expectations define an acceptable result, anything else, however close, is unacceptable, and in the absence of acceptable results we are left with unacceptable ones, some of which may be completely bogus while others are almost acceptable. So, the three categories above cannot be separated, and the best we can hope for is a reasonable ranking of unacceptable solutions.
>>
>> For the conceptualization, we go with a simple SHACL-based approach. Other systems (SPARQL, ShEx) are definitely possible as well.
>>
>> ## Definition
>> The constraints-aware blending function is a blending function for two RDF graphs g1 and g2 that takes optional graph names s1,...,sN denoting graphs which contain SHACL shapes constraining the result. Each blending solution is validated using s1,...,sN; the result of the function is a mapping of the blending solutions to their respective SHACL validation results, indexed by shape graph name (s1,...,sN). A blending solution is called accepted with respect to a constraints graph s if it causes no SHACL validation results when validated against s. A blending solution is called accepted if it is accepted with respect to all specified shapes graphs.
>>
>> ## Discussion
>> The main issue in this step seems to be the pruning and ordering of unacceptable results. Blending functions may choose to drop blending solutions if they are deemed too bad - but where should they draw the line, still reporting one unacceptable blending solution while dropping another?
>>
>> With respect to one shapes graph, we could define that validation result graphs are ranked by maximum severity (1 sh:Violation is worse than 1000 sh:Warnings), and within those classes by number of validation results (2 Violations and 1 Warning are worse than 1 Violation and 1000 Warnings).
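[Editor's note: one reading of that ranking rule, consistent with both examples given (ranking first by worst severity present, then by the number of results at that severity), can be captured as a sort key. A sketch; validation results are reduced to their sh:severity values, ordered sh:Violation > sh:Warning > sh:Info as in SHACL.]

```python
# Sketch: rank blending solutions by their SHACL validation results,
# first by the worst severity present, then by how many results have
# that severity. Smaller sort keys are better; (0, 0) means accepted.

SEVERITY_RANK = {"sh:Info": 1, "sh:Warning": 2, "sh:Violation": 3}

def ranking_key(severities):
    """severities: one severity string per validation result."""
    if not severities:
        return (0, 0)  # no validation results: the solution is accepted
    worst = max(SEVERITY_RANK[s] for s in severities)
    count = sum(1 for s in severities if SEVERITY_RANK[s] == worst)
    return (worst, count)

solutions = {
    "A": ["sh:Violation", "sh:Violation", "sh:Warning"],   # 2 Violations
    "B": ["sh:Violation"] + ["sh:Warning"] * 1000,         # 1 Violation
    "C": [],                                               # accepted
}
ranked = sorted(solutions, key=lambda name: ranking_key(solutions[name]))
```

With these inputs the accepted solution C ranks first, B (one Violation, however many Warnings) second, and A (two Violations) last, matching the rule's worked example.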
>> I suspect that the validation results should allow us to prune at least some bad blending solutions if their validation results are supersets of other solutions' validation results: if solution A has validation results (R1) and solution B has validation results (R1, R2), we can say that solution B is a deterioration of solution A and should not be reported.
>>
>> In addition to constraint-based pruning and ranking, it is of course possible to perform reasoning-based pruning and to validate the blending solution according to appropriate reasoning logics (RDFS, OWL, ...). However, I am not sure about the usefulness of this approach if no solution is found that is consistent according to the logic.
>>
>> # Graph blending algorithms
>> Let's assume for this section that we are blending two graphs g1 and g2, and we know we have variable nodes var(g1) and constant nodes const(g1) in g1, and var(g2) and const(g2) in g2.
>>
>> ## Complete
>> The complete algorithm consists of enumerating all possible solutions. For each n in var(g1), there are the following solutions:
>> * 1 solution in which n is not blended at all
>> * 1 solution for each m in var(g2), in which n is blended with m
>> * 1 solution for each m in const(g2), in which n is blended with m
>> Symmetrically, the same is true for each n in var(g2). Removal of duplicate solutions yields the set of solutions.
>>
>> ## Heuristic
>> We may interpret blending as a heuristic search and apply an A*-like algorithm to finding good results, assessing the distance to the goal by evaluating the constraints, and the distance from the start by the number of node blendings.
>>
>> ## Knowledge-based
>> Exploiting knowledge about the actual structures in g1 and g2, the heuristic approach could be enhanced to apply more appropriate distance estimates and select more promising paths first.
>>
>> Links:
>> 1.
>> https://lists.w3.org/Archives/Public/semantic-web/2017Nov/0067.html
>> 2. https://www.w3.org/TR/rdf11-mt/
>>
>> On 2017-11-16 19:37, Florian Kleedorfer wrote:
>>
>> Hi,
>>
>> As you may recall, I've asked this list about messaging/negotiating using linked data [1], and it led me and my colleagues to a concept of RDF-based agreements [2]. (Thanks again for your input!)
>>
>> Now it's time for us to define how the negotiating parties come up with the data (set of RDF triples) that they agree on.
>>
>> One idea we have is that both participants (P1, P2) specify 'goals', each consisting of a SHACL shapes graph and a data graph. Two goals can be tested for compatibility by merging the two data graphs and then testing if the resulting graph is valid given the shapes defined by both goals. The problem we face with this approach is merging the two data graphs. Consider the example in this gist:
>>
>> https://gist.github.com/fkleedorfer/81b32e3235105a4a4743e9fa76ba9332 [3]
>>
>> In that example, we would want to merge these two graphs:
>>
>> ex1:p1g-data {
>>   ex1:ride1 a taxi:Ride .
>>   ex1:ride1 taxi:hasDriver ex1:p1 .
>>   ex1:ride1 taxi:hasPickupLocation ex1:pickup1 .
>> }
>>
>> and
>>
>> ex2:p2g-data {
>>   ex2:myRide a taxi:Ride .
>>   ex2:myRide taxi:hasClient ex2:p2 .
>>   ex2:myRide taxi:hasPickupLocation ex2:myPickupLocation .
>>   ex2:myPickupLocation a s:GeoCoordinates ;
>>     s:latitude "48.213814" ;
>>     s:longitude "16.340870" .
>> }
>>
>> We want that merge to happen in such a way that the shapes in ex1:p1g-shapes and ex2:p2g-shapes can be evaluated, which will require merging some of the nodes. (In the example, we'll find out that a taxi:hasPickupTime property is missing. One of the participants is then to add that property, and the whole structure becomes valid according to both goals' shapes.)
>>
>> The question is how exactly to achieve that merge.
>> Intuitively, the result in this case should be
>>
>> :mergedNode1 a taxi:Ride .
>> :mergedNode1 taxi:hasDriver ex1:p1 . # note: p1 links her own identifier to the structure
>> :mergedNode1 taxi:hasPickupLocation :mergedNode2 .
>> :mergedNode1 a taxi:Ride .
>> :mergedNode1 taxi:hasClient ex2:p2 .
>> :mergedNode1 taxi:hasPickupLocation :mergedNode2 .
>> :mergedNode2 a s:GeoCoordinates ;
>>   s:latitude "48.213814" ;
>>   s:longitude "16.340870" .
>>
>> and after removal of duplicate triples
>>
>> :mergedNode1 a taxi:Ride .
>> :mergedNode1 taxi:hasDriver ex1:p1 .
>> :mergedNode1 taxi:hasClient ex2:p2 .
>> :mergedNode1 taxi:hasPickupLocation :mergedNode2 .
>> :mergedNode2 a s:GeoCoordinates ;
>>   s:latitude "48.213814" ;
>>   s:longitude "16.340870" .
>>
>> (I currently don't care about the URIs of the merged nodes - just something one can mint when doing the merge.)
>>
>> Do you have any hints on how to perform such a merge reliably, also with more complex structures?
>>
>> Thank you very much in advance!
>>
>> Best regards,
>> Florian
>>
>> Links:
>> 1. Negotiations discussion thread: https://lists.w3.org/Archives/Public/semantic-web/2017Jul/0004.html
>> 2.
>> Agreements paper: http://ceur-ws.org/Vol-1934/contribution-07.pdf
>
> Links:
> ------
> [1] https://lists.w3.org/Archives/Public/semantic-web/2017Nov/0067.html
> [2] https://www.w3.org/TR/rdf11-mt/
> [3] https://gist.github.com/fkleedorfer/81b32e3235105a4a4743e9fa76ba9332
> [4] https://lists.w3.org/Archives/Public/semantic-web/2017Jul/0004.html
> [5] http://ceur-ws.org/Vol-1934/contribution-07.pdf
>
> Links:
> ------
> [1] https://en.wikipedia.org/wiki/Geohash
> [2] https://lists.w3.org/Archives/Public/semantic-web/2017Nov/0067.html
> [3] https://www.w3.org/TR/rdf11-mt/
> [4] https://gist.github.com/fkleedorfer/81b32e3235105a4a4743e9fa76ba9332
> [5] https://lists.w3.org/Archives/Public/semantic-web/2017Jul/0004.html
> [6] http://ceur-ws.org/Vol-1934/contribution-07.pdf
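[Editor's note: the "Complete" enumeration from the "# Graph blending algorithms" section above can be sketched as follows. This reads the description literally, i.e. each solution blends at most one pair of nodes; solutions are represented as frozensets of blended (g1-node, g2-node) pairs so that the symmetric duplicates collapse, and the var/const sets are hypothetical.]

```python
# Sketch of the "Complete" blending algorithm: for each variable node,
# one solution where it stays unblended, plus one solution per node of
# the other graph it may be blended with; duplicates collapse in a set.

def complete_solutions(var1, const1, var2, const2):
    """Enumerate blending solutions as frozensets of blended pairs;
    the empty frozenset is the solution with no blending at all."""
    solutions = {frozenset()}  # "n is not blended at all"
    for n in var1:
        for m in var2 | const2:          # n blended with m
            solutions.add(frozenset({(n, m)}))
    for n in var2:                       # symmetric case for var(g2);
        for m in var1 | const1:          # pairs stay (g1-node, g2-node)
            solutions.add(frozenset({(m, n)}))
    return solutions

sols = complete_solutions(var1={"ex1:ride1"}, const1={"ex1:p1"},
                          var2={"ex2:myRide"}, const2={"ex2:p2"})
```

For the taxi example this yields four candidates: no blending, ride1/myRide (the intuitively desired one), ride1/p2, and p1/myRide; the duplicate arising from the symmetric pass is removed by the set. A constraints-aware function would then validate each candidate and rank the results.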
Received on Tuesday, 21 November 2017 15:49:13 UTC