- From: Martynas Jusevičius <martynas@atomgraph.com>
- Date: Tue, 21 Nov 2017 14:34:30 +0100
- To: Florian Kleedorfer <florian.kleedorfer@austria.fm>
- Cc: Paul Tyson <phtyson@sbcglobal.net>, Semantic Web <semantic-web@w3.org>
- Message-ID: <CAE35VmzfWbaubnhQScGV50gnYsAVv=XJByMzjBzrX8ZwvSMVCg@mail.gmail.com>
If I understand your use case right, your problem is that you have nothing
to join on, i.e. no common nodes, between the 2 rides. If there were
something in common, I think it could be solved with a simple SPARQL
CONSTRUCT. Now, what that should be depends on your use case. Maybe a
geohash of the location? https://en.wikipedia.org/wiki/Geohash

In any case, I don't think it requires some kind of new graph "blending"
algorithm.

On Tue, Nov 21, 2017 at 12:40 PM, Florian Kleedorfer <
florian.kleedorfer@austria.fm> wrote:

> Hi Paul,
>
> I received a similar hint yesterday off-list. I am really grateful for
> these suggestions, as I had not seen the possibility at all. It seems we
> should really look into it.
>
> You asked why I had not framed the problem more in terms of description
> logics. The main reason is: I had not even considered it, really. Here
> are some possible causes for that:
>
> One thing to say about description logics is that it is one more thing
> to learn, which makes it that much more unlikely a programmer will use
> description logics over something else that gets the job done. The graph
> blending approach is simpler in terms of logics, but you have to learn
> SHACL before you can use it - not sure about the net trade-off.
>
> Also, I am a little bit afraid of OWL for the application I have in
> mind, as it does not, in my (limited) experience, do well in the
> presence of messy data. It is easy to formulate inconsistent data, and
> to infer really unexpected things. The application - or rather, the
> protocol - is completely open (anyone can say anything, including
> incomplete or inconsistent things), so I would prefer to remain on a
> less expressive level as much as possible, ideally doing no reasoning at
> all.
>
> I've seen SHACL work quite nicely in the browser, but I am not sure
> about reasoners for OWL or OWL-DL. Because of the reservations stated
> earlier, I must admit, I haven't checked.
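Martynas's geohash suggestion above can be made concrete with a small sketch (pure Python, standard base32 geohash encoding; the coordinates are the pickup location from the example later in the thread, and using a truncated geohash as the join key is an assumption, not something proposed in the thread):

```python
# Illustrative pure-Python geohash encoder (standard base32 geohash,
# longitude bit first). Two rides whose pickup locations fall into the
# same cell get a common value to join on.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lon, precision=7):
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    chars, bit, ch, even = [], 0, 0, True
    while len(chars) < precision:
        rng, val = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        ch <<= 1
        if val >= mid:
            ch |= 1
            rng[0] = mid
        else:
            rng[1] = mid
        even = not even
        bit += 1
        if bit == 5:            # 5 bits per base32 character
            chars.append(BASE32[ch])
            bit, ch = 0, 0
    return "".join(chars)

# Pickup locations a few metres apart share a geohash prefix, so a
# suitably truncated geohash could serve as the shared join value:
h1 = geohash(48.213814, 16.340870)
h2 = geohash(48.213900, 16.340900)
shared_prefix = h1[:2]          # coarse common cell
```

Whether a prefix of length 2 or 7 is appropriate depends on how far apart two pickup locations may be and still count as "the same"; that choice is exactly the use-case-dependent part Martynas points at.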
> Doing the blending in the browser may not be a necessity, but I would
> not want to close that door just yet.
>
> Finally, I have to admit, for the DL-based approach, I'm not really sure
> I would even know where to start. I'll be thankful for any help here and
> I'll be happy to cooperate if someone else wants to work on it!
>
> I am also thinking about the implications of the blending approach vs.
> the DL-based approach. The goal of blending was to connect two graphs
> that each describe one half of the solution, like puzzle pieces. The
> example I had in mind was: A says, "A is the passenger of the taxi
> ride." B says, "B is the driver of the taxi ride." Join the two
> structures by blending the "taxi ride" node and you get "A is the
> passenger of the taxi ride that B is the driver of". It seems to me that
> one can arrive at this result without special reasoning rules designed
> for exactly this use case. A simple RDFS vocabulary, some SHACL
> constraints, and the variable/constant definitions would be enough. All
> of these could be different in every interaction, allowing for evolution
> of 'reasoning' as well as evolution of the use of vocabularies. The DL
> approach may be very powerful for premeditated scenarios, but I am not
> sure how well it would work if agents want to interact in ways that were
> not foreseen by the ontology design.
>
> The approach we have outlined so far is more of a linked-data-flavored
> approach than a semantic web one. For many simple cases, it may be
> sufficient and work just fine. It seems more robust to me (less prone to
> fail with messy data) than the DL-based approach, and it might be a good
> thing to have as a baseline. However, I can imagine that both approaches
> will be supported at some point and that they can live next to each
> other: some agents may prefer to express their goals in line with RDF
> blending, some with DL class intersections, some may use both.
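Florian's puzzle-piece example can be sketched with graphs as plain sets of triples (all node names below are invented for illustration; this is a sketch of the blending idea, not an implementation from the thread):

```python
# Illustrative sketch of blending: each graph is a set of (s, p, o)
# triples. Blending two nodes means renaming both to one shared name,
# then taking the set union of the two graphs.

def blend(g1, g2, node1, node2, blended_name):
    """Rename node1 in g1 and node2 in g2 to blended_name, then merge."""
    def rename(graph, old, new):
        return {(new if s == old else s,
                 p,
                 new if o == old else o) for (s, p, o) in graph}
    return rename(g1, node1, blended_name) | rename(g2, node2, blended_name)

# A says: "A is the passenger of the taxi ride."
g1 = {("ex:A", "taxi:isPassengerOf", "ex:rideA")}
# B says: "B is the driver of the taxi ride."
g2 = {("ex:rideB", "taxi:hasDriver", "ex:B")}

# Blend the two "taxi ride" nodes into one:
merged = blend(g1, g2, "ex:rideA", "ex:rideB", "ex:theRide")
# merged now states: A is the passenger of the ride that B drives.
```

The interesting part, as the rest of the thread discusses, is not this renaming step but deciding *which* nodes may be blended and which of the many possible blendings are acceptable.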
> In all cases, the net result should be an RDF graph that the partners
> can agree to.
>
> Best
> Florian
>
> On 2017-11-21 03:54, Paul Tyson wrote:
>
>> Hi Florian, very interesting problem. See my comments below.
>>
>> On Mon, 2017-11-20 at 19:09 +0100, Florian Kleedorfer wrote:
>>
>>> (Note: I changed the subject of this thread following Pat Hayes'
>>> suggestion [1])
>>>
>>> After some on-list and off-list discussions, the problem is becoming
>>> clearer. I'll try to state it again, along with some ideas for a
>>> solution. Any comments, criticism, and suggestions to collaborate are
>>> highly welcome ;-)
>>>
>>> # Blending:
>>> ## Definition
>>> Given two RDF graphs g1 and g2, blending refers to the pairwise
>>> renaming of resources (blank nodes or URI nodes) r1 in g1 and r2 in
>>> g2 such that one resource name (which may be r1, r2, or any other
>>> URI) is used instead of r1 and r2. The nodes r1 and r2 are said to be
>>> blended (ideas for a better term, anyone?). The resulting graphs are
>>> merged (avoiding accidental blank node identification, see RDF 1.1
>>> Semantics[2]). The blending function b maps the input graphs to the
>>> set of all possible output graphs.
>>
>> I wonder why you do not frame the problem more in terms of description
>> logics. "Blending" as you have described it seems to be a much looser
>> (more flexible?) way of finding the intersection of 2 classes. If the
>> provider has defined the classes of goods/services they can provide,
>> and the consumer defines the class(es) he wishes to buy, then before
>> they can do business they must find an instance of a good or service
>> that falls in the intersection of those classes.
>>
>> Suppose the provider defines class P, the consumer class C. Then any
>> transaction between them would be in the class defined by
>> intersectionOf(P C).
>> If this class is unsatisfiable (cannot have any members, for instance
>> if P has :price = 20 and C has :price < 15), you want an algorithm to
>> define and rank ways in which the provider's and consumer's class
>> definitions could be modified to make the intersection satisfiable.
>> These "negotiating rules" (or "distance measurements") are the
>> interesting part of the problem.
>>
>> I do not know if there are implementable solutions to these problems
>> in the description logics literature, but it might be worth a little
>> searching before building the graph-manipulation approach you have
>> outlined here.
>>
>> Regards,
>> --Paul
>>
>>> ## Discussion
>>> There are many ways to blend two graphs, and none of these ways is
>>> deemed correct or incorrect in itself.
>>> We require this operation to combine two data structures held by
>>> mutually independent agents that have certain expectations about the
>>> blending solution. These expectations divide the set of possible
>>> outcomes at least into acceptable and unacceptable ones, possibly
>>> even into a preference ranking. I'll try to formulate that as well in
>>> the following.
>>>
>>> # Node blendability
>>> ## Rationale
>>> One way of constraining the result is to restrict which nodes can be
>>> blended. So far, we have identified three types: variable nodes,
>>> constant nodes, and unblendable nodes. Each node in an RDF graph has
>>> exactly one of the three types. A variable node may be blended and
>>> its name may be changed. A constant node may also be blended, but its
>>> name is not allowed to change. An unblendable node must not be
>>> blended.
>>>
>>> One may note that blending g1 and g2 is equivalent to merging them if
>>> no variable nodes are present in either graph.
>>>
>>> It is currently unclear how the constant/variable/unblendable
>>> property should be expressed.
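Returning to Paul's price example above, the unsatisfiable intersection and one possible "negotiating rule" can be made concrete (modelling the price restrictions as intervals is a simplification, and the gap-based relaxation is an assumed distance measure, not a proposal from the thread):

```python
# Sketch of Paul's example: each party's class restricts :price to an
# interval; the transaction class is the intersection. If the
# intersection is empty, the smallest relaxation that makes it
# non-empty is one candidate "distance measurement".

def intersect(a, b):
    """Intersect two closed intervals (lo, hi); None if empty."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None

provider = (20, 20)   # P: :price = 20
consumer = (0, 15)    # C: :price < 15, approximated as [0, 15]

assert intersect(provider, consumer) is None   # intersectionOf(P C) empty

# One "negotiating rule": close the gap between the intervals, i.e.
# raise the consumer's limit (or, symmetrically, lower the price) by 5.
gap = provider[0] - consumer[1]
relaxed_consumer = (0, consumer[1] + gap)
assert intersect(provider, relaxed_consumer) == (20, 20)
```

Ranking several candidate relaxations by the size of such gaps is the part Paul flags as the genuinely hard problem.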
>>> My intuition is that in most practical use cases, most nodes need to
>>> be unblendable (ontology/TBox resources, properties in general), so
>>> it would be more economical to assume nodes to be unblendable by
>>> default. Variable and constant nodes then have to be annotated as
>>> such explicitly.
>>> Maybe it would be an option to make definitions by URI prefix, so we
>>> can declare a complete vocabulary unblendable with one statement.
>>> (Opinions on that?)
>>>
>>> ## Definition
>>> Given two graphs g1 and g2, a blendability-aware blending function is
>>> a blending function for g1 and g2 that only replaces resources that
>>> have not been declared as constant with respect to blending. The
>>> latter is done by specifying a triple "<graph> bl:containsConstant
>>> <resource> ." or "<graph> bl:containsVariable <resource> ." in an
>>> additional graph passed to the blendability-aware blending function.
>>> Such a graph is called a 'blendability graph'.
>>>
>>> # Constraints on the blending solution:
>>> ## Rationale
>>> As explained earlier, there may be multiple parties that have
>>> expectations for the blending solution. So far, we have talked about
>>> the originators of the data. There may also be one or more intended
>>> consumers of the data. Nothing prevents the expectations from being
>>> mutually incompatible (meaning that there exists no graph that
>>> satisfies them all) or even unsatisfiable by themselves. It would of
>>> course be beneficial if such cases could be detected automatically,
>>> but for now, let's just note the possibility. Now, if the information
>>> the agents provided, taken as a whole, contains spurious or
>>> conflicting information, or if information is missing (taking their
>>> respective expectations as a definition of what is acceptable), the
>>> blending solution cannot meet the expectations.
>>> Nevertheless, it has to be computed, in order to be able to take the
>>> next step, "repair" (better word, please) one or both graphs, and try
>>> again.
>>>
>>> It would be nice to use constraints for the following goals during
>>> the calculation of the blending solution:
>>> 1. pruning wholly unacceptable solutions
>>> 2. ranking unacceptable solutions by badness
>>> 3. ranking acceptable-looking, but incomplete solutions ("so far so
>>> good")
>>>
>>> Constraints should allow us to identify an acceptable result, or to
>>> select the best result out of a number of unacceptable ones and
>>> report them to users so that they can repair them.
>>>
>>> However, if the expectations define an acceptable result, anything
>>> else, however close, is unacceptable, and in the absence of
>>> acceptable results we are left with unacceptable ones, some of which
>>> may be completely bogus while others are almost acceptable. So the
>>> three categories above cannot be separated, and the best we can hope
>>> for is a reasonable ranking of unacceptable solutions.
>>>
>>> For the conceptualization, we go with a simple SHACL-based approach.
>>> Other systems (SPARQL, ShEx) are definitely possible as well.
>>>
>>> ## Definition
>>> The constraints-aware blending function is a blending function for
>>> two RDF graphs g1 and g2 that takes optional graph names s1,...,sN
>>> denoting graphs which contain SHACL shapes constraining the result.
>>> Each blending solution is validated using s1,...,sN; the result of
>>> the function is a mapping of the blending solutions to their
>>> respective SHACL validation results, indexed by shapes graph name
>>> (s1,...,sN). A blending solution is called accepted with respect to a
>>> constraints graph s if it produces no SHACL validation results when
>>> validated against s. A blending solution is called accepted if it is
>>> accepted with respect to all specified shapes graphs.
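The acceptance criterion just defined can be sketched as follows. Real SHACL validation would need a SHACL engine; here a "shapes graph" is stood in for by a list of plain Python predicates, each returning a list of validation results (empty list = conforms). All names are illustrative:

```python
# Sketch: a blending solution is a graph (set of triples); it is
# "accepted" iff validating it against every shapes graph yields no
# validation results.

def validate(graph, shapes):
    """Collect the validation results produced by all shapes."""
    results = []
    for shape in shapes:
        results.extend(shape(graph))
    return results

def accepted(solution, shapes_graphs):
    """Accepted iff no shapes graph yields any validation result."""
    return all(not validate(solution, shapes) for shapes in shapes_graphs)

# Illustrative shape: every taxi:Ride needs a pickup location.
def ride_needs_pickup(graph):
    rides = {s for (s, p, o) in graph if p == "rdf:type" and o == "taxi:Ride"}
    located = {s for (s, p, o) in graph if p == "taxi:hasPickupLocation"}
    return [f"{r} is missing taxi:hasPickupLocation" for r in rides - located]

g = {("ex:ride", "rdf:type", "taxi:Ride")}
print(accepted(g, [[ride_needs_pickup]]))   # prints False
```

Keeping the validation results per shapes graph, rather than a single boolean, is what makes the ranking discussed next possible.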
>>> ## Discussion
>>> The main issue in this step seems to be the pruning and ordering of
>>> unacceptable results. Blending functions may choose to drop blending
>>> solutions if they are deemed too bad - but where should they draw the
>>> line, still reporting one unacceptable blending solution while
>>> dropping another?
>>>
>>> With respect to one shapes graph, we could define that validation
>>> result graphs are ranked by maximum severity (1 sh:Violation is worse
>>> than 1000 sh:Warnings), and within those classes by number of
>>> validation results (2 Violations and 1 Warning is worse than 1
>>> Violation and 1000 Warnings).
>>>
>>> I suspect that the validation results should allow us to prune at
>>> least some bad blending solutions if their validation results are
>>> supersets of other solutions' validation results: if solution A has
>>> validation results (R1) and solution B has validation results (R1,
>>> R2), we can say that solution B is a deterioration of solution A and
>>> should not be reported.
>>>
>>> In addition to constraint-based pruning and ranking, it is of course
>>> possible to perform reasoning-based pruning and validate the blending
>>> solution according to appropriate reasoning logics (RDFS, OWL, ...).
>>> However, I am not sure about the usefulness of this approach if no
>>> solution is found that is consistent according to the logic.
>>>
>>> # Graph blending algorithms
>>> Let's assume for this section that we are blending two graphs g1 and
>>> g2, and we know we have variable nodes var(g1) and constant nodes
>>> const(g1) in g1, and var(g2) and const(g2) in g2.
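The severity-based ranking and superset pruning described above could be sketched as follows (counting only the results at the worst severity is one reading of the ranking rule that matches the "2 Violations and 1 Warning" example; the tuple encoding is an assumption):

```python
# Sketch of the ranking and pruning rules. Validation results are
# (severity, message) pairs; severities ordered Violation > Warning > Info.

SEVERITY_RANK = {"sh:Violation": 2, "sh:Warning": 1, "sh:Info": 0}

def badness(results):
    """Rank by worst severity, then by number of results at that severity."""
    if not results:
        return (-1, 0)   # no validation results: best possible
    worst = max(SEVERITY_RANK[sev] for sev, _ in results)
    at_worst = sum(1 for sev, _ in results if SEVERITY_RANK[sev] == worst)
    return (worst, at_worst)

# 1 Violation is worse than 1000 Warnings:
assert badness([("sh:Violation", "v")]) > badness([("sh:Warning", "w")] * 1000)
# 2 Violations and 1 Warning are worse than 1 Violation and 1000 Warnings:
many_warnings = [("sh:Violation", "v")] + [("sh:Warning", "w")] * 1000
assert badness([("sh:Violation", "v1"), ("sh:Violation", "v2"),
                ("sh:Warning", "w")]) > badness(many_warnings)

def is_deterioration(results_b, results_a):
    """B deteriorates A if A's results are a strict subset of B's."""
    return set(results_a) < set(results_b)

r1, r2 = ("sh:Violation", "R1"), ("sh:Violation", "R2")
assert is_deterioration([r1, r2], [r1])   # B adds R2 on top of A's R1
```

A blending function could drop every solution for which `is_deterioration` holds against some other solution, then sort the rest by `badness`.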
>>> ## Complete
>>> The complete algorithm consists of enumerating all possible
>>> solutions. For each n in var(g1), there are the following solutions:
>>> * 1 solution in which n is not blended at all
>>> * 1 solution for each m in var(g2), in which n is blended with m
>>> * 1 solution for each m in const(g2), in which n is blended with m
>>> Symmetrically, the same holds for each n in var(g2).
>>> Removal of duplicate solutions yields the set of solutions.
>>>
>>> ## Heuristic
>>> We may interpret blending as a heuristic search and apply an A*-like
>>> algorithm to find good results, assessing the distance to the goal by
>>> evaluating the constraints, and the distance from the start by the
>>> number of node blendings.
>>>
>>> ## Knowledge-based
>>> Exploiting knowledge about the actual structures in g1 and g2, the
>>> heuristic approach could be enhanced to apply more appropriate
>>> distance estimates and select more promising paths first.
>>>
>>> Links:
>>> 1. https://lists.w3.org/Archives/Public/semantic-web/2017Nov/0067.html
>>> 2. https://www.w3.org/TR/rdf11-mt/
>>>
>>> On 2017-11-16 19:37, Florian Kleedorfer wrote:
>>> > Hi,
>>> >
>>> > As you may recall, I've asked this list about messaging/negotiating
>>> > using linked data[1], and it led me and my colleagues to a concept
>>> > of RDF-based agreements[2]. (Thanks again for your input!)
>>> >
>>> > Now it's time for us to define how the negotiating parties come up
>>> > with the data (set of RDF triples) that they agree on.
>>> >
>>> > One idea we have is that both participants (P1, P2) specify
>>> > 'goals', each consisting of a SHACL shapes graph and a data graph.
>>> > Two goals can be tested for compatibility by merging the two data
>>> > graphs and then testing whether the resulting graph is valid given
>>> > the shapes defined by both goals. The problem we face with this
>>> > approach is merging the two data graphs.
>>> > Consider the example in this gist:
>>> >
>>> > https://gist.github.com/fkleedorfer/81b32e3235105a4a4743e9fa76ba9332
>>> >
>>> > In that example, we would want to merge these two graphs:
>>> >
>>> > ex1:p1g-data {
>>> >   ex1:ride1 a taxi:Ride .
>>> >   ex1:ride1 taxi:hasDriver ex1:p1 .
>>> >   ex1:ride1 taxi:hasPickupLocation ex1:pickup1 .
>>> > }
>>> >
>>> > and
>>> >
>>> > ex2:p2g-data {
>>> >   ex2:myRide a taxi:Ride .
>>> >   ex2:myRide taxi:hasClient ex2:p2 .
>>> >   ex2:myRide taxi:hasPickupLocation ex2:myPickupLocation .
>>> >   ex2:myPickupLocation a s:GeoCoordinates ;
>>> >     s:latitude "48.213814" ;
>>> >     s:longitude "16.340870" .
>>> > }
>>> >
>>> > We want that merge to happen in such a way that the shapes in
>>> > ex1:p1g-shapes and ex2:p2g-shapes can be evaluated, which will
>>> > require merging some of the nodes. (In the example, we'll find out
>>> > that a taxi:hasPickupTime property is missing. One of the
>>> > participants then adds that property, and the whole structure
>>> > becomes valid according to both goals' shapes.)
>>> >
>>> > The question is how exactly to achieve that merge. Intuitively, the
>>> > result in this case should be
>>> >
>>> > :mergedNode1 a taxi:Ride .
>>> > :mergedNode1 taxi:hasDriver ex1:p1 . # note: p1 links her own
>>> >                                      # identifier to the structure
>>> > :mergedNode1 taxi:hasPickupLocation :mergedNode2 .
>>> > :mergedNode1 a taxi:Ride .
>>> > :mergedNode1 taxi:hasClient ex2:p2 .
>>> > :mergedNode1 taxi:hasPickupLocation :mergedNode2 .
>>> > :mergedNode2 a s:GeoCoordinates ;
>>> >   s:latitude "48.213814" ;
>>> >   s:longitude "16.340870" .
>>> >
>>> > and after removal of duplicate triples
>>> >
>>> > :mergedNode1 a taxi:Ride .
>>> > :mergedNode1 taxi:hasDriver ex1:p1 .
>>> > :mergedNode1 taxi:hasClient ex2:p2 .
>>> > :mergedNode1 taxi:hasPickupLocation :mergedNode2 .
>>> > :mergedNode2 a s:GeoCoordinates ;
>>> >   s:latitude "48.213814" ;
>>> >   s:longitude "16.340870" .
>>> > (I currently don't care about the URIs of the merged nodes - just
>>> > something one can mint when doing the merge.)
>>> >
>>> > Do you have any hints on how to perform such a merge reliably, also
>>> > with more complex structures?
>>> >
>>> > Thank you very much in advance!
>>> >
>>> > Best regards,
>>> > Florian
>>> >
>>> > Links:
>>> > 1. Negotiations discussion thread:
>>> > https://lists.w3.org/Archives/Public/semantic-web/2017Jul/0004.html
>>> > 2. Agreements paper: http://ceur-ws.org/Vol-1934/contribution-07.pdf
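As a closing illustration, the complete enumeration from the "Graph blending algorithms" section can be run on the taxi example (the `merged:` naming scheme and the choice of which nodes count as variables are assumptions made for this sketch; the GeoCoordinates detail triples are omitted for brevity):

```python
from itertools import product

# Sketch: enumerate all blending solutions for the taxi example,
# treating only the ride and pickup nodes as variables and everything
# else as unblendable.

g1 = {("ex1:ride1", "rdf:type", "taxi:Ride"),
      ("ex1:ride1", "taxi:hasDriver", "ex1:p1"),
      ("ex1:ride1", "taxi:hasPickupLocation", "ex1:pickup1")}
g2 = {("ex2:myRide", "rdf:type", "taxi:Ride"),
      ("ex2:myRide", "taxi:hasClient", "ex2:p2"),
      ("ex2:myRide", "taxi:hasPickupLocation", "ex2:myPickupLocation")}

var_g1 = ["ex1:ride1", "ex1:pickup1"]
var_g2 = ["ex2:myRide", "ex2:myPickupLocation"]

def rename(graph, mapping):
    return {(mapping.get(s, s), p, mapping.get(o, o)) for s, p, o in graph}

solutions = []
# Each variable of g1 is either left alone (None) or blended with one
# distinct variable of g2; blended pairs get a freshly minted name, and
# duplicate triples collapse because graphs are sets.
for choice in product([None] + var_g2, repeat=len(var_g1)):
    picked = [m for m in choice if m is not None]
    if len(picked) != len(set(picked)):
        continue   # don't blend two g1 nodes with the same g2 node
    mapping = {}
    for n, m in zip(var_g1, choice):
        if m is not None:
            fresh = "merged:" + n.split(":")[1] + "+" + m.split(":")[1]
            mapping[n] = fresh
            mapping[m] = fresh
    solutions.append(rename(g1, mapping) | rename(g2, mapping))

# Blending nothing gives a plain merge (6 triples); blending both the
# ride nodes and the pickup nodes yields the intuitive 4-triple result.
assert any(len(s) == 6 for s in solutions)
assert any(len(s) == 4 for s in solutions)
```

The constraints-aware machinery discussed earlier in the thread would then validate each of these candidate solutions and report the accepted (or least bad) ones.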
Received on Tuesday, 21 November 2017 13:35:03 UTC