- From: Martynas Jusevičius <martynas@atomgraph.com>
- Date: Tue, 21 Nov 2017 16:02:05 +0100
- To: Florian Kleedorfer <florian.kleedorfer@austria.fm>
- Cc: Paul Tyson <phtyson@sbcglobal.net>, Semantic Web <semantic-web@w3.org>
- Message-ID: <CAE35VmyqRwJjE7W_LEujOrbd7b0OUaRsW0uBxd5sq13rneLz1Q@mail.gmail.com>
Hi again, if you only need to "blend" 2 graphs at a time, with one typed resource in each, I guess you can join on type, again even with SPARQL. Can there be other commonality than type? What if you have multiple resources with the same type? This seems like a very limited use case to me, which does not scale. I was thinking more about a triplestore with thousands or millions of graphs where any number of rides might match drivers to passengers. In that case I don't think joining only on type can work. I'm also quite sure (optimized) SPARQL will be faster than any rules/inference, and probably also custom algorithms, unless you use some specialized storage beyond triples. On Tue, Nov 21, 2017 at 3:48 PM, Florian Kleedorfer < florian.kleedorfer@austria.fm> wrote: > Hi Martynas, > > I am not sure if I let the use case seem too narrow here. The taxi case is > meant as an example, but the two graphs describing the agent's goals could > contain any RDF. The commonality of all use cases is that a goal by itself > is always incomplete and needs another agent's mutually compatible goal for > being completed. > > Is my problem really that I have nothing to join on? In the example, there > are the TBox resources that are shared (e.g., taxi:Ride) that could be used > for joining. I think in the case of the example below, the CONSTRUCT > approach is possible, as suggested and shown by Axel Polleres off-list a > few days ago as a pattern-based solution to this problem. I would classify > it as a knowledge-based 'blending' approach because the target structure as > well as the structure of the two graphs we want to 'blend' need to be known > beforehand, which allows for substantial optimizations. > > A downside the pattern-based approach, I suspect, is that it can only > handle expected cases. Or do you see a way to generate such a SPARQL > statement from the data ad-hoc? > > As a side-note, I am not so sure if I would refer to SPARQL as 'simple'. > It's easy to write a simple SPARQL query, but it requires resources to > parse and execute it, and it may be more efficient to reach the same result > with a different algorithm, more tailored to the problem. > > > > Am 2017-11-21 14:34, schrieb Martynas Jusevičius: > >> If understand your use case right, your problem is that you have >> nothing to join on, i.e. no common nodes, between the 2 rides. >> >> If there was something in common, I think it could be solved with a >> simple SPARQL CONSTRUCT. Now what that should be, depends on your use >> case. >> >> Maybe geohash of the location? https://en.wikipedia.org/wiki/Geohash >> >> In any case, I don't think it requires some kind of new graph >> "blending" algorithm. >> > > I would be very relieved if I could take something off the shelf here. > > Best regards, > Florian > > > > >> On Tue, Nov 21, 2017 at 12:40 PM, Florian Kleedorfer >> <florian.kleedorfer@austria.fm> wrote: >> >> Hi Paul, >>> >>> I received a similar hint yesterday off-list. I am really grateful >>> for these suggestions, as I had not seen the possibility at all. It >>> seems we should really look into it. >>> >>> You asked why I had not framed the problem more in terms of >>> description logics. The main reason is: I had not even considered >>> it, really. Here are some possible causes for that: >>> >>> One thing to say about description logics is that it is one more >>> thing to learn, which makes it that much more unlikely a programmer >>> will use description logics over something else that gets the job >>> done. The graph blending approach is simpler in terms of logics but >>> you have to learn SHACL before you can use it - not sure about the >>> net trade-off value. >>> >>> Also, I am a little bit afraid of OWL for the appliation I have in >>> mind, as it does not, in my (limited) experience, do well in the >>> presence of messy data. It is easy to formulate inconsistent data, >>> and to infer really unexpected things. The application - or rather, >>> the protocol - is completely open (anyone can say anything.. >>> including incomplete or inconsistent things), so I would prefer to >>> remain on a less expressive level as much as possible, ideally doing >>> no reasoning at all. >>> >>> I've seen SHACL work quite nicely in the browser, but I am not sure >>> about reasoners for OWL or OWL-DL. Because of the reservations >>> stated earlier, I must admit, I haven't checked. Doing the blending >>> in the browser may not be a necessity, but I would not want to close >>> that door just yet. >>> >>> Finally, I have to admit, for the DL based approach, I'm not really >>> sure I would know even where to start. I'll be thankful for any help >>> here and I'll be happy to cooperate if somone else wants to work on >>> it! >>> >>> I am also thinking about the implications of the blending approach >>> vs the DL based approach... The goal of blending was to connect two >>> graphs that each describe one half of the solution, like puzzle >>> pieces. The example I had in mind was: A says, "A is the passenger >>> of the taxi ride." B says, "B is the driver of the taxi ride." Join >>> the two structures by blending the "taxi ride" node and you get "A >>> is the passenger of the taxi ride that B is the driver of". It seems >>> to me that one can arrive at this result without special reasoning >>> rules that were designed for exactly this use case. A simple rdfs >>> vocabulary, some SHACL constraints, and the variable/constant >>> definition would be enough. All of these could be different in every >>> interaction, allowing for evolution of 'reasoning' as well as >>> evolution of the use of vocabularies. The DL approach may be very >>> powerful for premeditated scenarios, but I am not sure how well it >>> would work if agents want to interact in ways that were not foreseen >>> by the ontology design. >>> >>> The approach we have outlined so far is more of a linked data >>> flavored approach than a semantic web one. For many simple cases, it >>> may be sufficient and work just fine. It seems more robust to me >>> (less prone to fail with messy data) than the DL based approach, and >>> it might be a good thing to have as a baseline. However, I can >>> imagine that both approaches will be supported at some point and >>> they can live next to each other: some agents may prefer to express >>> their goals in line with rdf blending, some with DL class >>> intersections, some may use both. In all cases, the net result >>> should be an RDF graph that the partners can agree to. >>> >>> Best >>> Florian >>> >>> Am 2017-11-21 03:54, schrieb Paul Tyson: >>> Hi Florian, very interesting problem. See my comments below. >>> >>> On Mon, 2017-11-20 at 19:09 +0100, Florian Kleedorfer wrote: >>> (Note: I changed the subject of this thread following Pat Hayes' >>> suggestion [1]) >>> >>> After some on-list and off-list discussions, the problem is becoming >>> clearer. I'll try to state it again, along with some ideas for a >>> solution. Any comments, criticism, and suggestions to collaborate >>> are >>> highly welcome ;-) >>> >>> # Blending: >>> ## Definition >>> Given two RDF graphs g1 and g2, blending refers to the pairwise >>> renaming >>> of resources (blank nodes or URI nodes) r1 in g1 and r2 in g2 such >>> that >>> one resource name (which may be r1, r2, or any other URI) is used >>> instead of r1 and r2. The nodes r1 and r2 are said to be blended >>> (ideas >>> for a better term, anyone?). The resulting graphs are merged >>> (avoiding >>> accidental blank node identification, see RDF 1.1 Semantics[2]). The >>> blending function b maps the input graphs to the set of all possible >>> output graphs. >>> >>> I wonder why you do not frame the problem more in terms of >>> description >>> logics. "Blending" as you have described it seems to be a much >>> looser >>> (flexible?) way of finding the intersection of 2 classes. If the >>> provider has defined the classes of goods/services they can provide, >>> and >>> the consumer defines the class(es) he wishes to buy, then before >>> they >>> can do business they must find an instance of good or service that >>> falls >>> in the intersection of those classes. >>> >>> Suppose the provider defines class P, the consumer class C. Then any >>> transaction between them would be in the class defined by >>> intersectionOf(P C). If this class is unsatisfiable (cannot have any >>> members, for instance if P has :price = 20 and C has :price < 15), >>> you >>> want an algorithm to define and rank ways in which the provider's >>> and >>> consumer's class definitions could be modified to make the >>> intersection >>> satisfiable. These "negotiating rules" (or "distance measurements") >>> are >>> the interesting part of the problem. >>> >>> I do not know if there are implementable solutions to these problems >>> in >>> the description logics literature, but it might be worth a little >>> searching before building the graph-manipulation approach you have >>> outlined here. >>> >>> Regards, >>> --Paul >>> >>> ## Discussion >>> There are many ways to blend two graphs, and none of these ways are >>> deemed correct or incorrect. >>> We require this operation to combine two data structures held by >>> mutually independent agents that have certain expectations about the >>> blending solution. These expectations divide the set of possible >>> outcomes at least in acceptable and unacceptable ones, possibly even >>> in >>> a preference ranking. I'll try to formulate that as well in the >>> following. >>> >>> # Node blendability >>> ## Rationale >>> One way of constraining the result is to restrict which nodes can be >>> blended. So far, we have identified three types: variable nodes, >>> constant nodes, and unblendable nodes. Each node in an RDF graph has >>> exactly one of the three types. A variable node may be blended and >>> its >>> name may be changed. A constant node may also be blended, but its >>> name >>> is not allowed to change. An unblendable node must not be blended. >>> >>> One may note that blending g1 and g2 is equivalent to merging them >>> if no >>> variable nodes are present in either graph. >>> >>> It is currently unclear how the constant/variable/unblendable >>> property >>> should be expressed. My intuition is that in most practical use >>> cases, >>> most nodes need to be unblendable (ontology/TBox resources, >>> properties >>> in general), therefore it would be more economical to assume nodes >>> to be >>> unblendable by default. Then, variable/constant nodes have to be >>> annotated as such explicitly. >>> Maybe it would be an option to make definitions by URI prefix, so we >>> can >>> define a complete vocabulary as unblendable with one statement. >>> (opinions on that?) >>> >>> ## Definition >>> Given two graphs g1 and g2, a blendability-aware blending function >>> is a >>> blending function for g1 and g2 that only replaces resources that >>> have >>> not been declared as constant with respect to blending. The latter >>> is >>> done by specifying a triple "<graph> bl:containsConstant <resource> >>> ." >>> or "<graph> bl:containsVariable <resource> ." in an additional graph >>> passed to the blendability-aware blending function. Such a graph is >>> called a 'blendability graph'. >>> >>> # Constraints on the blending solution: >>> ## Rationale >>> As explained earlier, there may be multiple parties that have >>> expectations for the blending solution. So far, we have talked about >>> the >>> originators of the data. There may also be one or more intended >>> consumers of the data. Nothing prevents the expectations from being >>> mutually incompatible (meaning that there exists no graph that >>> satisfies >>> them all) or that they are even unsatisfiable by themselves. It >>> would of >>> course be beneficial if such cases could be detected automaticall, >>> but >>> for now, let's just leave it at noting the possibility. Now, if the >>> information the agents provided, when considered in total, has >>> spurious >>> or conflicting information, or if information is missing (taking >>> their >>> respective expectations as a definition of what is acceptable), the >>> blending solution cannot meet the expectations. Nevertheless, it has >>> to >>> be computed, in order to be able to take the next step and "repair" >>> (better word, please) one or both graphs and try again. >>> >>> It would be nice to use constraints for the following goals during >>> the >>> calculation of the blending solution: >>> 1. pruning wholly unacceptable solutions >>> 2. ranking unacceptable solutions by badness >>> 3. ranking acceptable-looking, but incomplete solutions ("so far so >>> good") >>> >>> Constraints should allow us to identify an acceptable result, or to >>> select the best result of a number of unacceptable ones, and report >>> them >>> to users so that they can repair them. >>> >>> However, if the expectations define an acceptable result, anything >>> else, >>> however close, is unacceptable, and in the absence of acceptable >>> results >>> we are left with unacceptable ones, some of which may be completely >>> bogus while others are almost acceptable. So, the three categories >>> above >>> cannot be separated, and the best we can hope for is reasonable >>> ranking >>> of unacceptable solutions. >>> >>> For the conceptualization, we go with a simple SHACL-based approach. >>> Other systems (SPARQL, ShEx) are definitely possible as well. >>> >>> ## Definition >>> The constraints-aware blending function is a blending function for >>> two >>> RDF graphs g1 and g2 and takes optional graph names s1,...,sN that >>> denote graphs which contain SHACL shapes constraining the result. >>> Each >>> blending solution is validated using s1,...,sN; the result of the >>> function is a mapping of the blending solutions to their respective >>> SHACL validation results, indexed by shape graph name (s1,...,sN). A >>> blending solution is called accepted with respect to a constraints >>> graph >>> s if it causes no SHACL validation results when validated against s. >>> A >>> blending solution is called accepted if it is accepted with respect >>> to >>> all specified shapes graphs. >>> >>> ## Discussion >>> The main issue in this step seems to be the pruning and ordering of >>> unacceptable results. Blending functions may choose to drop blending >>> solutions if they are deemed too bad - but where should they draw >>> the >>> line and still report an unacceptable blending solution while >>> dropping >>> another? >>> >>> With respect to one shapes graph, we could define that validation >>> result >>> graphs are ranked by maximum severity (1 sh:Violation is worse than >>> 1000 >>> sh:Warnings), and within those classes by number of validation >>> results >>> (2 Violations and 1 Warnings is worse than 1 Violation and 1000 >>> Warnings). >>> >>> I'm suspecting that the validation results should allow us to prune >>> at >>> least some bad blending solutions if their validation results are >>> supersets of other solutions' validation results: if solution A has >>> validation results (R1) and solution B has validation result (R1, >>> R2) we >>> can say that solution B is a deterioration of solution A and should >>> not >>> be reported. >>> >>> In addition to constraint-based pruning and ranking, it is of course >>> possible to perform reasoning-based pruning and validate the >>> blending >>> solution according to appropriate reasoning logics (RDFS, OWL, ...). >>> However, I am not sure as to the usefulness of this approach if no >>> solution is found that is consistent according to the logic. >>> >>> # Graph blending algorithms >>> Let's assume for this section that we are blending two graphs g1 and >>> g2, >>> and we know we have variable nodes var(g1) and constant nodes >>> const(g1) >>> in g1, and var(g2) and const(g2) in g2. >>> >>> ## Complete >>> The complete algorithm consists of enumerating all possible >>> solutions >>> For each n in var(g1), there are the following solutions: >>> * 1 solution in which n is not blended at all >>> * 1 solution for each m in var(g2), in which n is blended with m >>> * 1 solution for each m in const(g2), in which n is blended with m >>> symmetrically the same is true for each n in var(g2). >>> Removal of duplicate solutions yields the set of solutions. >>> >>> ## Heuristic >>> We may interpret blending as a heuristic search and apply an A*-like >>> algorithm to finding good results, assessing the distance to the >>> goal by >>> evaluating the constraints, and the distance from the start by the >>> number of node blendings. >>> >>> ## Knowledge-based >>> Exploiting knowledge about the actual structures in g1 and g2, the >>> heuristic approach could be enhanced to apply more appropriate >>> distance >>> estimates and select more promising paths first. >>> >>> Links: >>> 1. >>> https://lists.w3.org/Archives/Public/semantic-web/2017Nov/0067.html >>> [1] >>> 2. https://www.w3.org/TR/rdf11-mt/ [2] >>> >>> Am 2017-11-16 19:37, schrieb Florian Kleedorfer: >>> >>>> Hi, >>>> >>>> As you may recall, I've asked this list about >>>> >>> messaging/negotiating >>> >>>> using linked data[1], and it led me and my colleagues to a concept >>>> >>> of >>> >>>> RDF-based agreements[2]. (Thanks again for your input!) >>>> >>>> Now it's time for us to define how the negotiating parties come up >>>> with the data (set of rdf triples) that they agree on. >>>> >>>> One Idea we have is that both particpants (P1,P2) specify 'goals', >>>> each consisting of a SHACL shapes graph and a data graph. Two >>>> >>> goals >>> >>>> can be tested for compatibility by merging the two data graphs and >>>> then testing if the resulting graph is valid given the shapes >>>> >>> defined >>> >>>> by both goals. The problem we face with this approach is merging >>>> >>> the >>> >>>> two data graphs. Consider the example in this gist: >>>> >>>> >>>> https://gist.github.com/fkleedorfer/81b32e3235105a4a4743e9fa76ba9332 >>> [3] >>> >>> >>>> In that example, we would want to merge these two graphs: >>>> >>>> ex1:p1g-data { >>>> ex1:ride1 a taxi:Ride . >>>> ex1:ride1 taxi:hasDriver ex1:p1 . >>>> ex1:ride1 taxi:hasPickupLocation ex1:pickup1; >>>> } >>>> >>>> and >>>> >>>> ex2:p2g-data { >>>> ex2:myRide a taxi:Ride. >>>> ex2:myRide taxi:hasClient ex2:p2 . >>>> ex2:myRide taxi:hasPickupLocation ex2:myPickupLocation . >>>> ex2:myPickupLocation a s:GeoCoordinates ; >>>> s:latitude "48.213814" ; >>>> s:longitude "16.340870" . >>>> } >>>> >>>> We want that merge to happen in such a way that the shapes in >>>> ex1:p1g-shapes and ex2:p2g-shapes can be evaluated, which will >>>> >>> require >>> >>>> to merge some of the nodes. (In the example, we'll find out that a >>>> taxi:hasPickupTime property is missing. One of the participants is >>>> then to add that property, and the whole structure becomes valid >>>> according to both goals' shapes.) >>>> >>>> The question is how exactly to achieve that merge. Intuitively, >>>> >>> the >>> >>>> result in this case should be >>>> >>>> :mergedNode1 a taxi:Ride . >>>> :mergedNode1 taxi:hasDriver ex1:p1 . # note: p1 links her own >>>> identifier to the structure >>>> :mergedNode1 taxi:hasPickupLocation :mergedNode2; >>>> :mergedNode1 a taxi:Ride. >>>> :mergedNode1 taxi:hasClient ex2:p2 . >>>> :mergedNode1 taxi:hasPickupLocation :mergedNode2 . >>>> :mergedNode2 a s:GeoCoordinates ; >>>> s:latitude "48.213814" ; >>>> s:longitude "16.340870" . >>>> >>>> and after removal of duplicate triples >>>> >>>> :mergedNode1 a taxi:Ride . >>>> :mergedNode1 taxi:hasDriver ex1:p1 . >>>> :mergedNode1 taxi:hasClient ex2:p2 . >>>> :mergedNode1 taxi:hasPickupLocation :mergedNode2 . >>>> :mergedNode2 a s:GeoCoordinates ; >>>> s:latitude "48.213814" ; >>>> s:longitude "16.340870" . >>>> >>>> (I currently don't care about the URIs of the mergedNodes - just >>>> something one can mint when doing the merge) >>>> >>>> >>>> Do you have any hints on how to perform such a merge reliably, >>>> >>> also >>> >>>> with more complex structures? >>>> >>>> Thank you very much in advance! >>>> >>>> Best regards, >>>> Florian >>>> >>>> >>>> Links: >>>> 1. Negotiations discussion thread: >>>> >>>> https://lists.w3.org/Archives/Public/semantic-web/2017Jul/0004.html >>> [4] >>> >>>> 2. Agreements paper: >>>> >>> http://ceur-ws.org/Vol-1934/contribution-07.pdf [5] >>> >> >> >> >> Links: >> ------ >> [1] https://lists.w3.org/Archives/Public/semantic-web/2017Nov/0067.html >> [2] https://www.w3.org/TR/rdf11-mt/ >> [3] https://gist.github.com/fkleedorfer/81b32e3235105a4a4743e9fa76ba9332 >> [4] https://lists.w3.org/Archives/Public/semantic-web/2017Jul/0004.html >> [5] http://ceur-ws.org/Vol-1934/contribution-07.pdf >> >
Received on Tuesday, 21 November 2017 15:02:58 UTC