Re: Blending RDF graphs

Hi Martynas,

I am not sure if I made the use case seem too narrow here. The taxi 
case is meant as an example, but the two graphs describing the agents' 
goals could contain any RDF. What all use cases have in common is that 
a goal by itself is always incomplete and needs another agent's 
mutually compatible goal in order to be completed.

Is my problem really that I have nothing to join on? In the example, 
there are shared TBox resources (e.g., taxi:Ride) that could be used 
for joining. I think that for the example below, the CONSTRUCT approach 
is possible; Axel Polleres suggested and showed it off-list a few days 
ago as a pattern-based solution to this problem. I would classify it as 
a knowledge-based 'blending' approach, because the target structure as 
well as the structure of the two graphs we want to 'blend' need to be 
known beforehand, which allows for substantial optimizations.
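
To make this concrete, here is a rough sketch of my own (not Axel's 
actual query; the taxi:, ex1: and ex2: prefix URIs are placeholders) of 
what such a pattern-based CONSTRUCT over the two goal graphs from the 
gist below could look like:

PREFIX taxi: <http://example.org/taxi#>  # placeholder namespace
PREFIX ex1:  <http://example.org/ex1#>   # placeholder namespace
PREFIX ex2:  <http://example.org/ex2#>   # placeholder namespace

# Keep p1's ride node and p2's pickup location node (it carries the
# coordinates); everything else is copied over.
CONSTRUCT {
  ?ride a taxi:Ride ;
        taxi:hasDriver ?driver ;
        taxi:hasClient ?client ;
        taxi:hasPickupLocation ?pickup .
  ?pickup ?locProp ?locValue .
}
WHERE {
  GRAPH ex1:p1g-data {
    ?ride a taxi:Ride ;
          taxi:hasDriver ?driver .
  }
  GRAPH ex2:p2g-data {
    ?ride2 a taxi:Ride ;
           taxi:hasClient ?client ;
           taxi:hasPickupLocation ?pickup .
    ?pickup ?locProp ?locValue .
  }
}

The query hard-codes which node is kept for the ride and which for the 
pickup location - that is the part I mean by 'knowledge-based'.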

A downside of the pattern-based approach, I suspect, is that it can 
only handle expected cases. Or do you see a way to generate such a 
SPARQL statement from the data ad hoc?

As a side note, I am not so sure I would refer to SPARQL as 'simple'. 
It's easy to write a simple SPARQL query, but it takes resources to 
parse and execute it, and it may be more efficient to reach the same 
result with a different algorithm that is more tailored to the problem.



On 2017-11-21 14:34, Martynas Jusevičius wrote:
> If I understand your use case right, your problem is that you have
> nothing to join on, i.e. no common nodes, between the 2 rides.
> 
> If there was something in common, I think it could be solved with a
> simple SPARQL CONSTRUCT. Now what that should be, depends on your use
> case.
> 
> Maybe geohash of the location? https://en.wikipedia.org/wiki/Geohash
> 
> In any case, I don't think it requires some kind of new graph
> "blending" algorithm.

I would be very relieved if I could take something off the shelf here.

Best regards,
Florian



> 
> On Tue, Nov 21, 2017 at 12:40 PM, Florian Kleedorfer
> <florian.kleedorfer@austria.fm> wrote:
> 
>> Hi Paul,
>> 
>> I received a similar hint yesterday off-list. I am really grateful
>> for these suggestions, as I had not seen the possibility at all. It
>> seems we should really look into it.
>> 
>> You asked why I had not framed the problem more in terms of
>> description logics. The main reason is: I had not even considered
>> it, really. Here are some possible causes for that:
>> 
>> One thing to say about description logics is that it is one more
>> thing to learn, which makes it that much more unlikely that a
>> programmer will use description logics over something else that gets
>> the job done. The graph blending approach is simpler in terms of
>> logic, but you have to learn SHACL before you can use it - not sure
>> about the net trade-off value.
>> 
>> Also, I am a little bit afraid of OWL for the application I have in
>> mind, as it does not, in my (limited) experience, do well in the
>> presence of messy data. It is easy to formulate inconsistent data,
>> and to infer really unexpected things. The application - or rather,
>> the protocol - is completely open (anyone can say anything..
>> including incomplete or inconsistent things), so I would prefer to
>> remain on a less expressive level as much as possible, ideally doing
>> no reasoning at all.
>> 
>> I've seen SHACL work quite nicely in the browser, but I am not sure
>> about reasoners for OWL or OWL-DL. Because of the reservations
>> stated earlier, I must admit, I haven't checked. Doing the blending
>> in the browser may not be a necessity, but I would not want to close
>> that door just yet.
>> 
>> Finally, I have to admit that for the DL-based approach, I'm not
>> really sure I would even know where to start. I'll be thankful for
>> any help here, and I'll be happy to cooperate if someone else wants
>> to work on it!
>> 
>> I am also thinking about the implications of the blending approach
>> vs the DL based approach... The goal of blending was to connect two
>> graphs that each describe one half of the solution, like puzzle
>> pieces. The example I had in mind was: A says, "A is the passenger
>> of the taxi ride." B says, "B is the driver of the taxi ride." Join
>> the two structures by blending the "taxi ride" node and you get "A
>> is the passenger of the taxi ride that B is the driver of". It seems
>> to me that one can arrive at this result without special reasoning
>> rules that were designed for exactly this use case. A simple RDFS
>> vocabulary, some SHACL constraints, and the variable/constant
>> definition would be enough. All of these could be different in every
>> interaction, allowing for evolution of 'reasoning' as well as
>> evolution of the use of vocabularies. The DL approach may be very
>> powerful for premeditated scenarios, but I am not sure how well it
>> would work if agents want to interact in ways that were not foreseen
>> by the ontology design.
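>> 
>> In Turtle, that minimal example might look roughly like this (using
>> the hasClient/hasDriver properties from the gist below; the prefix
>> URIs are placeholders):
>> 
>> @prefix taxi: <http://example.org/taxi#> .    # placeholder namespace
>> @prefix ex:   <http://example.org/agents#> .  # placeholder namespace
>> 
>> # A's goal graph:
>> _:rideA a taxi:Ride ; taxi:hasClient ex:A .
>> 
>> # B's goal graph:
>> _:rideB a taxi:Ride ; taxi:hasDriver ex:B .
>> 
>> # After blending _:rideA and _:rideB into a single node:
>> _:ride a taxi:Ride ; taxi:hasClient ex:A ; taxi:hasDriver ex:B .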
>> 
>> The approach we have outlined so far is more of a linked data
>> flavored approach than a semantic web one. For many simple cases, it
>> may be sufficient and work just fine. It seems more robust to me
>> (less prone to fail with messy data) than the DL based approach, and
>> it might be a good thing to have as a baseline. However, I can
>> imagine that both approaches will be supported at some point and
>> they can live next to each other: some agents may prefer to express
>> their goals in line with RDF blending, some with DL class
>> intersections, some may use both. In all cases, the net result
>> should be an RDF graph that the partners can agree to.
>> 
>> Best
>> Florian
>> 
>> On 2017-11-21 03:54, Paul Tyson wrote:
>> Hi Florian, very interesting problem. See my comments below.
>> 
>> On Mon, 2017-11-20 at 19:09 +0100, Florian Kleedorfer wrote:
>> (Note: I changed the subject of this thread following Pat Hayes'
>> suggestion [1])
>> 
>> After some on-list and off-list discussions, the problem is becoming
>> clearer. I'll try to state it again, along with some ideas for a
>> solution. Any comments, criticism, and suggestions to collaborate
>> are highly welcome ;-)
>> 
>> # Blending:
>> ## Definition
>> Given two RDF graphs g1 and g2, blending refers to the pairwise
>> renaming of resources (blank nodes or URI nodes) r1 in g1 and r2 in
>> g2 such that one resource name (which may be r1, r2, or any other
>> URI) is used instead of r1 and r2. The nodes r1 and r2 are said to
>> be blended (ideas for a better term, anyone?). The resulting graphs
>> are merged (avoiding accidental blank node identification, see RDF
>> 1.1 Semantics[2]). The blending function b maps the input graphs to
>> the set of all possible output graphs.
>> 
>> I wonder why you do not frame the problem more in terms of
>> description logics. "Blending" as you have described it seems to be
>> a much looser (flexible?) way of finding the intersection of 2
>> classes. If the provider has defined the classes of goods/services
>> they can provide, and the consumer defines the class(es) he wishes
>> to buy, then before they can do business they must find an instance
>> of good or service that falls in the intersection of those classes.
>> 
>> Suppose the provider defines class P, the consumer class C. Then any
>> transaction between them would be in the class defined by
>> intersectionOf(P C). If this class is unsatisfiable (cannot have any
>> members, for instance if P has :price = 20 and C has :price < 15),
>> you want an algorithm to define and rank ways in which the
>> provider's and consumer's class definitions could be modified to
>> make the intersection satisfiable. These "negotiating rules" (or
>> "distance measurements") are the interesting part of the problem.
>> 
>> I do not know if there are implementable solutions to these problems
>> in the description logics literature, but it might be worth a little
>> searching before building the graph-manipulation approach you have
>> outlined here.
>> 
>> Regards,
>> --Paul
>> 
>> ## Discussion
>> There are many ways to blend two graphs, and none of them is
>> inherently correct or incorrect.
>> We need this operation in order to combine two data structures held
>> by mutually independent agents that have certain expectations about
>> the blending solution. These expectations divide the set of possible
>> outcomes at least into acceptable and unacceptable ones, possibly
>> even into a preference ranking. I'll try to formulate that as well
>> in the following.
>> 
>> # Node blendability
>> ## Rationale
>> One way of constraining the result is to restrict which nodes can be
>> blended. So far, we have identified three types: variable nodes,
>> constant nodes, and unblendable nodes. Each node in an RDF graph has
>> exactly one of the three types. A variable node may be blended and
>> its name may be changed. A constant node may also be blended, but
>> its name is not allowed to change. An unblendable node must not be
>> blended.
>> 
>> One may note that blending g1 and g2 is equivalent to merging them
>> if no variable nodes are present in either graph.
>> 
>> It is currently unclear how the constant/variable/unblendable
>> property should be expressed. My intuition is that in most practical
>> use cases, most nodes need to be unblendable (ontology/TBox
>> resources, properties in general), so it would be more economical to
>> assume nodes to be unblendable by default. Then, variable/constant
>> nodes have to be annotated as such explicitly.
>> Maybe it would be an option to make definitions by URI prefix, so we
>> can
>> define a complete vocabulary as unblendable with one statement.
>> (opinions on that?)
>> 
>> ## Definition
>> Given two graphs g1 and g2, a blendability-aware blending function
>> is a blending function for g1 and g2 that only replaces resources
>> that have not been declared as constant with respect to blending.
>> The latter is done by specifying a triple
>> "<graph> bl:containsConstant <resource> ." or
>> "<graph> bl:containsVariable <resource> ." in an additional graph
>> passed to the blendability-aware blending function. Such a graph is
>> called a 'blendability graph'.
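>> 
>> A minimal sketch of such a blendability graph for the taxi example
>> further below (the bl:, ex1: and ex2: namespace URIs are
>> placeholders, and which nodes are variable or constant is just one
>> plausible choice):
>> 
>> @prefix bl:  <http://example.org/blending#> .  # placeholder namespace
>> @prefix ex1: <http://example.org/ex1#> .       # placeholder namespace
>> @prefix ex2: <http://example.org/ex2#> .       # placeholder namespace
>> 
>> # the ride and pickup nodes may be renamed; p1 and p2 keep their URIs
>> ex1:p1g-data bl:containsVariable ex1:ride1 , ex1:pickup1 ;
>>              bl:containsConstant ex1:p1 .
>> ex2:p2g-data bl:containsVariable ex2:myRide , ex2:myPickupLocation ;
>>              bl:containsConstant ex2:p2 .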
>> 
>> # Constraints on the blending solution:
>> ## Rationale
>> As explained earlier, there may be multiple parties that have
>> expectations for the blending solution. So far, we have talked about
>> the originators of the data. There may also be one or more intended
>> consumers of the data. Nothing prevents the expectations from being
>> mutually incompatible (meaning that there exists no graph that
>> satisfies them all) or even from being unsatisfiable by themselves.
>> It would of course be beneficial if such cases could be detected
>> automatically, but for now, let's just leave it at noting the
>> possibility. Now, if the information the agents provided, taken
>> together, contains spurious or conflicting information, or if
>> information is missing (taking their respective expectations as a
>> definition of what is acceptable), the blending solution cannot meet
>> the expectations. Nevertheless, it has to be computed, in order to
>> be able to take the next step, "repair" (better word, please) one or
>> both graphs, and try again.
>> 
>> It would be nice to use constraints for the following goals during
>> the calculation of the blending solution:
>> 1. pruning wholly unacceptable solutions
>> 2. ranking unacceptable solutions by badness
>> 3. ranking acceptable-looking, but incomplete solutions ("so far so good")
>> 
>> Constraints should allow us to identify an acceptable result, or to
>> select the best result of a number of unacceptable ones, and report
>> them to users so that they can repair them.
>> 
>> However, if the expectations define an acceptable result, anything
>> else, however close, is unacceptable, and in the absence of
>> acceptable results we are left with unacceptable ones, some of which
>> may be completely bogus while others are almost acceptable. So, the
>> three categories above cannot be separated, and the best we can hope
>> for is a reasonable ranking of unacceptable solutions.
>> 
>> For the conceptualization, we go with a simple SHACL-based approach.
>> Other systems (SPARQL, ShEx) are definitely possible as well.
>> 
>> ## Definition
>> The constraints-aware blending function is a blending function for
>> two RDF graphs g1 and g2 that takes optional graph names s1,...,sN
>> which denote graphs containing SHACL shapes constraining the result.
>> Each blending solution is validated using s1,...,sN; the result of
>> the function is a mapping of the blending solutions to their
>> respective SHACL validation results, indexed by shapes graph name
>> (s1,...,sN). A blending solution is called accepted with respect to
>> a constraints graph s if it causes no SHACL validation results when
>> validated against s. A blending solution is called accepted if it is
>> accepted with respect to all specified shapes graphs.
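>> 
>> As an illustration, a minimal shapes graph for the taxi example
>> further below might look roughly like this (the taxi: namespace URI
>> is a placeholder; the actual shapes in the gist differ):
>> 
>> @prefix sh:   <http://www.w3.org/ns/shacl#> .
>> @prefix taxi: <http://example.org/taxi#> .  # placeholder namespace
>> 
>> taxi:RideShape
>>   a sh:NodeShape ;
>>   sh:targetClass taxi:Ride ;
>>   sh:property [
>>     sh:path taxi:hasDriver ;
>>     sh:minCount 1
>>   ] ;
>>   sh:property [
>>     # a blending solution without a pickup time produces a
>>     # validation result with severity sh:Violation
>>     sh:path taxi:hasPickupTime ;
>>     sh:minCount 1 ;
>>     sh:severity sh:Violation
>>   ] .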
>> 
>> ## Discussion
>> The main issue in this step seems to be the pruning and ordering of
>> unacceptable results. Blending functions may choose to drop blending
>> solutions if they are deemed too bad - but where should they draw
>> the line and still report an unacceptable blending solution while
>> dropping another?
>> 
>> With respect to one shapes graph, we could define that validation
>> result graphs are ranked by maximum severity (1 sh:Violation is
>> worse than 1000 sh:Warnings), and within those classes by number of
>> validation results (2 Violations and 1 Warning are worse than 1
>> Violation and 1000 Warnings).
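>> 
>> For instance, the per-severity counts that such a ranking needs can
>> be read off a standard SHACL validation report with a query along
>> these lines:
>> 
>> PREFIX sh: <http://www.w3.org/ns/shacl#>
>> 
>> SELECT ?severity (COUNT(?result) AS ?count)
>> WHERE {
>>   ?report a sh:ValidationReport ;
>>           sh:result ?result .
>>   ?result sh:resultSeverity ?severity .
>> }
>> GROUP BY ?severity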
>> 
>> I suspect that the validation results should allow us to prune at
>> least some bad blending solutions if their validation results are
>> supersets of other solutions' validation results: if solution A has
>> validation results (R1) and solution B has validation results (R1,
>> R2), we can say that solution B is a deterioration of solution A and
>> should not be reported.
>> 
>> In addition to constraint-based pruning and ranking, it is of course
>> possible to perform reasoning-based pruning and validate the
>> blending solution according to appropriate reasoning logics (RDFS,
>> OWL, ...). However, I am not sure as to the usefulness of this
>> approach if no solution is found that is consistent according to the
>> logic.
>> 
>> # Graph blending algorithms
>> Let's assume for this section that we are blending two graphs g1
>> and g2, and we know we have variable nodes var(g1) and constant
>> nodes const(g1) in g1, and var(g2) and const(g2) in g2.
>> 
>> ## Complete
>> The complete algorithm consists of enumerating all possible
>> solutions. For each n in var(g1), there are the following solutions:
>> * 1 solution in which n is not blended at all
>> * 1 solution for each m in var(g2), in which n is blended with m
>> * 1 solution for each m in const(g2), in which n is blended with m
>> Symmetrically, the same holds for each n in var(g2).
>> Removal of duplicate solutions yields the set of solutions.
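>> 
>> Just to illustrate the size of that space: each n in var(g1) has
>> 1 + |var(g2)| + |const(g2)| candidate blendings (so, e.g., 5 if g2
>> has 1 variable and 3 constant nodes), and when the choices for the
>> different variable nodes are combined, the number of candidate
>> solutions grows multiplicatively - hence the interest in the
>> heuristic variants below.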
>> 
>> ## Heuristic
>> We may interpret blending as a heuristic search and apply an A*-like
>> algorithm to finding good results, assessing the distance to the
>> goal by evaluating the constraints, and the distance from the start
>> by the number of node blendings.
>> 
>> ## Knowledge-based
>> Exploiting knowledge about the actual structures in g1 and g2, the
>> heuristic approach could be enhanced to apply more appropriate
>> distance estimates and select more promising paths first.
>> 
>> Links:
>> 1. https://lists.w3.org/Archives/Public/semantic-web/2017Nov/0067.html
>> 2. https://www.w3.org/TR/rdf11-mt/
>> 
>> On 2017-11-16 19:37, Florian Kleedorfer wrote:
>>> Hi,
>>> 
>>> As you may recall, I've asked this list about messaging/negotiating
>>> using linked data[1], and it led me and my colleagues to a concept
>>> of RDF-based agreements[2]. (Thanks again for your input!)
>>> 
>>> Now it's time for us to define how the negotiating parties come up
>>> with the data (set of RDF triples) that they agree on.
>>> 
>>> One idea we have is that both participants (P1, P2) specify
>>> 'goals', each consisting of a SHACL shapes graph and a data graph.
>>> Two goals can be tested for compatibility by merging the two data
>>> graphs and then testing if the resulting graph is valid given the
>>> shapes defined by both goals. The problem we face with this
>>> approach is merging the two data graphs. Consider the example in
>>> this gist:
>>> 
>>> https://gist.github.com/fkleedorfer/81b32e3235105a4a4743e9fa76ba9332
>>> 
>>> In that example, we would want to merge these two graphs:
>>> 
>>> ex1:p1g-data {
>>> ex1:ride1 a taxi:Ride .
>>> ex1:ride1 taxi:hasDriver ex1:p1 .
>>> ex1:ride1 taxi:hasPickupLocation ex1:pickup1 .
>>> }
>>> 
>>> and
>>> 
>>> ex2:p2g-data {
>>> ex2:myRide a taxi:Ride.
>>> ex2:myRide taxi:hasClient ex2:p2 .
>>> ex2:myRide taxi:hasPickupLocation ex2:myPickupLocation .
>>> ex2:myPickupLocation a s:GeoCoordinates ;
>>> s:latitude   "48.213814" ;
>>> s:longitude  "16.340870" .
>>> }
>>> 
>>> We want that merge to happen in such a way that the shapes in
>>> ex1:p1g-shapes and ex2:p2g-shapes can be evaluated, which will
>>> require merging some of the nodes. (In the example, we'll find out
>>> that a taxi:hasPickupTime property is missing. One of the
>>> participants then has to add that property, and the whole structure
>>> becomes valid according to both goals' shapes.)
>>> 
>>> The question is how exactly to achieve that merge. Intuitively, the
>>> result in this case should be
>>> 
>>> :mergedNode1 a taxi:Ride .
>>> :mergedNode1 taxi:hasDriver ex1:p1 .  # note: p1 links her own identifier to the structure
>>> :mergedNode1 taxi:hasPickupLocation :mergedNode2 .
>>> :mergedNode1 a taxi:Ride .
>>> :mergedNode1 taxi:hasClient ex2:p2 .
>>> :mergedNode1 taxi:hasPickupLocation :mergedNode2 .
>>> :mergedNode2 a s:GeoCoordinates ;
>>> s:latitude   "48.213814" ;
>>> s:longitude  "16.340870" .
>>> 
>>> and after removal of duplicate triples
>>> 
>>> :mergedNode1 a taxi:Ride .
>>> :mergedNode1 taxi:hasDriver ex1:p1 .
>>> :mergedNode1 taxi:hasClient ex2:p2 .
>>> :mergedNode1 taxi:hasPickupLocation :mergedNode2 .
>>> :mergedNode2 a s:GeoCoordinates ;
>>> s:latitude   "48.213814" ;
>>> s:longitude  "16.340870" .
>>> 
>>> (I currently don't care about the URIs of the mergedNodes - just
>>> something one can mint when doing the merge)
>>> 
>>> 
>>> Do you have any hints on how to perform such a merge reliably, also
>>> with more complex structures?
>>> 
>>> Thank you very much in advance!
>>> 
>>> Best regards,
>>> Florian
>>> 
>>> 
>>> Links:
>>> 1. Negotiations discussion thread:
>>>    https://lists.w3.org/Archives/Public/semantic-web/2017Jul/0004.html
>>> 2. Agreements paper:
>>>    http://ceur-ws.org/Vol-1934/contribution-07.pdf
> 
> 
> 

Received on Tuesday, 21 November 2017 14:48:56 UTC