Re: Blending RDF graphs

If I understand your use case correctly, the problem is that you have
nothing to join on, i.e. no common nodes between the two rides.

If there were something in common, I think it could be solved with a simple
SPARQL CONSTRUCT. What that should be depends on your use case.

Maybe geohash of the location? https://en.wikipedia.org/wiki/Geohash

In any case, I don't think it requires some kind of new graph "blending"
algorithm.
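To make the geohash idea concrete, here is a minimal pure-Python encoder (the standard geohash algorithm; the function name and all code are invented for illustration, not taken from this thread). It maps the pickup coordinates from Florian's example to a cell identifier that both parties could join on; truncating the precision widens the cell.

```python
# Hypothetical sketch: standard geohash encoding in pure Python.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash(lat, lon, precision=9):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    even = True          # geohash interleaves bits, starting with longitude
    ch, bit_count = 0, 0
    result = []
    while len(result) < precision:
        rng, val = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            ch = (ch << 1) | 1
            rng[0] = mid
        else:
            ch = ch << 1
            rng[1] = mid
        even = not even
        bit_count += 1
        if bit_count == 5:   # five bits per base32 character
            result.append(BASE32[ch])
            ch, bit_count = 0, 0
    return "".join(result)
```

A CONSTRUCT query could then link rides whose pickup geohashes share a prefix of the desired length, giving the two graphs a common node to join on.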

On Tue, Nov 21, 2017 at 12:40 PM, Florian Kleedorfer <
florian.kleedorfer@austria.fm> wrote:

> Hi Paul,
>
> I received a similar hint yesterday off-list. I am really grateful for
> these suggestions, as I had not seen the possibility at all. It seems we
> should really look into it.
>
> You asked why I had not framed the problem more in terms of description
> logics. The main reason is: I had not even considered it, really. Here are
> some possible causes for that:
>
> One thing to say about description logics is that it is one more thing to
> learn, which makes it that much more unlikely a programmer will use
> description logics over something else that gets the job done. The graph
> blending approach is simpler in terms of logics but you have to learn SHACL
> before you can use it - not sure about the net trade-off value.
>
> Also, I am a little bit afraid of OWL for the application I have in mind,
> as it does not, in my (limited) experience, do well in the presence of
> messy data. It is easy to formulate inconsistent data, and to infer really
> unexpected things. The application - or rather, the protocol - is
> completely open (anyone can say anything, including incomplete or
> inconsistent things), so I would prefer to remain on a less expressive
> level as much as possible, ideally doing no reasoning at all.
>
> I've seen SHACL work quite nicely in the browser, but I am not sure about
> reasoners for OWL or OWL-DL. Because of the reservations stated earlier, I
> must admit, I haven't checked. Doing the blending in the browser may not be
> a necessity, but I would not want to close that door just yet.
>
> Finally, I have to admit, for the DL based approach, I'm not really sure I
> would even know where to start. I'll be thankful for any help here and I'll
> be happy to cooperate if someone else wants to work on it!
>
>
> I am also thinking about the implications of the blending approach vs the
> DL based approach... The goal of blending was to connect two graphs that
> each describe one half of the solution, like puzzle pieces. The example I
> had in mind was: A says, "A is the passenger of the taxi ride." B says, "B
> is the driver of the taxi ride." Join the two structures by blending the
> "taxi ride" node and you get "A is the passenger of the taxi ride that B is
> the driver of". It seems to me that one can arrive at this result without
> special reasoning rules that were designed for exactly this use case. A
> simple rdfs vocabulary, some SHACL constraints, and the variable/constant
> definition would be enough. All of these could be different in every
> interaction, allowing for evolution of 'reasoning' as well as evolution of
> the use of vocabularies. The DL approach may be very powerful for
> premeditated scenarios, but I am not sure how well it would work if agents
> want to interact in ways that were not foreseen by the ontology design.
>
> The approach we have outlined so far is more of a linked data flavored
> approach than a semantic web one. For many simple cases, it may be
> sufficient and work just fine. It seems more robust to me (less prone to
> fail with messy data) than the DL based approach, and it might be a good
> thing to have as a baseline. However, I can imagine that both approaches
> will be supported at some point and they can live next to each other: some
> agents may prefer to express their goals in line with rdf blending, some
> with DL class intersections, some may use both. In all cases, the net
> result should be an RDF graph that the partners can agree to.
>
> Best
> Florian
>
> Am 2017-11-21 03:54, schrieb Paul Tyson:
>
>> Hi Florian, very interesting problem. See my comments below.
>>
>> On Mon, 2017-11-20 at 19:09 +0100, Florian Kleedorfer wrote:
>>
>>> (Note: I changed the subject of this thread following Pat Hayes'
>>> suggestion [1])
>>>
>>> After some on-list and off-list discussions, the problem is becoming
>>> clearer. I'll try to state it again, along with some ideas for a
>>> solution. Any comments, criticism, and suggestions to collaborate are
>>> highly welcome ;-)
>>>
>>> # Blending:
>>> ## Definition
>>> Given two RDF graphs g1 and g2, blending refers to the pairwise renaming
>>> of resources (blank nodes or URI nodes) r1 in g1 and r2 in g2 such that
>>> one resource name (which may be r1, r2, or any other URI) is used
>>> instead of r1 and r2. The nodes r1 and r2 are said to be blended (ideas
>>> for a better term, anyone?). The resulting graphs are merged (avoiding
>>> accidental blank node identification, see RDF 1.1 Semantics[2]). The
>>> blending function b maps the input graphs to the set of all possible
>>> output graphs.
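The definition above can be sketched in a few lines of Python (all names invented; graphs represented as plain sets of (subject, predicate, object) tuples with string node names). Blending renames both nodes to one shared name and then takes the union; a real merge would also have to standardize apart blank nodes, per RDF 1.1 Semantics[2].

```python
# Sketch of the blending definition above (invented representation).
def rename(graph, old, new):
    """Replace every occurrence of node `old` with `new`."""
    return {tuple(new if t == old else t for t in triple) for triple in graph}

def blend(g1, r1, g2, r2, name):
    """Blend r1 in g1 with r2 in g2 under the shared `name`, then merge."""
    return rename(g1, r1, name) | rename(g2, r2, name)

g1 = {("ex1:ride1", "rdf:type", "taxi:Ride"),
      ("ex1:ride1", "taxi:hasDriver", "ex1:p1")}
g2 = {("ex2:myRide", "rdf:type", "taxi:Ride"),
      ("ex2:myRide", "taxi:hasClient", "ex2:p2")}

blended = blend(g1, "ex1:ride1", g2, "ex2:myRide", "ex:mergedNode1")
# The two rdf:type triples collapse into one after renaming.
```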
>>>
>>
>> I wonder why you do not frame the problem more in terms of description
>> logics. "Blending" as you have described it seems to be a much looser
>> (flexible?) way of finding the intersection of 2 classes. If the
>> provider has defined the classes of goods/services they can provide, and
>> the consumer defines the class(es) he wishes to buy, then before they
>> can do business they must find an instance of good or service that falls
>> in the intersection of those classes.
>>
>> Suppose the provider defines class P, the consumer class C. Then any
>> transaction between them would be in the class defined by
>> intersectionOf(P C). If this class is unsatisfiable (cannot have any
>> members, for instance if P has :price = 20 and C has :price < 15), you
>> want an algorithm to define and rank ways in which the provider's and
>> consumer's class definitions could be modified to make the intersection
>> satisfiable. These "negotiating rules" (or "distance measurements") are
>> the interesting part of the problem.
>>
>> I do not know if there are implementable solutions to these problems in
>> the description logics literature, but it might be worth a little
>> searching before building the graph-manipulation approach you have
>> outlined here.
>>
>> Regards,
>> --Paul
>>
>>>
>>> ## Discussion
>>> There are many ways to blend two graphs, and none of these ways are
>>> deemed correct or incorrect.
>>> We require this operation to combine two data structures held by
>>> mutually independent agents that have certain expectations about the
>>> blending solution. These expectations divide the set of possible
>>> outcomes at least in acceptable and unacceptable ones, possibly even in
>>> a preference ranking. I'll try to formulate that as well in the
>>> following.
>>>
>>>
>>> # Node blendability
>>> ## Rationale
>>> One way of constraining the result is to restrict which nodes can be
>>> blended. So far, we have identified three types: variable nodes,
>>> constant nodes, and unblendable nodes. Each node in an RDF graph has
>>> exactly one of the three types. A variable node may be blended and its
>>> name may be changed. A constant node may also be blended, but its name
>>> is not allowed to change. An unblendable node must not be blended.
>>>
>>> One may note that blending g1 and g2 is equivalent to merging them if no
>>> variable nodes are present in either graph.
>>>
>>> It is currently unclear how the constant/variable/unblendable property
>>> should be expressed. My intuition is that in most practical use cases,
>>> most nodes need to be unblendable (ontology/TBox resources, properties
>>> in general), therefore it would be more economical to assume nodes to be
>>> unblendable by default. Then, variable/constant nodes have to be
>>> annotated as such explicitly.
>>> Maybe it would be an option to make definitions by URI prefix, so we can
>>> define a complete vocabulary as unblendable with one statement.
>>> (opinions on that?)
>>>
>>> ## Definition
>>> Given two graphs g1 and g2, a blendability-aware blending function is a
>>> blending function for g1 and g2 that only replaces resources that have
>>> not been declared as constant with respect to blending. The latter is
>>> done by specifying a triple "<graph> bl:containsConstant <resource> ."
>>> or "<graph> bl:containsVariable <resource> ." in an additional graph
>>> passed to the blendability-aware blending function. Such a graph is
>>> called a 'blendability graph'.
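A sketch of the blendability check (vocabulary taken from the definition above; helper names and data layout invented). Nodes default to unblendable, as suggested; two constants can only be blended if they already share a name, since a constant's name may not change.

```python
# Sketch: blendability lookup against a blendability graph of triples.
def node_type(blendability, graph, node):
    """Unblendable by default; variable/constant only if declared."""
    if (graph, "bl:containsVariable", node) in blendability:
        return "variable"
    if (graph, "bl:containsConstant", node) in blendability:
        return "constant"
    return "unblendable"

def may_blend(blendability, g1, n1, g2, n2):
    t1 = node_type(blendability, g1, n1)
    t2 = node_type(blendability, g2, n2)
    if "unblendable" in (t1, t2):
        return False
    if t1 == t2 == "constant":
        # Neither name may change, so distinct constants cannot be blended.
        return n1 == n2
    return True

blendability = {("ex:g1", "bl:containsVariable", "ex:ride1"),
                ("ex:g2", "bl:containsConstant", "ex:myRide")}
```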
>>>
>>>
>>> # Constraints on the blending solution:
>>> ## Rationale
>>> As explained earlier, there may be multiple parties that have
>>> expectations for the blending solution. So far, we have talked about the
>>> originators of the data. There may also be one or more intended
>>> consumers of the data. Nothing prevents the expectations from being
>>> mutually incompatible (meaning that there exists no graph that satisfies
>>> them all) or that they are even unsatisfiable by themselves. It would of
>>> course be beneficial if such cases could be detected automatically, but
>>> for now, let's just leave it at noting the possibility. Now, if the
>>> information the agents provided, when considered in total, has spurious
>>> or conflicting information, or if information is missing (taking their
>>> respective expectations as a definition of what is acceptable), the
>>> blending solution cannot meet the expectations. Nevertheless, it has to
>>> be computed, in order to be able to take the next step and "repair"
>>> (better word, please) one or both graphs and try again.
>>>
>>> It would be nice to use constraints for the following goals during the
>>> calculation of the blending solution:
>>> 1. pruning wholly unacceptable solutions
>>> 2. ranking unacceptable solutions by badness
>>> 3. ranking acceptable-looking, but incomplete solutions ("so far so
>>> good")
>>>
>>> Constraints should allow us to identify an acceptable result, or to
>>> select the best result of a number of unacceptable ones, and report them
>>> to users so that they can repair them.
>>>
>>> However, if the expectations define an acceptable result, anything else,
>>> however close, is unacceptable, and in the absence of acceptable results
>>> we are left with unacceptable ones, some of which may be completely
>>> bogus while others are almost acceptable. So, the three categories above
>>> cannot be separated, and the best we can hope for is reasonable ranking
>>> of unacceptable solutions.
>>>
>>> For the conceptualization, we go with a simple SHACL-based approach.
>>> Other systems (SPARQL, ShEx) are definitely possible as well.
>>>
>>> ## Definition
>>> The constraints-aware blending function is a blending function for two
>>> RDF graphs g1 and g2 and takes optional graph names s1,...,sN that
>>> denote graphs which contain SHACL shapes constraining the result. Each
>>> blending solution is validated using s1,...,sN; the result of the
>>> function is a mapping of the blending solutions to their respective
>>> SHACL validation results, indexed by shape graph name (s1,...,sN). A
>>> blending solution is called accepted with respect to a constraints graph
>>> s if it causes no SHACL validation results when validated against s. A
>>> blending solution is called accepted if it is accepted with respect to
>>> all specified shapes graphs.
>>>
>>> ## Discussion
>>> The main issue in this step seems to be the pruning and ordering of
>>> unacceptable results. Blending functions may choose to drop blending
>>> solutions if they are deemed too bad - but where should they draw the
>>> line and still report an unacceptable blending solution while dropping
>>> another?
>>>
>>> With respect to one shapes graph, we could define that validation result
>>> graphs are ranked by maximum severity (1 sh:Violation is worse than 1000
>>> sh:Warnings), and within those severity classes by the number of results
>>> at that severity (2 Violations and 1 Warning is worse than 1 Violation
>>> and 1000 Warnings).
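The ranking rule above could be sketched like this (data layout invented: a validation report is just a list of severity strings, compared by worst severity first and then by how many results have that severity):

```python
# Sketch of the severity-then-count ranking rule (invented layout).
SEVERITY = {"sh:Info": 0, "sh:Warning": 1, "sh:Violation": 2}

def badness(report):
    """Rank key for a validation report: (worst severity, count at worst)."""
    if not report:
        return (-1, 0)  # no validation results: the solution is accepted
    worst = max(SEVERITY[s] for s in report)
    return (worst, sum(1 for s in report if SEVERITY[s] == worst))

reports = [["sh:Violation", "sh:Violation", "sh:Warning"],
           ["sh:Violation"] + ["sh:Warning"] * 1000,
           ["sh:Warning"] * 1000]
ranked = sorted(reports, key=badness)  # least bad first
```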
>>>
>>> I'm suspecting that the validation results should allow us to prune at
>>> least some bad blending solutions if their validation results are
>>> supersets of other solutions' validation results: if solution A has
>>> validation results (R1) and solution B has validation result (R1, R2) we
>>> can say that solution B is a deterioration of solution A and should not
>>> be reported.
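That domination idea fits in two lines (a sketch; validation result identifiers are treated as opaque strings):

```python
# Sketch: solution B is a deterioration of solution A if B's validation
# results strictly contain A's, so B need not be reported.
def dominated(results_b, results_a):
    return set(results_a) < set(results_b)
```

For example, `dominated({"R1", "R2"}, {"R1"})` holds, so the solution producing (R1, R2) can be pruned.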
>>>
>>> In addition to constraint-based pruning and ranking, it is of course
>>> possible to perform reasoning-based pruning and validate the blending
>>> solution according to appropriate reasoning logics (RDFS, OWL, ...).
>>> However, I am not sure as to the usefulness of this approach if no
>>> solution is found that is consistent according to the logic.
>>>
>>>
>>> # Graph blending algorithms
>>> Let's assume for this section that we are blending two graphs g1 and g2,
>>> and we know we have variable nodes var(g1) and constant nodes const(g1)
>>> in g1, and var(g2) and const(g2) in g2.
>>>
>>> ## Complete
>>> The complete algorithm consists of enumerating all possible solutions.
>>> For each n in var(g1), there are the following solutions:
>>> * 1 solution in which n is not blended at all
>>> * 1 solution for each m in var(g2), in which n is blended with m
>>> * 1 solution for each m in const(g2), in which n is blended with m
>>> Symmetrically, the same is true for each n in var(g2).
>>> Removal of duplicate solutions yields the set of solutions.
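The complete enumeration could be sketched as a cross product of per-variable choices (helper names invented; variable names are assumed disjoint between the two graphs):

```python
from itertools import product

# Sketch of the complete enumeration: each variable either stays
# unblended (None) or is blended with a variable/constant of the other graph.
def options(var_nodes, other_vars, other_consts):
    return {n: [None] + sorted(other_vars) + sorted(other_consts)
            for n in var_nodes}

def enumerate_solutions(vars1, consts1, vars2, consts2):
    opts = {**options(vars1, vars2, consts2),
            **options(vars2, vars1, consts1)}
    keys = list(opts)
    for combo in product(*(opts[k] for k in keys)):
        # Each solution maps every variable to its blending target (or None).
        # Mirrored assignments (n->m vs m->n) still need the de-duplication
        # step mentioned above.
        yield dict(zip(keys, combo))

sols = list(enumerate_solutions({"ex:n1"}, set(), {"ex:m1"}, {"ex:c1"}))
```

With one variable and one constant on each side as above, this yields 3 x 2 = 6 raw solutions before de-duplication.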
>>>
>>> ## Heuristic
>>> We may interpret blending as a heuristic search and apply an A*-like
>>> algorithm to finding good results, assessing the distance to the goal by
>>> evaluating the constraints, and the distance from the start by the
>>> number of node blendings.
>>>
>>> ## Knowledge-based
>>> Exploiting knowledge about the actual structures in g1 and g2, the
>>> heuristic approach could be enhanced to apply more appropriate distance
>>> estimates and select more promising paths first.
>>>
>>>
>>>
>>> Links:
>>> 1. https://lists.w3.org/Archives/Public/semantic-web/2017Nov/0067.html
>>> 2. https://www.w3.org/TR/rdf11-mt/
>>>
>>>
>>>
>>>
>>> Am 2017-11-16 19:37, schrieb Florian Kleedorfer:
>>> > Hi,
>>> >
>>> > As you may recall, I've asked this list about messaging/negotiating
>>> > using linked data[1], and it led me and my colleagues to a concept of
>>> > RDF-based agreements[2]. (Thanks again for your input!)
>>> >
>>> > Now it's time for us to define how the negotiating parties come up
>>> > with the data (set of rdf triples) that they agree on.
>>> >
>>> > One idea we have is that both participants (P1, P2) specify 'goals',
>>> > each consisting of a SHACL shapes graph and a data graph. Two goals
>>> > can be tested for compatibility by merging the two data graphs and
>>> > then testing if the resulting graph is valid given the shapes defined
>>> > by both goals. The problem we face with this approach is merging the
>>> > two data graphs. Consider the example in this gist:
>>> >
>>> > https://gist.github.com/fkleedorfer/81b32e3235105a4a4743e9fa76ba9332
>>> >
>>> > In that example, we would want to merge these two graphs:
>>> >
>>> > ex1:p1g-data {
>>> >     ex1:ride1 a taxi:Ride .
>>> >     ex1:ride1 taxi:hasDriver ex1:p1 .
>>> >     ex1:ride1 taxi:hasPickupLocation ex1:pickup1;
>>> > }
>>> >
>>> > and
>>> >
>>> > ex2:p2g-data {
>>> >     ex2:myRide a taxi:Ride.
>>> >     ex2:myRide taxi:hasClient ex2:p2 .
>>> >     ex2:myRide taxi:hasPickupLocation ex2:myPickupLocation .
>>> >     ex2:myPickupLocation a s:GeoCoordinates ;
>>> >         s:latitude   "48.213814" ;
>>> >         s:longitude  "16.340870" .
>>> > }
>>> >
>>> > We want that merge to happen in such a way that the shapes in
>>> > ex1:p1g-shapes and ex2:p2g-shapes can be evaluated, which will require
>>> > merging some of the nodes. (In the example, we'll find out that a
>>> > taxi:hasPickupTime property is missing. One of the participants is
>>> > then to add that property, and the whole structure becomes valid
>>> > according to both goals' shapes.)
>>> >
>>> > The question is how exactly to achieve that merge. Intuitively, the
>>> > result in this case should be
>>> >
>>> >    :mergedNode1 a taxi:Ride .
>>> >    :mergedNode1 taxi:hasDriver ex1:p1 . # note: p1 links her own
>>> > identifier to the structure
>>> >    :mergedNode1 taxi:hasPickupLocation :mergedNode2;
>>> >    :mergedNode1 a taxi:Ride.
>>> >    :mergedNode1 taxi:hasClient ex2:p2 .
>>> >    :mergedNode1 taxi:hasPickupLocation :mergedNode2 .
>>> >    :mergedNode2 a s:GeoCoordinates ;
>>> >         s:latitude   "48.213814" ;
>>> >         s:longitude  "16.340870" .
>>> >
>>> > and after removal of duplicate triples
>>> >
>>> >    :mergedNode1 a taxi:Ride .
>>> >    :mergedNode1 taxi:hasDriver ex1:p1 .
>>> >    :mergedNode1 taxi:hasClient ex2:p2 .
>>> >    :mergedNode1 taxi:hasPickupLocation :mergedNode2 .
>>> >    :mergedNode2 a s:GeoCoordinates ;
>>> >         s:latitude   "48.213814" ;
>>> >         s:longitude  "16.340870" .
>>> >
>>> > (I currently don't care about the URIs of the mergedNodes - just
>>> > something one can mint when doing the merge)
>>> >
>>> >
>>> > Do you have any hints on how to perform such a merge reliably, also
>>> > with more complex structures?
>>> >
>>> > Thank you very much in advance!
>>> >
>>> > Best regards,
>>> > Florian
>>> >
>>> >
>>> > Links:
>>> > 1. Negotiations discussion thread:
>>> > https://lists.w3.org/Archives/Public/semantic-web/2017Jul/0004.html
>>> > 2. Agreements paper: http://ceur-ws.org/Vol-1934/contribution-07.pdf
>>>
>>>
>

Received on Tuesday, 21 November 2017 13:35:03 UTC