Re: Blending RDF graphs

Hi again,

if you only need to "blend" 2 graphs at a time, with one typed resource in
each, I guess you can join on type, again even with SPARQL. Can there be
other commonality than type? What if you have multiple resources with the
same type?

This seems like a very limited use case to me, which does not scale. I was
thinking more about a triplestore with thousands or millions of graphs
where any number of rides might match drivers to passengers. In that case I
don't think joining only on type can work.

I'm also quite sure (optimized) SPARQL will be faster than any
rules/inference, and probably also custom algorithms, unless you use some
specialized storage beyond triples.

On Tue, Nov 21, 2017 at 3:48 PM, Florian Kleedorfer <
florian.kleedorfer@austria.fm> wrote:

> Hi Martynas,
>
> I am not sure if I let the use case seem too narrow here. The taxi case is
> meant as an example, but the two graphs describing the agent's goals could
> contain any RDF. The commonality of all use cases is that a goal by itself
> is always incomplete and needs another agent's mutually compatible goal for
> being completed.
>
> Is my problem really that I have nothing to join on? In the example, there
> are the TBox resources that are shared (e.g., taxi:Ride) that could be used
> for joining. I think in the case of the example below, the CONSTRUCT
> approach is possible, as suggested and shown by Axel Polleres off-list a
> few days ago as a pattern-based solution to this problem. I would classify
> it as a knowledge-based 'blending' approach because the target structure as
> well as the structure of the two graphs we want to 'blend' need to be known
> beforehand, which allows for substantial optimizations.
>
> A downside the pattern-based approach, I suspect, is that it can only
> handle expected cases. Or do you see a way to generate such a SPARQL
> statement from the data ad-hoc?
>
> As a side-note, I am not so sure if I would refer to SPARQL as 'simple'.
> It's easy to write a simple SPARQL query, but it requires resources to
> parse and execute it, and it may be more efficient to reach the same result
> with a different algorithm, more tailored to the problem.
>
>
>
> Am 2017-11-21 14:34, schrieb Martynas Jusevičius:
>
>> If understand your use case right, your problem is that you have
>> nothing to join on, i.e. no common nodes, between the 2 rides.
>>
>> If there was something in common, I think it could be solved with a
>> simple SPARQL CONSTRUCT. Now what that should be, depends on your use
>> case.
>>
>> Maybe geohash of the location? https://en.wikipedia.org/wiki/Geohash
>>
>> In any case, I don't think it requires some kind of new graph
>> "blending" algorithm.
>>
>
> I would be very relieved if I could take something off the shelf here.
>
> Best regards,
> Florian
>
>
>
>
>> On Tue, Nov 21, 2017 at 12:40 PM, Florian Kleedorfer
>> <florian.kleedorfer@austria.fm> wrote:
>>
>> Hi Paul,
>>>
>>> I received a similar hint yesterday off-list. I am really grateful
>>> for these suggestions, as I had not seen the possibility at all. It
>>> seems we should really look into it.
>>>
>>> You asked why I had not framed the problem more in terms of
>>> description logics. The main reason is: I had not even considered
>>> it, really. Here are some possible causes for that:
>>>
>>> One thing to say about description logics is that it is one more
>>> thing to learn, which makes it that much more unlikely a programmer
>>> will use description logics over something else that gets the job
>>> done. The graph blending approach is simpler in terms of logics but
>>> you have to learn SHACL before you can use it - not sure about the
>>> net trade-off value.
>>>
>>> Also, I am a little bit afraid of OWL for the appliation I have in
>>> mind, as it does not, in my (limited) experience, do well in the
>>> presence of messy data. It is easy to formulate inconsistent data,
>>> and to infer really unexpected things. The application - or rather,
>>> the protocol - is completely open (anyone can say anything..
>>> including incomplete or inconsistent things), so I would prefer to
>>> remain on a less expressive level as much as possible, ideally doing
>>> no reasoning at all.
>>>
>>> I've seen SHACL work quite nicely in the browser, but I am not sure
>>> about reasoners for OWL or OWL-DL. Because of the reservations
>>> stated earlier, I must admit, I haven't checked. Doing the blending
>>> in the browser may not be a necessity, but I would not want to close
>>> that door just yet.
>>>
>>> Finally, I have to admit, for the DL based approach, I'm not really
>>> sure I would know even where to start. I'll be thankful for any help
>>> here and I'll be happy to cooperate if somone else wants to work on
>>> it!
>>>
>>> I am also thinking about the implications of the blending approach
>>> vs the DL based approach... The goal of blending was to connect two
>>> graphs that each describe one half of the solution, like puzzle
>>> pieces. The example I had in mind was: A says, "A is the passenger
>>> of the taxi ride." B says, "B is the driver of the taxi ride." Join
>>> the two structures by blending the "taxi ride" node and you get "A
>>> is the passenger of the taxi ride that B is the driver of". It seems
>>> to me that one can arrive at this result without special reasoning
>>> rules that were designed for exactly this use case. A simple rdfs
>>> vocabulary, some SHACL constraints, and the variable/constant
>>> definition would be enough. All of these could be different in every
>>> interaction, allowing for evolution of 'reasoning' as well as
>>> evolution of the use of vocabularies. The DL approach may be very
>>> powerful for premeditated scenarios, but I am not sure how well it
>>> would work if agents want to interact in ways that were not foreseen
>>> by the ontology design.
>>>
>>> The approach we have outlined so far is more of a linked data
>>> flavored approach than a semantic web one. For many simple cases, it
>>> may be sufficient and work just fine. It seems more robust to me
>>> (less prone to fail with messy data) than the DL based approach, and
>>> it might be a good thing to have as a baseline. However, I can
>>> imagine that both approaches will be supported at some point and
>>> they can live next to each other: some agents may prefer to express
>>> their goals in line with rdf blending, some with DL class
>>> intersections, some may use both. In all cases, the net result
>>> should be an RDF graph that the partners can agree to.
>>>
>>> Best
>>> Florian
>>>
>>> Am 2017-11-21 03:54, schrieb Paul Tyson:
>>> Hi Florian, very interesting problem. See my comments below.
>>>
>>> On Mon, 2017-11-20 at 19:09 +0100, Florian Kleedorfer wrote:
>>> (Note: I changed the subject of this thread following Pat Hayes'
>>> suggestion [1])
>>>
>>> After some on-list and off-list discussions, the problem is becoming
>>> clearer. I'll try to state it again, along with some ideas for a
>>> solution. Any comments, criticism, and suggestions to collaborate
>>> are
>>> highly welcome ;-)
>>>
>>> # Blending:
>>> ## Definition
>>> Given two RDF graphs g1 and g2, blending refers to the pairwise
>>> renaming
>>> of resources (blank nodes or URI nodes) r1 in g1 and r2 in g2 such
>>> that
>>> one resource name (which may be r1, r2, or any other URI) is used
>>> instead of r1 and r2. The nodes r1 and r2 are said to be blended
>>> (ideas
>>> for a better term, anyone?). The resulting graphs are merged
>>> (avoiding
>>> accidental blank node identification, see RDF 1.1 Semantics[2]). The
>>> blending function b maps the input graphs to the set of all possible
>>> output graphs.
>>>
>>> I wonder why you do not frame the problem more in terms of
>>> description
>>> logics. "Blending" as you have described it seems to be a much
>>> looser
>>> (flexible?) way of finding the intersection of 2 classes. If the
>>> provider has defined the classes of goods/services they can provide,
>>> and
>>> the consumer defines the class(es) he wishes to buy, then before
>>> they
>>> can do business they must find an instance of good or service that
>>> falls
>>> in the intersection of those classes.
>>>
>>> Suppose the provider defines class P, the consumer class C. Then any
>>> transaction between them would be in the class defined by
>>> intersectionOf(P C). If this class is unsatisfiable (cannot have any
>>> members, for instance if P has :price = 20 and C has :price < 15),
>>> you
>>> want an algorithm to define and rank ways in which the provider's
>>> and
>>> consumer's class definitions could be modified to make the
>>> intersection
>>> satisfiable. These "negotiating rules" (or "distance measurements")
>>> are
>>> the interesting part of the problem.
>>>
>>> I do not know if there are implementable solutions to these problems
>>> in
>>> the description logics literature, but it might be worth a little
>>> searching before building the graph-manipulation approach you have
>>> outlined here.
>>>
>>> Regards,
>>> --Paul
>>>
>>> ## Discussion
>>> There are many ways to blend two graphs, and none of these ways are
>>> deemed correct or incorrect.
>>> We require this operation to combine two data structures held by
>>> mutually independent agents that have certain expectations about the
>>> blending solution. These expectations divide the set of possible
>>> outcomes at least in acceptable and unacceptable ones, possibly even
>>> in
>>> a preference ranking. I'll try to formulate that as well in the
>>> following.
>>>
>>> # Node blendability
>>> ## Rationale
>>> One way of constraining the result is to restrict which nodes can be
>>> blended. So far, we have identified three types: variable nodes,
>>> constant nodes, and unblendable nodes. Each node in an RDF graph has
>>> exactly one of the three types. A variable node may be blended and
>>> its
>>> name may be changed. A constant node may also be blended, but its
>>> name
>>> is not allowed to change. An unblendable node must not be blended.
>>>
>>> One may note that blending g1 and g2 is equivalent to merging them
>>> if no
>>> variable nodes are present in either graph.
>>>
>>> It is currently unclear how the constant/variable/unblendable
>>> property
>>> should be expressed. My intuition is that in most practical use
>>> cases,
>>> most nodes need to be unblendable (ontology/TBox resources,
>>> properties
>>> in general), therefore it would be more economical to assume nodes
>>> to be
>>> unblendable by default. Then, variable/constant nodes have to be
>>> annotated as such explicitly.
>>> Maybe it would be an option to make definitions by URI prefix, so we
>>> can
>>> define a complete vocabulary as unblendable with one statement.
>>> (opinions on that?)
>>>
>>> ## Definition
>>> Given two graphs g1 and g2, a blendability-aware blending function
>>> is a
>>> blending function for g1 and g2 that only replaces resources that
>>> have
>>> not been declared as constant with respect to blending. The latter
>>> is
>>> done by specifying a triple "<graph> bl:containsConstant <resource>
>>> ."
>>> or "<graph> bl:containsVariable <resource> ." in an additional graph
>>> passed to the blendability-aware blending function. Such a graph is
>>> called a 'blendability graph'.
>>>
>>> # Constraints on the blending solution:
>>> ## Rationale
>>> As explained earlier, there may be multiple parties that have
>>> expectations for the blending solution. So far, we have talked about
>>> the
>>> originators of the data. There may also be one or more intended
>>> consumers of the data. Nothing prevents the expectations from being
>>> mutually incompatible (meaning that there exists no graph that
>>> satisfies
>>> them all) or that they are even unsatisfiable by themselves. It
>>> would of
>>> course be beneficial if such cases could be detected automaticall,
>>> but
>>> for now, let's just leave it at noting the possibility. Now, if the
>>> information the agents provided, when considered in total, has
>>> spurious
>>> or conflicting information, or if information is missing (taking
>>> their
>>> respective expectations as a definition of what is acceptable), the
>>> blending solution cannot meet the expectations. Nevertheless, it has
>>> to
>>> be computed, in order to be able to take the next step and "repair"
>>> (better word, please) one or both graphs and try again.
>>>
>>> It would be nice to use constraints for the following goals during
>>> the
>>> calculation of the blending solution:
>>> 1. pruning wholly unacceptable solutions
>>> 2. ranking unacceptable solutions by badness
>>> 3. ranking acceptable-looking, but incomplete solutions ("so far so
>>> good")
>>>
>>> Constraints should allow us to identify an acceptable result, or to
>>> select the best result of a number of unacceptable ones, and report
>>> them
>>> to users so that they can repair them.
>>>
>>> However, if the expectations define an acceptable result, anything
>>> else,
>>> however close, is unacceptable, and in the absence of acceptable
>>> results
>>> we are left with unacceptable ones, some of which may be completely
>>> bogus while others are almost acceptable. So, the three categories
>>> above
>>> cannot be separated, and the best we can hope for is reasonable
>>> ranking
>>> of unacceptable solutions.
>>>
>>> For the conceptualization, we go with a simple SHACL-based approach.
>>> Other systems (SPARQL, ShEx) are definitely possible as well.
>>>
>>> ## Definition
>>> The constraints-aware blending function is a blending function for
>>> two
>>> RDF graphs g1 and g2 and takes optional graph names s1,...,sN that
>>> denote graphs which contain SHACL shapes constraining the result.
>>> Each
>>> blending solution is validated using s1,...,sN; the result of the
>>> function is a mapping of the blending solutions to their respective
>>> SHACL validation results, indexed by shape graph name (s1,...,sN). A
>>> blending solution is called accepted with respect to a constraints
>>> graph
>>> s if it causes no SHACL validation results when validated against s.
>>> A
>>> blending solution is called accepted if it is accepted with respect
>>> to
>>> all specified shapes graphs.
>>>
>>> ## Discussion
>>> The main issue in this step seems to be the pruning and ordering of
>>> unacceptable results. Blending functions may choose to drop blending
>>> solutions if they are deemed too bad - but where should they draw
>>> the
>>> line and still report an unacceptable blending solution while
>>> dropping
>>> another?
>>>
>>> With respect to one shapes graph, we could define that validation
>>> result
>>> graphs are ranked by maximum severity (1 sh:Violation is worse than
>>> 1000
>>> sh:Warnings), and within those classes by number of validation
>>> results
>>> (2 Violations and 1 Warnings is worse than 1 Violation and 1000
>>> Warnings).
>>>
>>> I'm suspecting that the validation results should allow us to prune
>>> at
>>> least some bad blending solutions if their validation results are
>>> supersets of other solutions' validation results: if solution A has
>>> validation results (R1) and solution B has validation result (R1,
>>> R2) we
>>> can say that solution B is a deterioration of solution A and should
>>> not
>>> be reported.
>>>
>>> In addition to constraint-based pruning and ranking, it is of course
>>> possible to perform reasoning-based pruning and validate the
>>> blending
>>> solution according to appropriate reasoning logics (RDFS, OWL, ...).
>>> However, I am not sure as to the usefulness of this approach if no
>>> solution is found that is consistent according to the logic.
>>>
>>> # Graph blending algorithms
>>> Let's assume for this section that we are blending two graphs g1 and
>>> g2,
>>> and we know we have variable nodes var(g1) and constant nodes
>>> const(g1)
>>> in g1, and var(g2) and const(g2) in g2.
>>>
>>> ## Complete
>>> The complete algorithm consists of enumerating all possible
>>> solutions
>>> For each n in var(g1), there are the following solutions:
>>> * 1 solution in which n is not blended at all
>>> * 1 solution for each m in var(g2), in which n is blended with m
>>> * 1 solution for each m in const(g2), in which n is blended with m
>>> symmetrically the same is true for each n in var(g2).
>>> Removal of duplicate solutions yields the set of solutions.
>>>
>>> ## Heuristic
>>> We may interpret blending as a heuristic search and apply an A*-like
>>> algorithm to finding good results, assessing the distance to the
>>> goal by
>>> evaluating the constraints, and the distance from the start by the
>>> number of node blendings.
>>>
>>> ## Knowledge-based
>>> Exploiting knowledge about the actual structures in g1 and g2, the
>>> heuristic approach could be enhanced to apply more appropriate
>>> distance
>>> estimates and select more promising paths first.
>>>
>>> Links:
>>> 1.
>>> https://lists.w3.org/Archives/Public/semantic-web/2017Nov/0067.html
>>> [1]
>>> 2. https://www.w3.org/TR/rdf11-mt/ [2]
>>>
>>> Am 2017-11-16 19:37, schrieb Florian Kleedorfer:
>>>
>>>> Hi,
>>>>
>>>> As you may recall, I've asked this list about
>>>>
>>> messaging/negotiating
>>>
>>>> using linked data[1], and it led me and my colleagues to a concept
>>>>
>>> of
>>>
>>>> RDF-based agreements[2]. (Thanks again for your input!)
>>>>
>>>> Now it's time for us to define how the negotiating parties come up
>>>> with the data (set of rdf triples) that they agree on.
>>>>
>>>> One Idea we have is that both particpants (P1,P2) specify 'goals',
>>>> each consisting of a SHACL shapes graph and a data graph. Two
>>>>
>>> goals
>>>
>>>> can be tested for compatibility by merging the two data graphs and
>>>> then testing if the resulting graph is valid given the shapes
>>>>
>>> defined
>>>
>>>> by both goals. The problem we face with this approach is merging
>>>>
>>> the
>>>
>>>> two data graphs. Consider the example in this gist:
>>>>
>>>>
>>>> https://gist.github.com/fkleedorfer/81b32e3235105a4a4743e9fa76ba9332
>>> [3]
>>>
>>>
>>>> In that example, we would want to merge these two graphs:
>>>>
>>>> ex1:p1g-data {
>>>> ex1:ride1 a taxi:Ride .
>>>> ex1:ride1 taxi:hasDriver ex1:p1 .
>>>> ex1:ride1 taxi:hasPickupLocation ex1:pickup1;
>>>> }
>>>>
>>>> and
>>>>
>>>> ex2:p2g-data {
>>>> ex2:myRide a taxi:Ride.
>>>> ex2:myRide taxi:hasClient ex2:p2 .
>>>> ex2:myRide taxi:hasPickupLocation ex2:myPickupLocation .
>>>> ex2:myPickupLocation a s:GeoCoordinates ;
>>>> s:latitude   "48.213814" ;
>>>> s:longitude  "16.340870" .
>>>> }
>>>>
>>>> We want that merge to happen in such a way that the shapes in
>>>> ex1:p1g-shapes and ex2:p2g-shapes can be evaluated, which will
>>>>
>>> require
>>>
>>>> to merge some of the nodes. (In the example, we'll find out that a
>>>> taxi:hasPickupTime property is missing. One of the participants is
>>>> then to add that property, and the whole structure becomes valid
>>>> according to both goals' shapes.)
>>>>
>>>> The question is how exactly to achieve that merge. Intuitively,
>>>>
>>> the
>>>
>>>> result in this case should be
>>>>
>>>> :mergedNode1 a taxi:Ride .
>>>> :mergedNode1 taxi:hasDriver ex1:p1 . # note: p1 links her own
>>>> identifier to the structure
>>>> :mergedNode1 taxi:hasPickupLocation :mergedNode2;
>>>> :mergedNode1 a taxi:Ride.
>>>> :mergedNode1 taxi:hasClient ex2:p2 .
>>>> :mergedNode1 taxi:hasPickupLocation :mergedNode2 .
>>>> :mergedNode2 a s:GeoCoordinates ;
>>>> s:latitude   "48.213814" ;
>>>> s:longitude  "16.340870" .
>>>>
>>>> and after removal of duplicate triples
>>>>
>>>> :mergedNode1 a taxi:Ride .
>>>> :mergedNode1 taxi:hasDriver ex1:p1 .
>>>> :mergedNode1 taxi:hasClient ex2:p2 .
>>>> :mergedNode1 taxi:hasPickupLocation :mergedNode2 .
>>>> :mergedNode2 a s:GeoCoordinates ;
>>>> s:latitude   "48.213814" ;
>>>> s:longitude  "16.340870" .
>>>>
>>>> (I currently don't care about the URIs of the mergedNodes - just
>>>> something one can mint when doing the merge)
>>>>
>>>>
>>>> Do you have any hints on how to perform such a merge reliably,
>>>>
>>> also
>>>
>>>> with more complex structures?
>>>>
>>>> Thank you very much in advance!
>>>>
>>>> Best regards,
>>>> Florian
>>>>
>>>>
>>>> Links:
>>>> 1. Negotiations discussion thread:
>>>>
>>>> https://lists.w3.org/Archives/Public/semantic-web/2017Jul/0004.html
>>> [4]
>>>
>>>> 2. Agreements paper:
>>>>
>>> http://ceur-ws.org/Vol-1934/contribution-07.pdf [5]
>>>
>>
>>
>>
>> Links:
>> ------
>> [1] https://lists.w3.org/Archives/Public/semantic-web/2017Nov/0067.html
>> [2] https://www.w3.org/TR/rdf11-mt/
>> [3] https://gist.github.com/fkleedorfer/81b32e3235105a4a4743e9fa76ba9332
>> [4] https://lists.w3.org/Archives/Public/semantic-web/2017Jul/0004.html
>> [5] http://ceur-ws.org/Vol-1934/contribution-07.pdf
>>
>

Received on Tuesday, 21 November 2017 15:02:58 UTC