- From: Martynas Jusevičius <martynas@atomgraph.com>
- Date: Tue, 21 Nov 2017 17:22:43 +0100
- To: Florian Kleedorfer <florian.kleedorfer@austria.fm>
- Cc: Paul Tyson <phtyson@sbcglobal.net>, Semantic Web <semantic-web@w3.org>
- Message-ID: <CAE35VmzRn0Uza_SvubV6cbkZwArivL330Y071r2ku9e_BpnbqQ@mail.gmail.com>
Well, if you can join on type, what about something like

  CONSTRUCT { ?newSubj ?p1 ?o1 ; ?p2 ?o2 }
  WHERE {
    GRAPH ?g1 { ?subj1 a ?type ; ?p1 ?o1 }
    GRAPH ?g2 { ?subj2 a ?type ; ?p2 ?o2 }
    BIND (UUID() AS ?newSubj)
  }

On Tue, Nov 21, 2017 at 4:48 PM, Florian Kleedorfer <florian.kleedorfer@austria.fm> wrote:

> Hi,
>
> On 2017-11-21 16:02, Martynas Jusevičius wrote:
>
>> Hi again,
>>
>> if you only need to "blend" 2 graphs at a time, with one typed resource in each, I guess you can join on type, again even with SPARQL. Can there be other commonality than type? What if you have multiple resources with the same type?
>
> The individual graphs could be more complex, for example describing a goods transport including packaging, content, pickup time/location, delivery time/location.
>
> The setting you are referring to is what we would call matching. We assume for the blending of goal graphs that matching has already occurred: the two agents have already identified each other and now want to describe their transaction. There aren't millions of graphs, there are only a handful (all the goals specified by both agents, in most cases probably one on each side).
>
>> This seems like a very limited use case to me, which does not scale. I was thinking more about a triplestore with thousands or millions of graphs where any number of rides might match drivers to passengers. In that case I don't think joining only on type can work.
>>
>> I'm also quite sure (optimized) SPARQL will be faster than any rules/inference, and probably also custom algorithms, unless you use some specialized storage beyond triples.
>
> I suspect the same, except that I am not (yet) convinced that I want to use SPARQL in the browser - which may be one platform we'd like to use.
>
>> On Tue, Nov 21, 2017 at 3:48 PM, Florian Kleedorfer <florian.kleedorfer@austria.fm> wrote:
>>
>>> Hi Martynas,
>>>
>>> I am not sure if I let the use case seem too narrow here. The taxi case is meant as an example, but the two graphs describing the agent's goals could contain any RDF. The commonality of all use cases is that a goal by itself is always incomplete and needs another agent's mutually compatible goal to be completed.
>>>
>>> Is my problem really that I have nothing to join on? In the example, there are the TBox resources that are shared (e.g., taxi:Ride) that could be used for joining. I think in the case of the example below, the CONSTRUCT approach is possible, as suggested and shown by Axel Polleres off-list a few days ago as a pattern-based solution to this problem. I would classify it as a knowledge-based 'blending' approach because the target structure as well as the structure of the two graphs we want to 'blend' need to be known beforehand, which allows for substantial optimizations.
>>>
>>> A downside of the pattern-based approach, I suspect, is that it can only handle expected cases. Or do you see a way to generate such a SPARQL statement from the data ad hoc?
>>>
>>> As a side note, I am not so sure if I would refer to SPARQL as 'simple'. It's easy to write a simple SPARQL query, but it requires resources to parse and execute it, and it may be more efficient to reach the same result with a different algorithm, more tailored to the problem.
>>>
>>> On 2017-11-21 14:34, Martynas Jusevičius wrote:
>>>
>>>> If I understand your use case right, your problem is that you have nothing to join on, i.e. no common nodes, between the 2 rides.
>>>>
>>>> If there was something in common, I think it could be solved with a simple SPARQL CONSTRUCT. Now what that should be depends on your use case.
>>>>
>>>> Maybe geohash of the location? https://en.wikipedia.org/wiki/Geohash [1]
>>>>
>>>> In any case, I don't think it requires some kind of new graph "blending" algorithm.
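The effect of the join-on-type CONSTRUCT above can be sketched in plain Python. The triple-set graph encoding and all identifiers here are illustrative assumptions, not from the thread; note that the sketch mints one fresh identifier per matching subject pair, which is presumably the intent of BIND(UUID()).

```python
import uuid

# Graphs as sets of (subject, predicate, object) strings -- an
# illustrative stand-in for two named RDF graphs.
g1 = {("ex1:ride1", "rdf:type", "taxi:Ride"),
      ("ex1:ride1", "taxi:hasDriver", "ex1:p1")}
g2 = {("ex2:myRide", "rdf:type", "taxi:Ride"),
      ("ex2:myRide", "taxi:hasClient", "ex2:p2")}

def join_on_type(g1, g2):
    """For each pair of subjects sharing an rdf:type, mint a fresh
    subject and attach both subjects' properties to it."""
    typed = lambda g: {(s, o) for (s, p, o) in g if p == "rdf:type"}
    out = set()
    for s1, t1 in typed(g1):
        for s2, t2 in typed(g2):
            if t1 != t2:
                continue
            new = "urn:uuid:" + str(uuid.uuid4())  # like BIND(UUID())
            out |= {(new, p, o) for (s, p, o) in g1 if s == s1}
            out |= {(new, p, o) for (s, p, o) in g2 if s == s2}
    return out

merged = join_on_type(g1, g2)
# Properties from both sides now hang off a single new subject.
print({p for (_, p, _) in merged})
```

The sketch also shows the limitation Florian points out below: the join only works because both graphs happen to share the type taxi:Ride.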
>>> I would be very relieved if I could take something off the shelf here.
>>>
>>> Best regards,
>>> Florian
>>>
>>> On Tue, Nov 21, 2017 at 12:40 PM, Florian Kleedorfer <florian.kleedorfer@austria.fm> wrote:
>>>
>>> Hi Paul,
>>>
>>> I received a similar hint yesterday off-list. I am really grateful for these suggestions, as I had not seen the possibility at all. It seems we should really look into it.
>>>
>>> You asked why I had not framed the problem more in terms of description logics. The main reason is: I had not even considered it, really. Here are some possible causes for that:
>>>
>>> One thing to say about description logics is that it is one more thing to learn, which makes it that much more unlikely a programmer will use description logics over something else that gets the job done. The graph blending approach is simpler in terms of logics, but you have to learn SHACL before you can use it - not sure about the net trade-off value.
>>>
>>> Also, I am a little bit afraid of OWL for the application I have in mind, as it does not, in my (limited) experience, do well in the presence of messy data. It is easy to formulate inconsistent data, and to infer really unexpected things. The application - or rather, the protocol - is completely open (anyone can say anything, including incomplete or inconsistent things), so I would prefer to remain on a less expressive level as much as possible, ideally doing no reasoning at all.
>>>
>>> I've seen SHACL work quite nicely in the browser, but I am not sure about reasoners for OWL or OWL-DL. Because of the reservations stated earlier, I must admit, I haven't checked. Doing the blending in the browser may not be a necessity, but I would not want to close that door just yet.
>>>
>>> Finally, I have to admit, for the DL-based approach, I'm not really sure I would know even where to start.
>>> I'll be thankful for any help here and I'll be happy to cooperate if someone else wants to work on it!
>>>
>>> I am also thinking about the implications of the blending approach vs. the DL-based approach... The goal of blending was to connect two graphs that each describe one half of the solution, like puzzle pieces. The example I had in mind was: A says, "A is the passenger of the taxi ride." B says, "B is the driver of the taxi ride." Join the two structures by blending the "taxi ride" node and you get "A is the passenger of the taxi ride that B is the driver of". It seems to me that one can arrive at this result without special reasoning rules that were designed for exactly this use case. A simple RDFS vocabulary, some SHACL constraints, and the variable/constant definition would be enough. All of these could be different in every interaction, allowing for evolution of 'reasoning' as well as evolution of the use of vocabularies. The DL approach may be very powerful for premeditated scenarios, but I am not sure how well it would work if agents want to interact in ways that were not foreseen by the ontology design.
>>>
>>> The approach we have outlined so far is more of a linked data flavored approach than a semantic web one. For many simple cases, it may be sufficient and work just fine. It seems more robust to me (less prone to fail with messy data) than the DL-based approach, and it might be a good thing to have as a baseline. However, I can imagine that both approaches will be supported at some point and they can live next to each other: some agents may prefer to express their goals in line with RDF blending, some with DL class intersections, some may use both. In all cases, the net result should be an RDF graph that the partners can agree to.
>>>
>>> Best
>>> Florian
>>>
>>> On 2017-11-21 03:54, Paul Tyson wrote:
>>>
>>> Hi Florian, very interesting problem.
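The puzzle-piece example ("A is the passenger" / "B is the driver") reduces to renaming one node in each graph to a shared name and taking the union. A minimal sketch, with triple-set graphs and all names being illustrative assumptions:

```python
def rename(graph, old, new):
    """Replace every occurrence of node `old` with `new`."""
    swap = lambda n: new if n == old else n
    return {(swap(s), p, swap(o)) for (s, p, o) in graph}

def blend(g1, r1, g2, r2, joint="_:blended"):
    """Blend r1 (in g1) with r2 (in g2) under a common name, then merge."""
    return rename(g1, r1, joint) | rename(g2, r2, joint)

# "A is the passenger of the taxi ride." / "B is the driver of the taxi ride."
g_a = {("a:ride", "taxi:hasClient", "a:A")}
g_b = {("b:ride", "taxi:hasDriver", "b:B")}

result = blend(g_a, "a:ride", g_b, "b:ride")
print(result)
# Both statements now share one ride node: A is the passenger of the
# taxi ride that B is the driver of.
```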
>>> See my comments below.
>>>
>>> On Mon, 2017-11-20 at 19:09 +0100, Florian Kleedorfer wrote:
>>>
>>> (Note: I changed the subject of this thread following Pat Hayes' suggestion [1])
>>>
>>> After some on-list and off-list discussions, the problem is becoming clearer. I'll try to state it again, along with some ideas for a solution. Any comments, criticism, and suggestions to collaborate are highly welcome ;-)
>>>
>>> # Blending
>>> ## Definition
>>> Given two RDF graphs g1 and g2, blending refers to the pairwise renaming of resources (blank nodes or URI nodes) r1 in g1 and r2 in g2 such that one resource name (which may be r1, r2, or any other URI) is used instead of r1 and r2. The nodes r1 and r2 are said to be blended (ideas for a better term, anyone?). The resulting graphs are merged (avoiding accidental blank node identification, see RDF 1.1 Semantics [2]). The blending function b maps the input graphs to the set of all possible output graphs.
>>>
>>> I wonder why you do not frame the problem more in terms of description logics. "Blending" as you have described it seems to be a much looser (more flexible?) way of finding the intersection of 2 classes. If the provider has defined the classes of goods/services they can provide, and the consumer defines the class(es) he wishes to buy, then before they can do business they must find an instance of a good or service that falls in the intersection of those classes.
>>>
>>> Suppose the provider defines class P, the consumer class C. Then any transaction between them would be in the class defined by intersectionOf(P C). If this class is unsatisfiable (cannot have any members, for instance if P has :price = 20 and C has :price < 15), you want an algorithm to define and rank ways in which the provider's and consumer's class definitions could be modified to make the intersection satisfiable. These "negotiating rules" (or "distance measurements") are the interesting part of the problem.
>>>
>>> I do not know if there are implementable solutions to these problems in the description logics literature, but it might be worth a little searching before building the graph-manipulation approach you have outlined here.
>>>
>>> Regards,
>>> --Paul
>>>
>>> ## Discussion
>>> There are many ways to blend two graphs, and none of these ways is deemed correct or incorrect. We require this operation to combine two data structures held by mutually independent agents that have certain expectations about the blending solution. These expectations divide the set of possible outcomes at least into acceptable and unacceptable ones, possibly even into a preference ranking. I'll try to formulate that as well in the following.
>>>
>>> # Node blendability
>>> ## Rationale
>>> One way of constraining the result is to restrict which nodes can be blended. So far, we have identified three types: variable nodes, constant nodes, and unblendable nodes. Each node in an RDF graph has exactly one of the three types. A variable node may be blended and its name may be changed. A constant node may also be blended, but its name is not allowed to change. An unblendable node must not be blended.
>>>
>>> One may note that blending g1 and g2 is equivalent to merging them if no variable nodes are present in either graph.
>>>
>>> It is currently unclear how the constant/variable/unblendable property should be expressed. My intuition is that in most practical use cases, most nodes need to be unblendable (ontology/TBox resources, properties in general), therefore it would be more economical to assume nodes to be unblendable by default. Then, variable/constant nodes have to be annotated as such explicitly.
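The unblendable-by-default policy described above can be sketched as follows. The annotation map and all node names are illustrative assumptions, not part of the proposal:

```python
# Per-node blendability annotations; anything not listed is unblendable
# by default, matching the rationale above. Names are hypothetical.
BLENDABILITY = {
    "ex1:ride1": "variable",  # may be blended, name may change
    "ex1:p1": "constant",     # may be blended, name must be kept
    # "taxi:hasDriver" is not annotated -> unblendable by default
}

def node_kind(node, annotations=BLENDABILITY):
    return annotations.get(node) or "unblendable"

def may_blend(n1, n2):
    """Two nodes may be blended unless either is unblendable. Two
    constants can only be blended if they already share a name, since
    neither name may change."""
    k1, k2 = node_kind(n1), node_kind(n2)
    if "unblendable" in (k1, k2):
        return False
    if k1 == k2 == "constant":
        return n1 == n2
    return True

print(may_blend("ex1:ride1", "ex1:p1"))          # variable + constant
print(may_blend("ex1:ride1", "taxi:hasDriver"))  # unannotated property
```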
>>> Maybe it would be an option to make definitions by URI prefix, so we can define a complete vocabulary as unblendable with one statement. (Opinions on that?)
>>>
>>> ## Definition
>>> Given two graphs g1 and g2, a blendability-aware blending function is a blending function for g1 and g2 that only replaces resources that have not been declared as constant with respect to blending. The latter is done by specifying a triple "<graph> bl:containsConstant <resource> ." or "<graph> bl:containsVariable <resource> ." in an additional graph passed to the blendability-aware blending function. Such a graph is called a 'blendability graph'.
>>>
>>> # Constraints on the blending solution
>>> ## Rationale
>>> As explained earlier, there may be multiple parties that have expectations for the blending solution. So far, we have talked about the originators of the data. There may also be one or more intended consumers of the data. Nothing prevents the expectations from being mutually incompatible (meaning that there exists no graph that satisfies them all) or even unsatisfiable by themselves. It would of course be beneficial if such cases could be detected automatically, but for now, let's just leave it at noting the possibility. Now, if the information the agents provided, when considered in total, has spurious or conflicting information, or if information is missing (taking their respective expectations as a definition of what is acceptable), the blending solution cannot meet the expectations. Nevertheless, it has to be computed, in order to be able to take the next step, "repair" (better word, please) one or both graphs, and try again.
>>>
>>> It would be nice to use constraints for the following goals during the calculation of the blending solution:
>>> 1. pruning wholly unacceptable solutions
>>> 2. ranking unacceptable solutions by badness
>>> 3. ranking acceptable-looking but incomplete solutions ("so far so good")
>>>
>>> Constraints should allow us to identify an acceptable result, or to select the best result out of a number of unacceptable ones, and report them to users so that they can repair them.
>>>
>>> However, if the expectations define an acceptable result, anything else, however close, is unacceptable, and in the absence of acceptable results we are left with unacceptable ones, some of which may be completely bogus while others are almost acceptable. So, the three categories above cannot be separated, and the best we can hope for is a reasonable ranking of unacceptable solutions.
>>>
>>> For the conceptualization, we go with a simple SHACL-based approach. Other systems (SPARQL, ShEx) are definitely possible as well.
>>>
>>> ## Definition
>>> The constraints-aware blending function is a blending function for two RDF graphs g1 and g2 that takes optional graph names s1,...,sN denoting graphs which contain SHACL shapes constraining the result. Each blending solution is validated using s1,...,sN; the result of the function is a mapping of the blending solutions to their respective SHACL validation results, indexed by shape graph name (s1,...,sN). A blending solution is called accepted with respect to a constraints graph s if it produces no SHACL validation results when validated against s. A blending solution is called accepted if it is accepted with respect to all specified shapes graphs.
>>>
>>> ## Discussion
>>> The main issue in this step seems to be the pruning and ordering of unacceptable results. Blending functions may choose to drop blending solutions if they are deemed too bad - but where should they draw the line and still report one unacceptable blending solution while dropping another?
>>> With respect to one shapes graph, we could define that validation result graphs are ranked by maximum severity (1 sh:Violation is worse than 1000 sh:Warnings), and within those classes by the number of validation results (2 Violations and 1 Warning is worse than 1 Violation and 1000 Warnings).
>>>
>>> I suspect that the validation results should allow us to prune at least some bad blending solutions if their validation results are supersets of other solutions' validation results: if solution A has validation results (R1) and solution B has validation results (R1, R2), we can say that solution B is a deterioration of solution A and should not be reported.
>>>
>>> In addition to constraint-based pruning and ranking, it is of course possible to perform reasoning-based pruning and validate the blending solution according to appropriate reasoning logics (RDFS, OWL, ...). However, I am not sure about the usefulness of this approach if no solution is found that is consistent according to the logic.
>>>
>>> # Graph blending algorithms
>>> Let's assume for this section that we are blending two graphs g1 and g2, and we know we have variable nodes var(g1) and constant nodes const(g1) in g1, and var(g2) and const(g2) in g2.
>>>
>>> ## Complete
>>> The complete algorithm consists of enumerating all possible solutions. For each n in var(g1), there are the following solutions:
>>> * 1 solution in which n is not blended at all
>>> * 1 solution for each m in var(g2), in which n is blended with m
>>> * 1 solution for each m in const(g2), in which n is blended with m
>>> Symmetrically, the same is true for each n in var(g2). Removal of duplicate solutions yields the set of solutions.
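The enumeration step of the complete algorithm can be sketched like this; solutions are represented as sets of (g1-node, g2-node) blend pairs, and all node names in the example are illustrative assumptions:

```python
def complete_solutions(var1, const1, var2, const2):
    """Enumerate, per the 'Complete' algorithm above, the options for
    each variable node: leave it unblended, or blend it with a variable
    or constant node of the other graph. Pairs are normalized to
    (g1-node, g2-node) order so duplicates collapse in the set."""
    sols = set()
    for n in var1:
        sols.add(frozenset())              # n not blended at all
        for m in var2 | const2:
            sols.add(frozenset({(n, m)}))  # n blended with m
    for n in var2:
        sols.add(frozenset())
        for m in var1 | const1:
            sols.add(frozenset({(m, n)}))  # normalized order
    return sols

# Example: one variable and one constant node on each side.
sols = complete_solutions({"a:ride"}, {"a:A"}, {"b:ride"}, {"b:B"})
print(len(sols))  # 4: empty, ride-ride, ride-constant in each direction
```

Even in this tiny example the duplicate removal matters: blending a:ride with b:ride is generated from both directions but counted once.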
>>> ## Heuristic
>>> We may interpret blending as a heuristic search and apply an A*-like algorithm to find good results, assessing the distance to the goal by evaluating the constraints, and the distance from the start by the number of node blendings.
>>>
>>> ## Knowledge-based
>>> Exploiting knowledge about the actual structures in g1 and g2, the heuristic approach could be enhanced to apply more appropriate distance estimates and select more promising paths first.
>>>
>>> Links:
>>> 1. https://lists.w3.org/Archives/Public/semantic-web/2017Nov/0067.html
>>> 2. https://www.w3.org/TR/rdf11-mt/
>>>
>>> On 2017-11-16 19:37, Florian Kleedorfer wrote:
>>>
>>> Hi,
>>>
>>> As you may recall, I've asked this list about messaging/negotiating using linked data [1], and it led me and my colleagues to a concept of RDF-based agreements [2]. (Thanks again for your input!)
>>>
>>> Now it's time for us to define how the negotiating parties come up with the data (set of RDF triples) that they agree on.
>>>
>>> One idea we have is that both participants (P1, P2) specify 'goals', each consisting of a SHACL shapes graph and a data graph. Two goals can be tested for compatibility by merging the two data graphs and then testing if the resulting graph is valid given the shapes defined by both goals. The problem we face with this approach is merging the two data graphs. Consider the example in this gist:
>>>
>>> https://gist.github.com/fkleedorfer/81b32e3235105a4a4743e9fa76ba9332
>>>
>>> In that example, we would want to merge these two graphs:
>>>
>>> ex1:p1g-data {
>>>   ex1:ride1 a taxi:Ride .
>>>   ex1:ride1 taxi:hasDriver ex1:p1 .
>>>   ex1:ride1 taxi:hasPickupLocation ex1:pickup1 .
>>> }
>>>
>>> and
>>>
>>> ex2:p2g-data {
>>>   ex2:myRide a taxi:Ride .
>>>   ex2:myRide taxi:hasClient ex2:p2 .
>>>   ex2:myRide taxi:hasPickupLocation ex2:myPickupLocation .
>>>   ex2:myPickupLocation a s:GeoCoordinates ;
>>>     s:latitude "48.213814" ;
>>>     s:longitude "16.340870" .
>>> }
>>>
>>> We want that merge to happen in such a way that the shapes in ex1:p1g-shapes and ex2:p2g-shapes can be evaluated, which will require merging some of the nodes. (In the example, we'll find out that a taxi:hasPickupTime property is missing. One of the participants then adds that property, and the whole structure becomes valid according to both goals' shapes.)
>>>
>>> The question is how exactly to achieve that merge. Intuitively, the result in this case should be
>>>
>>> :mergedNode1 a taxi:Ride .
>>> :mergedNode1 taxi:hasDriver ex1:p1 . # note: p1 links her own identifier to the structure
>>> :mergedNode1 taxi:hasPickupLocation :mergedNode2 .
>>> :mergedNode1 a taxi:Ride .
>>> :mergedNode1 taxi:hasClient ex2:p2 .
>>> :mergedNode1 taxi:hasPickupLocation :mergedNode2 .
>>> :mergedNode2 a s:GeoCoordinates ;
>>>   s:latitude "48.213814" ;
>>>   s:longitude "16.340870" .
>>>
>>> and after removal of duplicate triples
>>>
>>> :mergedNode1 a taxi:Ride .
>>> :mergedNode1 taxi:hasDriver ex1:p1 .
>>> :mergedNode1 taxi:hasClient ex2:p2 .
>>> :mergedNode1 taxi:hasPickupLocation :mergedNode2 .
>>> :mergedNode2 a s:GeoCoordinates ;
>>>   s:latitude "48.213814" ;
>>>   s:longitude "16.340870" .
>>>
>>> (I currently don't care about the URIs of the mergedNodes - just something one can mint when doing the merge.)
>>>
>>> Do you have any hints on how to perform such a merge reliably, also with more complex structures?
>>>
>>> Thank you very much in advance!
>>>
>>> Best regards,
>>> Florian
>>>
>>> Links:
>>> 1. Negotiations discussion thread: https://lists.w3.org/Archives/Public/semantic-web/2017Jul/0004.html
>>> 2. Agreements paper: http://ceur-ws.org/Vol-1934/contribution-07.pdf
>>
>> Links:
>> ------
>> [1] https://en.wikipedia.org/wiki/Geohash
>> [2] https://lists.w3.org/Archives/Public/semantic-web/2017Nov/0067.html
>> [3] https://www.w3.org/TR/rdf11-mt/
>> [4] https://gist.github.com/fkleedorfer/81b32e3235105a4a4743e9fa76ba9332
>> [5] https://lists.w3.org/Archives/Public/semantic-web/2017Jul/0004.html
>> [6] http://ceur-ws.org/Vol-1934/contribution-07.pdf
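A toy version of the goal-compatibility test from the gist example can be sketched as follows: merge the two data graphs with the ride nodes (and, for simplicity, the pickup-location nodes) blended, then check the merged ride for required properties. The required-property set is a crude stand-in for the SHACL shapes, and the triple-set encoding is an illustrative assumption; the property names follow the taxi example above.

```python
def rename(graph, old, new):
    """Replace every occurrence of node `old` with `new`."""
    swap = lambda n: new if n == old else n
    return {(swap(s), p, swap(o)) for (s, p, o) in graph}

p1_data = {("ex1:ride1", "rdf:type", "taxi:Ride"),
           ("ex1:ride1", "taxi:hasDriver", "ex1:p1"),
           ("ex1:ride1", "taxi:hasPickupLocation", "ex1:pickup1")}
p2_data = {("ex2:myRide", "rdf:type", "taxi:Ride"),
           ("ex2:myRide", "taxi:hasClient", "ex2:p2"),
           ("ex2:myRide", "taxi:hasPickupLocation", "ex2:myPickupLocation")}

# Blend the ride nodes and the pickup-location nodes, then take the
# set union; duplicate triples disappear automatically.
merged = rename(rename(p1_data, "ex1:ride1", ":mergedNode1"),
                "ex1:pickup1", ":mergedNode2") \
       | rename(rename(p2_data, "ex2:myRide", ":mergedNode1"),
                "ex2:myPickupLocation", ":mergedNode2")

# Stand-in for validating against both goals' SHACL shapes.
REQUIRED = {"taxi:hasDriver", "taxi:hasClient",
            "taxi:hasPickupLocation", "taxi:hasPickupTime"}
present = {p for (s, p, _) in merged if s == ":mergedNode1"}
missing = REQUIRED - present
print(sorted(missing))  # ['taxi:hasPickupTime'] -- as in the gist example
```

This mirrors the intended outcome described in the thread: the merged graph is almost acceptable, and the single missing property tells the participants exactly what to repair.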
Received on Tuesday, 21 November 2017 16:23:17 UTC