Re: SPARQL subset as a PATCH format for LDP

On Sat, Jul 26, 2014 at 11:35 PM, Sandro Hawke <sandro@w3.org> wrote:

> On 07/26/2014 10:20 PM, Alexandre Bertails wrote:
>
>> On Sat, Jul 26, 2014 at 5:59 PM, Sandro Hawke <sandro@w3.org> wrote:
>>
>>> On 07/26/2014 02:55 PM, Alexandre Bertails wrote:
>>>
>>>> On Sat, Jul 26, 2014 at 1:52 PM, Sandro Hawke <sandro@w3.org> wrote:
>>>>
>>>>> On 07/26/2014 01:44 PM, Ashok Malhotra wrote:
>>>>>
>>>>>> Hi Sandro:
>>>>>> Thanks for the pointers.  I read some of the mail and the conclusion
>>>>>> I came to seems a bit different from what you concluded.  I did not
>>>>>> see a big push for SPARQL.  Instead I found from
>>>>>> http://lists.w3.org/Archives/Public/public-rdf-shapes/2014Jul/0206.html:
>>>>>>
>>>>>> "The other possibilities, no matter what the outcome of the workshop,
>>>>>> *are* ready to be standardized and I rather suspect some work on
>>>>>> combining the best elements of each will get us much further, much
>>>>>> faster than trying to mature ShEx."
>>>>>>
>>>>>> So, this argues for leading with existing solutions, ICV and SPIN,
>>>>>> rather than with ShEx, because the other solutions have some
>>>>>> implementation and experience behind them.  Makes perfect sense.
>>>>>>
>>>>>> But the PATCH case seems to be different as AFAIK there are no other
>>>>>> existing solutions.
>>>>>>
>>>> We can always argue if they are suitable for the problem, but other
>>>> existing/potential solutions include: SPARQL Update in full, 2 subsets
>>>> of SPARQL Update, and RDF Patch + skolemization.
>>>>
>>>>> Isn't SPARQL Update an existing solution for PATCH?
>>>>>
>>>>> It serves the basic purpose, although it has some drawbacks, like bad
>>>>> worst-case performance and being fairly hard to implement.
>>>>>
>>>>> Those same things, however, could quite reasonably be said about ICV
>>>>> and SPIN.
>>>>>
>>>> I don't know about ICV, SPIN or ShEx (ok, just a little bit, maybe).
>>>>
>>>
>>> To be clear, they are only relevant as another example of how inventing
>>> something which could be done by SPARQL (even if painfully) gets a lot of
>>> pushback.
>>>
>> Have you considered that the pushback _could_ be justified?
>>
>> For example, I really like SPARQL, for several reasons, but as I have
>> explained, I really think it is not appropriate as a PATCH format for
>> LDP.
>>
>>
>>>> I just have two remarks:
>>>>
>>>> * SPARQL Update as a whole was developed for RDF databases, namely
>>>> quad stores, with expressive power from the rest of SPARQL. I don't
>>>> know if it was designed with use-cases as in RDF Validation, but I do
>>>> know it was not designed for the use-case of updating LDP-RS on the
>>>> LDP platform.
>>>> * building a technology on top of an existing one is something I tend
>>>> to prefer whenever it makes sense. But in our case, we are talking
>>>> about taking the subset of an existing language, while remaining
>>>> compatible with it. This is *not* as easy as it seems at first.
>>>>
>>>> I would prefer to hear about concrete proposals on how to do that. As
>>>> somebody who _cannot_ rely on an existing SPARQL implementation, and
>>>> who is not planning to implement one in full for that use-case, I
>>>> would like to see a concrete syntax written down, with a formal
>>>> semantics for it.
>>>>
>>>
>>> Okay, I'm going to make two concrete proposals.
>>>
>>> 1.  Just use SPARQL 1.1 Update.   The whole thing.   I know it doesn't
>>> handle lists well.  What else is wrong with it?  Why can you not use it?
>>>
>> I became interested in LDP because it was the first time RDF was
>> becoming a first-class citizen of the Web, by that I mean applications
>> can interact (read/write) *directly* with RDF resources using HTTP,
>> without being behind an endpoint. That's what we meant by LDP being
>> the intersection of RDF and REST.
>>
>> A few years ago, the W3C finally recognized that native RDF storage
>> was not the only use-case for RDF applications. You can now have a
>> relational database (RDB2RDF), CSV files (RDF for Tabular Data), XML
>> (GRDDL, XSLT), etc. But not necessarily a triple/quad store. For
>> example, at
>> the company I work for, we have several (ie. physically disconnected)
>> Datomic and Cassandra servers, and we are now exposing some of the
>> data behind LDP, with the objective of doing so for all of our data. In
>> all those cases, we want to expose and link our data on the Web, like
>> all those so-called RESTful APIs, but in a more consistent way, and
>> using RDF as the model and the exchange data format. Hence LDP, and
>> not yet-another-web-api.
>>
>> The reason I am telling you all that is that supporting SPARQL for
>> those virtual RDF datasets is not that easy (when possible) when you
>> don't have a quadstore as your backend. Reverse mapping for simple
>> SPARQL queries is hard. And SPARQL Update is even worse to support.
>> Basically, forcing SPARQL Update onto LDP facing applications for
>> simple resource updates on single LDP-RS (ie. PATCH) is like using a
>> hammer to kill a fly.
>>
>> So full SPARQL Update is simply a no-go for me. I just cannot support
>> it completely, as some features cannot correctly be mapped to Datomic
>> and Cassandra.
>>
>
> So this is the key.   You want to be able to support PATCH on databases
> that are not materialized as either triples OR as SQL.
>
> If the database was SQL, then (as I understand it), SPARQL Update would be
> okay, because it can be mapped to SQL.
>
> But you don't know how to map SPARQL Update to NoSQL databases, or it's
> just too much work.
>
> I take it you do know how to map LD-Patch to Cassandra and Datomic?
>
> [ BTW, Datomic sounds awesome.  Is it as fun to use as I'd imagine? ]
>
>
>
>
>> Also, if I was in a case where SPARQL Update was ok for me to use
>> (it's not), then I suspect that I wouldn't need LDP at all, and SPARQL
>> + Update + Graph Store protocol would just be enough. And there is
>> nothing preventing one from using SPARQL Update right now. Just don't
>> call it LD Patch.
>>
>
> It's not about what's called what, it's about what we promote as the
> PATCH format.   If we had a simple enough PATCH format, then we could
> possibly make it a MUST to implement in the next version of LDP.
>

I think Alexandre makes a valid point. For a spec (LDP) that explicitly
tried to avoid SPARQL, using a SPARQL-based format for PATCH makes
absolutely no sense to me.


>
> I don't think SPARQL Update is simple enough for that, but my prediction
> is that LD-Patch will, sadly, turn out not to be either.
>
>
>
>>> 2.  Use SPARQL 1.1 Update with an extension to handle lists well.
>>> Specifically, it would be a slice function, usable in FILTER and
>>> especially in BIND.   This seems like a no-brainer to include in
>>> SPARQL 1.2.  I'd want to talk to a few of the SPARQL implementers and
>>> see if they're up for adding it.    Maybe a full set of list functions
>>> like [1].
>>>
>> Sorry but I don't know RIF and your idea is still very vague for me. I
>> understand how you can provide new functions for matching nodes in an
>> rdf:list but I fail to see how this plays in a SPARQL Update query.
>>
>> Can you just provide some examples where you are doing the equivalent
>> of that Python code (I know you read Python):
>>
>
> Probably not worthwhile to go into this now, given your veto on SPARQL.
>
>
>
>> [[
>> l = [1,2,3,4,5,6,7,8,9,10]
>> l[2:2] = [11,12]
>> l[2:7] = [13,14]
>> l[2:] = [15,16]
>> l.append(17)
>> ]]
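Spelled out, the successive states those slice operations produce are easy to check. This is plain Python slice semantics, nothing LD-Patch-specific:

```python
# Quick check of the slice semantics in the quoted example.
l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

l[2:2] = [11, 12]   # pure insertion before index 2
assert l == [1, 2, 11, 12, 3, 4, 5, 6, 7, 8, 9, 10]

l[2:7] = [13, 14]   # replace a five-element slice with two elements
assert l == [1, 2, 13, 14, 6, 7, 8, 9, 10]

l[2:] = [15, 16]    # replace everything from index 2 to the end
assert l == [1, 2, 15, 16]

l.append(17)        # append a single element
assert l == [1, 2, 15, 16, 17]
```

Any list extension under discussion (a slice function in SPARQL, or LD Patch's UpdateList) would need to cover all four of these cases, including the pure insertion and the open-ended replacement.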
>>
>>> If we want a subset, we could define it purely by restricting the
>>> grammar -- eg. leaving out the stuff that does federation, negation,
>>> aggregation -- with no need to say anything about the semantics except
>>> they are the same as SPARQL.   Until I hear what the problem is with
>>> SPARQL, though, I don't want to start excluding stuff.
>>>
>> Am I the only one thinking that "no need to say anything about the
>> semantics except they are the same as SPARQL" is just plain wrong?
>>
>> I mean, would we really tell implementers and users of the technology
>> that they have to go learn SPARQL before they can start understanding
>> which subset correctly applies to LD Patch? And how? And would they still
>> need to carry this ResultSet semantics over while a lot of us would
>> explicitly prefer avoiding it?
>>
>
> I think the users who are writing PATCHes by hand will be familiar with
> SPARQL.  And if they are not, there are lots of other reasons to learn it.
>

Except that LDP explicitly made a point of avoiding SPARQL. Since the LDP
model is all about interacting with resources by using their individual
URIs, PATCH-ing resources through a SPARQL endpoint goes against the core
LDP beliefs.

-- Andrei


>
> Contrast that with LD-Patch, for which there is no other reason to
> learn it.
>
> You seem to think LD-Patch's syntax and semantics are easy.   I don't
> think they are.   Maybe if you expanded the path syntax onto many rows
> it would be clearer what it means.
>
> I can't help but regret again that we didn't choose to use TurtlePatch
> (which I first wrote on your wall, the week after the workshop - even if I
> didn't figure out how to handle bnodes until this year).
> https://www.w3.org/2001/sw/wiki/TurtlePatch
>
>        -- Sandro
>
>
>
>> Alexandre
>>
>>         -- Sandro
>>>
>>>
>>> [1] http://www.w3.org/TR/rif-dtb/#Functions_and_Predicates_on_RIF_Lists
>>>
>>>
>>>
>>>  Alexandre
>>>>
>>>>          -- Sandro
>>>>>
>>>>>
>>>>>  All the best, Ashok
>>>>>> On 7/26/2014 6:10 AM, Sandro Hawke wrote:
>>>>>>
>>>>>>> On July 25, 2014 2:48:28 PM EDT, Alexandre Bertails
>>>>>>> <alexandre@bertails.org> wrote:
>>>>>>>
>>>>>>>> On Fri, Jul 25, 2014 at 11:51 AM, Ashok Malhotra
>>>>>>>> <ashok.malhotra@oracle.com> wrote:
>>>>>>>>
>>>>>>>>> Alexandre:
>>>>>>>>> The W3C held a RDF Validation Workshop last year.
>>>>>>>>> One of the questions that immediately came up was
>>>>>>>>> "We can use SPARQL to validate RDF".  The answer was
>>>>>>>>> that SPARQL was too complex and too hard to learn.
>>>>>>>>> So, we compromised and the workshop recommended
>>>>>>>>> that a new RDF validation language should be developed
>>>>>>>>> to cover the simple cases and SPARQL could be used when
>>>>>>>>> things got complex.
>>>>>>>>>
>>>>>>>>> It seems to me that you can make a similar argument
>>>>>>>>> for RDF Patch.
>>>>>>>>>
>>>>>>>> I totally agree with that.
>>>>>>>>
>>>>>>> Thanks for bringing this up, Ashok.    I'm going to use the same
>>>>>>> situation to argue the opposite.
>>>>>>>
>>>>>>> It's relatively easy for a group of people, especially at a
>>>>>>> face-to-face meeting, to come to the conclusion that SPARQL is too
>>>>>>> hard to learn and we should invent something else.    But when we
>>>>>>> took it to the wider world, we got a reaction so strong it's hard
>>>>>>> not to characterize it as violent.
>>>>>>>
>>>>>>> You might want to read:
>>>>>>>
>>>>>>>
>>>>>>> http://lists.w3.org/Archives/Public/public-rdf-shapes/2014Jul/thread.html
>>>>>>>
>>>>>>> Probably the most recent ones right now give a decent summary and you
>>>>>>> don't have to read them all.
>>>>>>>
>>>>>>> I have lots of theories to explain the disparity.   Like: people who
>>>>>>> have
>>>>>>> freely chosen to join an expedition are naturally more inclined to go
>>>>>>> somewhere interesting.
>>>>>>>
>>>>>>> I'm not saying we can't invent something new, but be sure to
>>>>>>> understand
>>>>>>> the battle to get it standardized may be harder than just
>>>>>>> implementing
>>>>>>> SPARQL everywhere.
>>>>>>>
>>>>>>>         - Sandro
>>>>>>>
>>>>>>>  Alexandre
>>>>>>>>
>>>>>>>>  All the best, Ashok
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 7/25/2014 9:34 AM, Alexandre Bertails wrote:
>>>>>>>>>
>>>>>>>>>> On Fri, Jul 25, 2014 at 8:04 AM, John Arwe <johnarwe@us.ibm.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>>> Another problem is the support for rdf:list. I have just
>>>>>>>>>>>> finished writing down the semantics for UpdateList and based on
>>>>>>>>>>>> that experience, I know this is something I want to rely on as a
>>>>>>>>>>>> user, because it is so easy to get it wrong, so I want native
>>>>>>>>>>>> support for it. And I don't think it is possible to do something
>>>>>>>>>>>> equivalent in SPARQL Update. That is a huge drawback as list
>>>>>>>>>>>> manipulation (eg. in JSON-LD, or Turtle) is an everyday task.
>>>>>>>>>>> Is the semantics for UpdateList (that you wrote down) somewhere
>>>>>>>>>>> that WG members can look at it, and satisfy themselves that they
>>>>>>>>>>> agree with your conclusion?
>>>>>>>>>>>
>>>>>>>>>> You can find the semantics at [1]. Even if still written in Scala
>>>>>>>>>> for now, it is written in a (purely functional) style, which is
>>>>>>>>>> very close to the formalism that will be used for the operational
>>>>>>>>>> semantics in the spec. Also, note that this is the most complex
>>>>>>>>>> part of the entire semantics, all the rest being pretty simple,
>>>>>>>>>> even Paths. And I spent a lot of time finding the general solution
>>>>>>>>>> while breaking it into simpler sub-parts.
>>>>>>>>>>
>>>>>>>>>> In a nutshell, you have 3 steps: first you move to the left bound,
>>>>>>>>>> then you gather triples to delete until the right bound, and you
>>>>>>>>>> finally insert the new triples in the middle. It's really tricky
>>>>>>>>>> because 1. you want to minimize the number of operations, even if
>>>>>>>>>> this is only a spec 2. unlike usual linked lists with pointers,
>>>>>>>>>> you manipulate triples, so the pointer in question is only the
>>>>>>>>>> node in the object position in the triple, and you need to
>>>>>>>>>> remember and carry the corresponding subject-predicate
>>>>>>>>>> 3. interesting (ie. weird) things can happen at the limits of the
>>>>>>>>>> list if you don't pay attention.
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>> https://github.com/betehess/banana-rdf/blob/ldpatch/patch/src/main/scala/Semantics.scala#L62
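For readers who do not want to work through the Scala, the three steps can be sketched in a few lines of Python. This is only an illustration of the algorithm described above, not the spec's formal semantics; the helper names (`make_list`, `update_list`) and blank-node labels are invented here:

```python
# Sketch of the three-step UpdateList algorithm over an rdf:first/rdf:rest
# triple encoding. Illustrative only; names are not from the LD Patch spec.

RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

def make_list(items, prefix="_:b"):
    """Build (triples, head) encoding a Python list with fresh blank nodes."""
    triples, head = set(), RDF_NIL
    for k in range(len(items) - 1, -1, -1):
        node = "%s%d" % (prefix, k)
        triples.add((node, RDF_FIRST, items[k]))
        triples.add((node, RDF_REST, head))
        head = node
    return triples, head

def obj(triples, s, p):
    """The unique object o such that (s, p, o) is in the graph."""
    return next(o for (s2, p2, o) in triples if s2 == s and p2 == p)

def update_list(triples, head, i, j, new_items, tag="n"):
    """Replace the slice [i:j] of the list at `head`; return the new head.

    Step 1: move to the left bound, carrying the (subject, predicate)
    "pointer" that references the first cell of the slice -- the part with
    no analogue in ordinary pointer-based linked lists.
    Step 2: gather and delete the triples of the cells from i up to j.
    Step 3: insert freshly built cells for new_items into the gap.
    """
    # Step 1: walk to the left bound.
    subj, pred, node = None, None, head
    for _ in range(i):
        subj, pred = node, RDF_REST
        node = obj(triples, node, RDF_REST)
    slice_start = node
    # Step 2: delete the cells of the slice, remembering what follows it.
    for _ in range(j - i):
        nxt = obj(triples, node, RDF_REST)
        triples.discard((node, RDF_FIRST, obj(triples, node, RDF_FIRST)))
        triples.discard((node, RDF_REST, nxt))
        node = nxt
    after = node
    # Step 3: build the replacement cells, chained onto `after`.
    insert_head = after
    for k in range(len(new_items) - 1, -1, -1):
        cell = "_:%s%d" % (tag, k)
        triples.add((cell, RDF_FIRST, new_items[k]))
        triples.add((cell, RDF_REST, insert_head))
        insert_head = cell
    # Re-point the carried pointer (or the head) at the spliced-in cells.
    if pred is None:
        return insert_head
    triples.discard((subj, pred, slice_start))
    triples.add((subj, pred, insert_head))
    return head

def to_python(triples, head):
    out = []
    while head != RDF_NIL:
        out.append(obj(triples, head, RDF_FIRST))
        head = obj(triples, head, RDF_REST)
    return out

triples, head = make_list([1, 2, 3, 4, 5])
head = update_list(triples, head, 1, 3, [9])        # [1, 9, 4, 5]
head = update_list(triples, head, 4, 4, [7], "m")   # append: [1, 9, 4, 5, 7]
```

The `slice_start`/`after` bookkeeping and the `pred is None` head case are exactly the "weird things at the limits" mentioned above, which is part of why specifying this once, natively, looks preferable to regenerating it as a SPARQL Update query each time.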
>>>>>>>>
>>>>>>>>>>> I'm not steeped enough in the intricacies of SPARQL Update to
>>>>>>>>>>> have a horse in this race, but if this issue is the big-animal
>>>>>>>>>>> difference then people with the necessary understanding are
>>>>>>>>>>> going to want to see the details. The IBM products I'm aware of
>>>>>>>>>>> eschew rdf:List (and blank nodes generally, to first order), so
>>>>>>>>>>> I don't know how much this one alone would sway me.
>>>>>>>>>>>
>>>>>>>>> You _could_ generate a SPARQL Update query that would do something
>>>>>>>>>> equivalent. But you'd have to match and remember the intermediate
>>>>>>>>>> nodes/triples.
>>>>>>>>>>
>>>>>>>>>> JSON-LD users manipulate lists on a day-to-day basis. Without
>>>>>>>>>> native
>>>>>>>>>> support for rdf:list in LD Patch, I would turn to JSON PATCH to
>>>>>>>>>> manipulate those lists.
>>>>>>>>>>
>>>>>>>>>>> It sounds like the other big-animal difference in your email is
>>>>>>>>>>>
>>>>>>>>>>>> we would have to refine the SPARQL semantics so that the order
>>>>>>>>>>>> of the clauses matters (ie. no need to depend on a query
>>>>>>>>>>>> optimiser). And we
>>>>>>>>>>>
>>>>>>>>>>> That sounds like a more general problem.  It might mean, in
>>>>>>>>>>> effect, that no one would be able to use existing off-the-shelf
>>>>>>>>>>> componentry (specs & code ... is that the implication, Those Who
>>>>>>>>>>> Know S-U?) and that might well be a solid answer to "why not
>>>>>>>>>>> [use S-U]?"
>>>>>>>>>>>
>>>>>>>>>> The fact that reordering the clauses doesn't change the semantics
>>>>>>>>>> is a feature of SPARQL. It means that queries can be rearranged
>>>>>>>>>> for optimisation purposes. But you never know if the execution
>>>>>>>>>> plan will be the best one, and you can end up with huge
>>>>>>>>>> intermediate result sets.
>>>>>>>>>>
>>>>>>>>>> In any case, if we ever go down the SPARQL Update way, I will ask
>>>>>>>>>> that we specify that clauses are executed in order, or something
>>>>>>>>>> like that. And I will ask for a semantics that doesn't rely on
>>>>>>>>>> result sets if possible.
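The order-dependence point can be made concrete with a toy example. The mini-operations below are invented for illustration (neither SPARQL Update nor LD Patch syntax); they show how executing clauses in order differs from evaluating everything against the initial graph, which is roughly what SPARQL Update's DELETE/INSERT does:

```python
# Invented mini-operations contrasting in-order execution with
# evaluation against the initial state of the graph.

graph0 = {("a", "p", 1)}
ops = [("insert", ("a", "p", 2)), ("delete", ("a", "p", 2))]

def apply_in_order(graph, ops):
    g = set(graph)
    for kind, t in ops:
        if kind == "insert":
            g.add(t)
        else:
            g.discard(t)   # sees the effect of every earlier operation
    return g

def apply_against_initial(graph, ops):
    # Deletions and insertions both computed against the initial graph,
    # deletions applied first (roughly SPARQL Update's DELETE/INSERT order).
    dels = {t for kind, t in ops if kind == "delete" and t in graph}
    ins = {t for kind, t in ops if kind == "insert"}
    return (graph - dels) | ins

apply_in_order(graph0, ops)         # {("a", "p", 1)}: the delete saw the insert
apply_against_initial(graph0, ops)  # keeps ("a", "p", 2): the delete did not
```

With in-order semantics a patch author can reason step by step; with initial-state semantics the result depends on how the whole update is matched at once.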
>>>>>>>>>>
>>>>>>>>>>> Were there any other big-animal issues you found, those two
>>>>>>>>>>> aside?
>>>>>>>>>>>
>>>>>>>>>> A big issue for me will be to correctly explain the subset of
>>>>>>>>>> SPARQL
>>>>>>>>>> we would be considering, and its limitations compared to its big
>>>>>>>>>> brother.
>>>>>>>>>>
>>>>>>>>>> Also, if you don't implement it from scratch and want to rely on
>>>>>>>>>> an existing implementation, you would still have to reject all
>>>>>>>>>> the correct SPARQL queries outside the subset, and that can be
>>>>>>>>>> tricky too, because you have to inspect the query after it is
>>>>>>>>>> parsed. Oh, and I will make sure there are tests rejecting such
>>>>>>>>>> queries :-)
>>>>>>>>>>
>>>>>>>>>> Alexandre
>>>>>>>>>>
>>>>>>>>>>  Best Regards, John
>>>>>>>>>>>
>>>>>>>>>>> Voice US 845-435-9470  BluePages
>>>>>>>>>>> Cloud and Smarter Infrastructure OSLC Lead
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>
>
>

Received on Sunday, 27 July 2014 13:38:21 UTC