Re: SPARQL subset as a PATCH format for LDP from Sandro Hawke on 2014-07-27 (public-ldp-wg@w3.org from July 2014)

From: Sandro Hawke <sandro@w3.org>
Date: Sat, 26 Jul 2014 23:35:47 -0400
To: Alexandre Bertails <alexandre@bertails.org>
CC: ashok.malhotra@oracle.com, "public-ldp-wg@w3.org" <public-ldp-wg@w3.org>
Message-ID: <53D47393.3010506@w3.org>
On 07/26/2014 10:20 PM, Alexandre Bertails wrote:
> On Sat, Jul 26, 2014 at 5:59 PM, Sandro Hawke <sandro@w3.org> wrote:
>> On 07/26/2014 02:55 PM, Alexandre Bertails wrote:
>>> On Sat, Jul 26, 2014 at 1:52 PM, Sandro Hawke <sandro@w3.org> wrote:
>>>> On 07/26/2014 01:44 PM, Ashok Malhotra wrote:
>>>>> Hi Sandro:
>>>>> Thanks for the pointers.  I read some of the mail and the conclusion I
>>>>> came to seems a bit different from what you concluded.  I did not see a
>>>>> big push for SPARQL.  Instead I found from
>>>>> http://lists.w3.org/Archives/Public/public-rdf-shapes/2014Jul/0206.html:
>>>>>
>>>>> "The other possibilities, no matter what the outcome of the workshop,
>>>>> *are* ready to be standardized and I rather suspect some work on
>>>>> combining the best elements of each will get us much further, much
>>>>> faster than trying to mature ShEx."
>>>>>
>>>>> So, this argues for leading with existing solutions, ICV and SPIN,
>>>>> rather than with ShEx, because the other solutions have some
>>>>> implementation and experience behind them.  Makes perfect sense.
>>>>>
>>>>> But the PATCH case seems to be different, as AFAIK there are no other
>>>>> existing solutions.
>>> We can always argue whether they are suitable for the problem, but other
>>> existing/potential solutions include: SPARQL Update in full, two subsets
>>> of SPARQL Update, and RDF Patch + skolemization.
>>>
>>>> Isn't SPARQL UPDATE an existing solution for PATCH?
>>>>
>>>> It serves the basic purpose, although it has some drawbacks, like bad
>>>> worst-case performance and being fairly hard to implement.
>>>>
>>>> Those same things, however, could quite reasonably be said about ICV and
>>>> SPIN.
>>> I don't know about ICV, SPIN or ShEx (ok, just a little bit, maybe).
>>
>> To be clear, they are only relevant as another example of how inventing
>> something which could be done by SPARQL (even if painfully) gets a lot of
>> pushback.
> Have you considered that the pushback _could_ be justified?
>
> For example, I really like SPARQL, for several reasons, but as I have
> explained, I really think it is not appropriate as a PATCH format for
> LDP.
>
>>
>>>    I just have two remarks:
>>>
>>> * SPARQL Update as a whole was developed for RDF databases, namely
>>> quad stores, with expressive power from the rest of SPARQL. I don't
>>> know if it was designed with use-cases like RDF Validation in mind, but
>>> I do know it was not designed for the use-case of updating an LDP-RS on
>>> the LDP platform.
>>> * building a technology on top of an existing one is something I tend
>>> to prefer whenever it makes sense. But in our case, we are talking
>>> about taking a subset of an existing language while remaining
>>> compatible with it. This is *not* as easy as it seems at first.
>>>
>>> I would prefer to hear concrete proposals on how to do that. As
>>> somebody who _cannot_ rely on an existing SPARQL implementation, and
>>> who is not planning to implement one in full for that use-case, I
>>> would like to see a concrete syntax written down, with a formal
>>> semantics for it.
>>
>> Okay, I'm going to make two concrete proposals.
>>
>> 1.  Just use SPARQL 1.1 Update.   The whole thing.   I know it doesn't
>> handle lists well.  What else is wrong with it?  Why can you not use it?
> I became interested in LDP because it was the first time RDF was
> becoming a first-class citizen of the Web. By that I mean that
> applications can interact (read/write) *directly* with RDF resources
> using HTTP, without being behind an endpoint. That's what we meant by
> LDP being the intersection of RDF and REST.
>
> A few years ago, the W3C finally recognized that native RDF was not
> the only use-case for RDF applications. You can now have a relational
> database (RDB2RDF), CSV files (RDF for Tabular Data), XML (GRDDL,
> XSLT), etc., but not necessarily a triple/quad store. For example, at
> the company I work for, we have several (ie. physically disconnected)
> Datomic and Cassandra servers, and we are now exposing some of the
> data behind LDP, with the objective of doing so for all of our data. In
> all those cases, we want to expose and link our data on the Web, like
> all those so-called RESTful APIs, but in a more consistent way, and
> using RDF as the model and the exchange data format. Hence LDP, and
> not yet-another-web-api.
>
> The reason I am telling you all this is that supporting SPARQL for
> those virtual RDF datasets is not that easy (when it is possible at
> all) when you don't have a quad store as your backend. Reverse mapping
> for simple SPARQL queries is already hard, and SPARQL Update is even
> harder to support. Basically, forcing SPARQL Update onto LDP-facing
> applications for simple resource updates on a single LDP-RS (ie. PATCH)
> is like using a hammer to kill a fly.
>
> So full SPARQL Update is simply a no-go for me. I just cannot support
> it completely, as some features cannot correctly be mapped to Datomic
> and Cassandra.

So this is the key.   You want to be able to support PATCH on databases 
that are not materialized as either triples OR as SQL.

If the database was SQL, then (as I understand it), SPARQL Update would 
be okay, because it can be mapped to SQL.

But you don't know how to map SPARQL Update to NoSQL databases, or it's 
just too much work.

I take it you do know how to map LD-Patch to Cassandra and Datomic?

[ BTW, Datomic sounds awesome.  Is it as fun to use as I'd imagine? ]


>
> Also, if I was in a case where SPARQL Update was ok for me to use
> (it's not), then I suspect that I wouldn't need LDP at all, and SPARQL
> + Update + Graph Store protocol would just be enough. And there is
> nothing preventing one from using SPARQL Update right now. Just don't
> call it LD Patch.

It's not about what's called what, it's about what we promote as the 
PATCH format.   If we had a simple enough PATCH format, then we could 
possibly make it a MUST to implement in the next version of LDP.

I don't think SPARQL Update is simple enough for that, but my prediction 
is that LD-Patch will turn out, sadly, not to be either.


>> 2.  Use SPARQL 1.1 Update with an extension to handle lists well.
>> Specifically, it would be a slice function, usable in FILTER and especially
>> in BIND.   This seems like a no-brainer to include in SPARQL 1.2.  I'd want
>> to talk to a few of the SPARQL implementers and see if they're up for adding
>> it.    Maybe a full set of list functions like [1].
> Sorry, but I don't know RIF and your idea is still very vague to me. I
> understand how you can provide new functions for matching nodes in an
> rdf:List, but I fail to see how this plays out in a SPARQL Update query.
>
> Can you just provide some examples where you are doing the equivalent
> of that Python code (I know you read Python):

Probably not worthwhile to go into this now, given your veto on SPARQL.


> [[
>>>> l = [1,2,3,4,5,6,7,8,9,10]
>>>> l[2:2] = [11,12]
>>>> l[2:7] = [13,14]
>>>> l[2:] = [15,16]
>>>> l.append(17)
> ]]
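Each line of the snippet above is an instance of one primitive, "replace the slice [start:end] with these values", where append is simply the slice that starts at the current length. A minimal sketch that replays the snippet through a single hypothetical helper, replace_slice (not part of LD Patch or SPARQL):

```python
def replace_slice(lst, start, end, values):
    """Return a copy of lst with lst[start:end] replaced by values.

    Hypothetical helper for illustration; end=None means "to the end".
    """
    out = list(lst)
    out[start:end] = values
    return out


l = list(range(1, 11))
l = replace_slice(l, 2, 2, [11, 12])      # insert before index 2
l = replace_slice(l, 2, 7, [13, 14])      # replace five items with two
l = replace_slice(l, 2, None, [15, 16])   # replace the whole tail
l = replace_slice(l, len(l), None, [17])  # append = slice at the end
print(l)  # [1, 2, 15, 16, 17]
```

A list-update operation that takes a start bound, an end bound and a replacement list therefore covers all five cases.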
>
>> If we want a subset, we could define it purely by restricting the grammar --
>> eg leaving out the stuff that does federation, negation, aggregation --
>> with no need to say anything about the semantics except they are the same as
>> SPARQL.   Until I hear what the problem is with SPARQL, though, I don't want
>> to start excluding stuff.
> Am I the only one thinking that "no need to say anything about the
> semantics except they are the same as SPARQL" is just plain wrong?
>
> I mean, would we really tell implementers and users of the technology
> that they have to go learn SPARQL before they can start understanding
> which subset correctly applies to LD Patch? And how? And would they still
> need to carry these ResultSet semantics over, while a lot of us would
> explicitly prefer avoiding them?

I think the users who are writing PATCHes by hand will be familiar with 
SPARQL.  And if they are not, there are lots of other reasons to learn it.

Contrast that with LD-Patch, for which there is no other reason to learn it.

You seem to think LD-Patch's syntax and semantics are easy.   I don't 
think they are.   Maybe if you expanded the path syntax onto many rows, 
it would be clearer what it means.

I can't help but regret again that we didn't choose to use TurtlePatch 
(which I first wrote on your wall, the week after the workshop - even if 
I didn't figure out how to handle bnodes until this year).   
https://www.w3.org/2001/sw/wiki/TurtlePatch

        -- Sandro

>
> Alexandre
>
>>        -- Sandro
>>
>>
>> [1] http://www.w3.org/TR/rif-dtb/#Functions_and_Predicates_on_RIF_Lists
>>
>>
>>
>>> Alexandre
>>>
>>>>         -- Sandro
>>>>
>>>>
>>>>> All the best, Ashok
>>>>> On 7/26/2014 6:10 AM, Sandro Hawke wrote:
>>>>>> On July 25, 2014 2:48:28 PM EDT, Alexandre Bertails
>>>>>> <alexandre@bertails.org> wrote:
>>>>>>> On Fri, Jul 25, 2014 at 11:51 AM, Ashok Malhotra
>>>>>>> <ashok.malhotra@oracle.com> wrote:
>>>>>>>> Alexandre:
>>>>>>>> The W3C held an RDF Validation Workshop last year.
>>>>>>>> One of the questions that immediately came up was
>>>>>>>> "We can use SPARQL to validate RDF".  The answer was
>>>>>>>> that SPARQL was too complex and too hard to learn.
>>>>>>>> So, we compromised and the workshop recommended
>>>>>>>> that a new RDF validation language should be developed
>>>>>>>> to cover the simple cases and SPARQL could be used when
>>>>>>>> things got complex.
>>>>>>>>
>>>>>>>> It seems to me that you can make a similar argument
>>>>>>>> for RDF Patch.
>>>>>>> I totally agree with that.
>>>>>>>
>>>>>> Thanks for bringing this up, Ashok.    I'm going to use the same
>>>>>> situation to argue the opposite.
>>>>>>
>>>>>> It's relatively easy for a group of people, especially at a face-to-face
>>>>>> meeting, to come to the conclusion that SPARQL is too hard to learn and
>>>>>> we should invent something else.    But when we took it to the wider
>>>>>> world, we got a reaction that's so strong it's hard not to characterize
>>>>>> as violent.
>>>>>>
>>>>>> You might want to read:
>>>>>>
>>>>>>
>>>>>> http://lists.w3.org/Archives/Public/public-rdf-shapes/2014Jul/thread.html
>>>>>>
>>>>>> Probably the most recent ones right now give a decent summary and you
>>>>>> don't have to read them all.
>>>>>>
>>>>>> I have lots of theories to explain the disparity.   Like: people who
>>>>>> have freely chosen to join an expedition are naturally more inclined
>>>>>> to go somewhere interesting.
>>>>>>
>>>>>> I'm not saying we can't invent something new, but be sure to understand
>>>>>> the battle to get it standardized may be harder than just implementing
>>>>>> SPARQL everywhere.
>>>>>>
>>>>>>         - Sandro
>>>>>>
>>>>>>> Alexandre
>>>>>>>
>>>>>>>> All the best, Ashok
>>>>>>>>
>>>>>>>>
>>>>>>>> On 7/25/2014 9:34 AM, Alexandre Bertails wrote:
>>>>>>>>> On Fri, Jul 25, 2014 at 8:04 AM, John Arwe <johnarwe@us.ibm.com> wrote:
>>>>>>>>>>> Another problem is the support for rdf:List. I have just finished
>>>>>>>>>>> writing down the semantics for UpdateList and based on that
>>>>>>>>>>> experience, I know this is something I want to rely on as a user,
>>>>>>>>>>> because it is so easy to get it wrong, so I want native support for
>>>>>>>>>>> it. And I don't think it is possible to do something equivalent in
>>>>>>>>>>> SPARQL Update. That is a huge drawback as list manipulation (eg. in
>>>>>>>>>>> JSON-LD, or Turtle) is an everyday task.
>>>>>>>>>> Are the semantics for UpdateList (that you wrote down) somewhere that
>>>>>>>>>> WG members can look at them, and satisfy themselves that they agree
>>>>>>>>>> with your conclusion?
>>>>>>>>> You can find the semantics at [1]. Although it is still written in
>>>>>>>>> Scala for now, it is written in a purely functional style, which is
>>>>>>>>> very close to the formalism that will be used for the operational
>>>>>>>>> semantics in the spec. Also, note that this is the most complex part
>>>>>>>>> of the entire semantics, all the rest being pretty simple, even
>>>>>>>>> Paths. And I spent a lot of time finding the general solution while
>>>>>>>>> breaking it into simpler sub-parts.
>>>>>>>>>
>>>>>>>>> In a nutshell, you have 3 steps: first you move to the left bound,
>>>>>>>>> then you gather triples to delete until the right bound, and you
>>>>>>>>> finally insert the new triples in the middle. It's really tricky
>>>>>>>>> because:
>>>>>>>>>
>>>>>>>>> 1. you want to minimize the number of operations, even if this is
>>>>>>>>>    only a spec;
>>>>>>>>> 2. unlike usual linked lists with pointers, you manipulate triples,
>>>>>>>>>    so the pointer in question is only the node in the object
>>>>>>>>>    position in the triple, and you need to remember and carry the
>>>>>>>>>    corresponding subject-predicate;
>>>>>>>>> 3. interesting (ie. weird) things can happen at the limits of the
>>>>>>>>>    list if you don't pay attention.
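The three steps above can be sketched in Python over a plain set of (subject, predicate, object) tuples. This is not the Scala code at [1]: the names update_list, read_list and obj, and the use of strings such as "rdf:first" and "_:b0" for terms, are illustrative assumptions, and the sketch makes no particular effort to minimize the number of triples it deletes and re-inserts. Note how the "pointer" is carried as a (subject, predicate) pair, as point 2 describes.

```python
import itertools

# Illustrative stand-ins for the RDF vocabulary terms (assumption: plain strings).
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"


def obj(graph, s, p):
    """Return the object of the (assumed unique) triple (s, p, ?o)."""
    for s2, p2, o in graph:
        if s2 == s and p2 == p:
            return o
    raise KeyError((s, p))


def update_list(graph, subject, predicate, start, end, values, fresh):
    """Replace the slice [start:end] of the rdf:List found at (subject, predicate).

    end=None means "up to rdf:nil". `graph` is a set of (s, p, o) tuples,
    modified in place; `fresh` yields new blank node labels.
    """
    # Step 1: move to the left bound. The "pointer" is not a node but the
    # (subject, predicate) pair of the triple whose object is the current cell.
    ptr_s, ptr_p = subject, predicate
    node = obj(graph, ptr_s, ptr_p)
    for _ in range(start):
        ptr_s, ptr_p = node, RDF_REST
        node = obj(graph, node, RDF_REST)

    # Step 2: gather the triples to delete until the right bound (or rdf:nil).
    to_delete = {(ptr_s, ptr_p, node)}
    cur, steps = node, 0
    while cur != RDF_NIL and (end is None or steps < end - start):
        nxt = obj(graph, cur, RDF_REST)
        to_delete |= {(cur, RDF_FIRST, obj(graph, cur, RDF_FIRST)),
                      (cur, RDF_REST, nxt)}
        cur, steps = nxt, steps + 1

    # Step 3: insert the new cells in the middle, chained from the left
    # pointer to the first cell kept after the right bound.
    to_insert, tail = set(), cur
    for value in reversed(values):
        cell = next(fresh)
        to_insert |= {(cell, RDF_FIRST, value), (cell, RDF_REST, tail)}
        tail = cell
    to_insert.add((ptr_s, ptr_p, tail))

    graph -= to_delete
    graph |= to_insert
    return graph


def read_list(graph, subject, predicate):
    """Walk the rdf:List at (subject, predicate) back into a Python list."""
    out, node = [], obj(graph, subject, predicate)
    while node != RDF_NIL:
        out.append(obj(graph, node, RDF_FIRST))
        node = obj(graph, node, RDF_REST)
    return out


# Replay the Python slice operations quoted earlier on an rdf:List.
fresh = (f"_:b{i}" for i in itertools.count())
g = {("<#s>", "<#p>", RDF_NIL)}  # empty list
update_list(g, "<#s>", "<#p>", 0, 0, list(range(1, 11)), fresh)
update_list(g, "<#s>", "<#p>", 2, 2, [11, 12], fresh)
update_list(g, "<#s>", "<#p>", 2, 7, [13, 14], fresh)
update_list(g, "<#s>", "<#p>", 2, None, [15, 16], fresh)
update_list(g, "<#s>", "<#p>", 4, 4, [17], fresh)  # append: start = length
print(read_list(g, "<#s>", "<#p>"))  # [1, 2, 15, 16, 17]
```

Appending at start = length works because step 1 stops on rdf:nil and step 3 rewires the last rdf:rest triple, which is one of the boundary cases point 3 warns about.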
>>>>>>>>>
>>>>>>>>> [1] https://github.com/betehess/banana-rdf/blob/ldpatch/patch/src/main/scala/Semantics.scala#L62
>>>>>>>>>> I'm not steeped enough in the intricacies of SPARQL Update to have a
>>>>>>>>>> horse in this race, but if this issue is the big-animal difference
>>>>>>>>>> then people with the necessary understanding are going to want to see
>>>>>>>>>> the details. The IBM products I'm aware of eschew rdf:List (and blank
>>>>>>>>>> nodes generally, to first order), so I don't know how much this one
>>>>>>>>>> alone would sway me.
>>>>>>>>> You _could_ generate a SPARQL Update query that would do something
>>>>>>>>> equivalent. But you'd have to match and remember the intermediate
>>>>>>>>> nodes/triples.
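As a rough illustration of generating such a query, the hypothetical helper below builds a SPARQL 1.1 Update that replaces a single list item: the intermediate cells are matched with a sequence property path of rdf:rest steps, and the target cell is remembered in ?cell. It is a sketch under stated assumptions (the subject and predicate are IRIs, new_value is an already-serialized SPARQL term, the list is well formed), not a proposal for LD Patch.

```python
RDF_PREFIX = "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"


def replace_item_update(subject, predicate, index, new_value):
    """Build a SPARQL 1.1 Update replacing item `index` of the rdf:List
    reached from <subject> <predicate>.

    Hypothetical helper; `new_value` must already be a serialized SPARQL
    term such as '42' or '"x"'.
    """
    # The intermediate cells are matched by a sequence path of `index`
    # rdf:rest steps; the cell we stop on is remembered in ?cell.
    path = f"<{predicate}>" + "/rdf:rest" * index
    return (
        RDF_PREFIX
        + "DELETE { ?cell rdf:first ?old }\n"
        + f"INSERT {{ ?cell rdf:first {new_value} }}\n"
        + f"WHERE {{ <{subject}> {path} ?cell . ?cell rdf:first ?old }}"
    )


print(replace_item_update("http://example.org/s", "http://example.org/p", 2, "42"))
```

This only covers replacing one item in place. Inserting or removing a range also means rewiring the rdf:rest triple of the cell before the left bound and matching every removed cell individually, which is the bookkeeping described above.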
>>>>>>>>>
>>>>>>>>> JSON-LD users manipulate lists on a day-to-day basis. Without native
>>>>>>>>> support for rdf:List in LD Patch, I would turn to JSON PATCH to
>>>>>>>>> manipulate those lists.
>>>>>>>>>
>>>>>>>>>> It sounds like the other big-animal difference in your email is
>>>>>>>>>>
>>>>>>>>>>> we would have to refine the SPARQL semantics so that the order of
>>>>>>>>>>> the clauses matters (ie. no need to depend on a query optimiser).
>>>>>>>>>>> And we
>>>>>>>>>> That sounds like a more general problem.  It might mean, in effect,
>>>>>>>>>> that no one would be able to use existing off-the-shelf componentry
>>>>>>>>>> (specs & code ... is that the implication, Those Who Know S-U?) and
>>>>>>>>>> that might well be a solid answer to "why not [use S-U]?"
>>>>>>>>> The fact that reordering the clauses doesn't change the semantics is
>>>>>>>>> a feature of SPARQL. It means that queries can be rearranged for
>>>>>>>>> optimisation purposes. But you never know if the execution plan will
>>>>>>>>> be the best one, and you can end up with huge intermediate result
>>>>>>>>> sets.
>>>>>>>>>
>>>>>>>>> In any case, if we ever go down the SPARQL Update way, I will ask
>>>>>>>>> that we specify that clauses are executed in order, or something
>>>>>>>>> like that. And I will ask for a semantics that doesn't rely on
>>>>>>>>> result sets if possible.
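The effect of clause order on intermediate results can be shown with a toy evaluator that joins triple patterns strictly in the order written, roughly what an "executed in order" rule would imply. This is a hypothetical sketch (naive nested-loop joins, plain strings for terms, `?`-prefixed variables, made-up foaf: data), not how any real SPARQL engine evaluates queries: both orders produce the same solution, but one builds a 100-row intermediate result where the other builds a single row.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; None if it cannot.

    Terms starting with '?' are variables; everything else must match exactly.
    """
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def evaluate_in_order(graph, patterns):
    """Join triple patterns strictly left to right (naive nested loops).

    Returns the final solutions and the size of each intermediate result.
    """
    bindings, sizes = [{}], []
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
        sizes.append(len(bindings))
    return bindings, sizes


# Toy data: 100 people with a name, only one of whom has a mailbox.
graph = [(f"_:p{i}", "foaf:name", f"name{i}") for i in range(100)]
graph.append(("_:p7", "foaf:mbox", "mailto:someone@example.org"))

names_first = [("?x", "foaf:name", "?n"),
               ("?x", "foaf:mbox", "mailto:someone@example.org")]
mbox_first = list(reversed(names_first))

print(evaluate_in_order(graph, names_first)[1])  # [100, 1]
print(evaluate_in_order(graph, mbox_first)[1])   # [1, 1]
```

Under SPARQL's order-independent semantics an engine is free to choose the cheaper plan itself; under an in-order rule, that choice moves to whoever writes the patch.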
>>>>>>>>>
>>>>>>>>>> Were there any other big-animal issues you found, those two aside?
>>>>>>>>> A big issue for me will be to correctly explain the subset of SPARQL
>>>>>>>>> we would be considering, and its limitations compared to its big
>>>>>>>>> brother.
>>>>>>>>>
>>>>>>>>> Also, if you don't implement it from scratch and want to rely on an
>>>>>>>>> existing implementation, you would still have to reject all the
>>>>>>>>> valid SPARQL queries that fall outside the subset, and that can be
>>>>>>>>> tricky too, because you have to inspect the query after it is
>>>>>>>>> parsed. Oh, and I will make sure there are tests rejecting such
>>>>>>>>> queries :-)
>>>>>>>>>
>>>>>>>>> Alexandre
>>>>>>>>>
>>>>>>>>>> Best Regards, John
>>>>>>>>>>
>>>>>>>>>> Voice US 845-435-9470  BluePages
>>>>>>>>>> Cloud and Smarter Infrastructure OSLC Lead
>>>>>>>>>>
>>>>>
Received on Sunday, 27 July 2014 03:35:59 UTC