Re: SPARQL subset as a PATCH format for LDP from Steve Speicher on 2014-07-27 (public-ldp-wg@w3.org from July 2014)

From: Steve Speicher <sspeiche@gmail.com>
Date: Sun, 27 Jul 2014 10:30:08 -0400
To: Andrei Sambra <andrei.sambra@gmail.com>
Cc: Sandro Hawke <sandro@w3.org>, Alexandre Bertails <alexandre@bertails.org>, ashok malhotra <ashok.malhotra@oracle.com>, "public-ldp-wg@w3.org" <public-ldp-wg@w3.org>
Message-ID: <CAOUJ7Jq5NZjFSuueR_SxkNmRtRa+2ERvftT_o7=moSxzOfmh0A@mail.gmail.com>
I believe that if what the custom syntax for LD Patch can express is *near*
what a subset of SPARQL syntax can express, then I'd support/prefer the
SPARQL subset approach.  Nothing says that you need a full SPARQL 1.1
Update endpoint to process LD Patch.  The format could be the subset plus
some extensions if needed, though IMO those should wait until SPARQL Update
adds them.  Ideally it would be strictly a subset, so that IF you did have a
SPARQL endpoint, you could simply hand the patch off to it.

A design goal of LDP was to not require a full SPARQL endpoint, while still
being implementable with one.  I think LD Patch has the same design goal.
Using a subset of the SPARQL syntax doesn't imply that you need a fully
compliant SPARQL endpoint.  It just implies that there is a parser that can
consume the syntax and a processor to perform the patch/update.  This is the
same model whether the parser handles a new syntax or a SPARQL subset.
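
As a rough illustration of that parser-plus-processor model (a minimal
sketch, not a proposal): the Python below accepts a deliberately tiny,
hypothetical subset of SPARQL Update, only INSERT DATA and DELETE DATA
blocks of IRI-only triples, and applies it to an in-memory set of triples
with no SPARQL endpoint involved.  The names parse_patch and apply_patch,
the choice of subset, and the simplified triple syntax are all assumptions
made for the example.

```python
import re

# Hypothetical subset (an assumption for this sketch): operations are
# "INSERT DATA { ... }" or "DELETE DATA { ... }", separated by ";".
# Triples are three IRIs in angle brackets followed by " ."; no prefixes,
# literals, blank nodes, variables, or WHERE clauses.
OPERATION = re.compile(r"\s*(INSERT|DELETE)\s+DATA\s*\{(.*)\}\s*", re.S)
IRI = re.compile(r"<[A-Za-z0-9:/#._~%-]+>")


def parse_patch(text):
    """Parse the subset into [(operation, triples)]; reject anything else."""
    operations = []
    for chunk in text.split(";"):
        if not chunk.strip():
            continue  # skip empty chunks, e.g. after a trailing ";"
        match = OPERATION.fullmatch(chunk)
        if match is None:
            raise ValueError("outside the supported subset: " + chunk.strip())
        triples = []
        for statement in match.group(2).split(" ."):
            terms = statement.split()
            if not terms:
                continue
            if len(terms) != 3 or not all(IRI.fullmatch(t) for t in terms):
                raise ValueError("not a triple of IRIs: " + statement.strip())
            triples.append(tuple(terms))
        operations.append((match.group(1), triples))
    return operations


def apply_patch(graph, operations):
    """Apply the operations in order to a set of triples; return a new set."""
    result = set(graph)
    for operation, triples in operations:
        if operation == "INSERT":
            result.update(triples)
        else:
            result.difference_update(triples)
    return result
```

The intent is that anything this parser accepts is also ordinary SPARQL
Update text, so a server that does have an endpoint could hand it off
unchanged; the sketch does not verify that rigorously.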

I agree with Sandro: the more competing syntaxes we have, the harder
adoption will be.  I know at the last F2F we were leaning towards a simple
syntax, but the more examples I see of it, the harder it is for me to see it
as simpler by a factor that justifies defining something new.

At the end of the day, there could be both, or many, patch document formats.
Still, I don't recommend trying to do both; it would be good to align on one
within this WG.  A good old-fashioned syntax smackdown we have.

Regards,
Steve Speicher
http://stevespeicher.me


On Sun, Jul 27, 2014 at 9:37 AM, Andrei Sambra <andrei.sambra@gmail.com>
wrote:

>
>
>
> On Sat, Jul 26, 2014 at 11:35 PM, Sandro Hawke <sandro@w3.org> wrote:
>
>> On 07/26/2014 10:20 PM, Alexandre Bertails wrote:
>>
>>> On Sat, Jul 26, 2014 at 5:59 PM, Sandro Hawke <sandro@w3.org> wrote:
>>>
>>>> On 07/26/2014 02:55 PM, Alexandre Bertails wrote:
>>>>
>>>>> On Sat, Jul 26, 2014 at 1:52 PM, Sandro Hawke <sandro@w3.org> wrote:
>>>>>
>>>>>> On 07/26/2014 01:44 PM, Ashok Malhotra wrote:
>>>>>>
>>>>>>> Hi Sandro:
>>>>>>> Thanks for the pointers.  I read some of the mail and the conclusion
>>>>>>> I came to seems a bit different from what you concluded.  I did not
>>>>>>> see a big push for SPARQL.  Instead I found from
>>>>>>> http://lists.w3.org/Archives/Public/public-rdf-shapes/2014Jul/0206.html:
>>>>>>>
>>>>>>> "The other possibilities, no matter what the outcome of the workshop,
>>>>>>> *are* ready to be standardized and I rather suspect some work on
>>>>>>> combining the best elements of each will get us much further, much
>>>>>>> faster than trying to mature ShEx."
>>>>>>>
>>>>>>> So, this argues for leading with existing solutions, ICV and SPIN,
>>>>>>> rather than with ShEx, because the other solutions have some
>>>>>>> implementation and experience behind them.  Makes perfect sense.
>>>>>>>
>>>>>>> But the PATCH case seems to be different as AFAIK there are no other
>>>>>>> existing solutions.
>>>>>>>
>>>>>> We can always argue if they are suitable for the problem, but other
>>>>> existing/potential solutions include: SPARQL Update in full, 2 subsets
>>>>> of SPARQL Update, and RDF Patch + skolemization.
>>>>>
>>>>>  Isn't SPARQL UPDATE an existing solution for PATCH?
>>>>>>
>>>>>> It serves the basic purpose, although it has some drawbacks, like bad
>>>>>> worst-case performance and being fairly hard to implement.
>>>>>>
>>>>>> Those same things, however, could quite reasonably be said about ICV
>>>>>> and
>>>>>> SPIN.
>>>>>>
>>>>> I don't know about ICV, SPIN or ShEx (ok, just a little bit, maybe).
>>>>>
>>>>
>>>> To be clear, they are only relevant as another example of how inventing
>>>> something which could be done by SPARQL (even if painfully) gets a lot
>>>> of pushback.
>>>>
>>> Have you considered that the pushback _could_ be justified?
>>>
>>> For example, I really like SPARQL, for several reasons, but as I have
>>> explained, I really think it is not appropriate as a PATCH format for
>>> LDP.
>>>
>>>
>>>>     I just have two remarks:
>>>>>
>>>>> * SPARQL Update as a whole was developed for RDF databases, namely
>>>>> quad stores, with expressive power from the rest of SPARQL. I don't
>>>>> know if it was designed with use-cases as in RDF Validation, but I do
>>>>> know it was not designed for the use-case of updating LDP-RS on the
>>>>> LDP platform.
>>>>> * building a technology on top of an existing one is something I tend
>>>>> to prefer whenever it makes sense. But in our case, we are talking
>>>>> about taking the subset of an existing language, while remaining
>>>>> compatible with it. This is *not* as easy as it seems at first.
>>>>>
>>>>> I would prefer to hear about concrete proposals on how to do that. As
>>>>> somebody who _cannot_ rely on an existing SPARQL implementation, and
>>>>> who is not planning to implement one in full for that use-case, I
>>>>> would like to see a concrete syntax written down, with a formal
>>>>> semantics for it.
>>>>>
>>>>
>>>> Okay, I'm going to make two concrete proposals.
>>>>
>>>> 1.  Just use SPARQL 1.1 Update.   The whole thing.   I know it doesn't
>>>> handle lists well.  What else is wrong with it?  Why can you not use it?
>>>>
>>> I became interested in LDP because it was the first time RDF was
>>> becoming a first-class citizen of the Web, by that I mean applications
>>> can interact (read/write) *directly* with RDF resources using HTTP,
>>> without being behind an endpoint. That's what we meant by LDP being
>>> the intersection of RDF and REST.
>>>
>>> The W3C has finally recognized a few years ago that native RDF was not
>>> the only use-case for RDF applications. You can now have a relational
>>> database (RDB2RDF), CSV files (RDF for Tabular Data), XML (GRDDL,
>>> XSLT), etc. But not necessarily a triple/quad store. For example, at
>>> the company I work for, we have several (ie. physically disconnected)
>>> Datomic and Cassandra servers, and we are now exposing some of the
>>> data behind LDP, with the objective of doing so for all of our data. In
>>> all those cases, we want to expose and link our data on the Web, like
>>> all those so-called RESTful APIs, but in a more consistent way, and
>>> using RDF as the model and the exchange data format. Hence LDP, and
>>> not yet-another-web-api.
>>>
>>> The reason I am telling you all that is that supporting SPARQL for
>>> those virtual RDF datasets is not that easy (when possible) when you
>>> don't have a quadstore as your backend. Reverse mapping for simple
>>> SPARQL queries is hard. And SPARQL Update is even worse to support.
>>> Basically, forcing SPARQL Update onto LDP facing applications for
>>> simple resource updates on single LDP-RS (ie. PATCH) is like using a
>>> hammer to kill a fly.
>>>
>>> So full SPARQL Update is simply a no-go for me. I just cannot support
>>> it completely, as some features cannot correctly be mapped to Datomic
>>> and Cassandra.
>>>
>>
>> So this is the key.   You want to be able to support PATCH on databases
>> that are not materialized as either triples OR as SQL.
>>
>> If the database was SQL, then (as I understand it), SPARQL Update would
>> be okay, because it can be mapped to SQL.
>>
>> But you don't know how to map SPARQL Update to NoSQL databases, or it's
>> just too much work.
>>
>> I take it you do know how to map LD-Patch to Cassandra and Datomic?
>>
>> [ BTW, Datomic sounds awesome.  Is it as fun to use as I'd imagine? ]
>>
>>
>>
>>
>>> Also, if I was in a case where SPARQL Update was ok for me to use
>>> (it's not), then I suspect that I wouldn't need LDP at all, and SPARQL
>>> + Update + Graph Store protocol would just be enough. And there is
>>> nothing preventing one from using SPARQL Update right now. Just don't
>>> call it LD Patch.
>>>
>>
>> It's not about what's called what, it's about what we promote as the
>> PATCH format.   If we had a simple enough PATCH format, then we could
>> possibly make it a MUST to implement in the next version of LDP.
>>
>
> I think Alexandre makes a valid point. For a spec (LDP) that explicitly
> tried to avoid SPARQL, using this format for PATCH makes absolutely no
> sense to me.
>
>
>>
>> I don't think SPARQL Update is simple enough for that, but my prediction
>> is that LD-Patch will turn out, sadly, not to be either.
>>
>>
>>
>>>> 2.  Use SPARQL 1.1 Update with an extension to handle lists well.
>>>> Specifically, it would be a slice function, usable in FILTER and
>>>> especially in BIND.   This seems like a no-brainer to include in
>>>> SPARQL 1.2.  I'd want to talk to a few of the SPARQL implementers and
>>>> see if they're up for adding it.    Maybe a full set of list functions
>>>> like [1].
>>>>
>>> Sorry but I don't know RIF and your idea is still very vague for me. I
>>> understand how you can provide new functions for matching nodes in an
>>> rdf:list but I fail to see how this plays in a SPARQL Update query.
>>>
>>> Can you just provide some examples where you are doing the equivalent
>>> of that python code (I know you read python):
>>>
>>
>> Probably not worthwhile to go into this now, given your veto on SPARQL.
>>
>>
>>
>>> [[
>>> l = [1,2,3,4,5,6,7,8,9,10]
>>> l[2:2] = [11,12]
>>> l[2:7] = [13,14]
>>> l[2:] = [15,16]
>>> l.append(17)
>>> ]]
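
For readers who want to check what the list operations quoted above do,
here is the same sequence as a runnable Python sketch, with the list
asserted after each step.  The intermediate values follow from Python's
slice-assignment rules; nothing here is specific to LD Patch or SPARQL.

```python
l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

l[2:2] = [11, 12]   # empty slice: insert before index 2
assert l == [1, 2, 11, 12, 3, 4, 5, 6, 7, 8, 9, 10]

l[2:7] = [13, 14]   # replace the five items at indices 2..6
assert l == [1, 2, 13, 14, 6, 7, 8, 9, 10]

l[2:] = [15, 16]    # replace everything from index 2 to the end
assert l == [1, 2, 15, 16]

l.append(17)        # add one item at the end
assert l == [1, 2, 15, 16, 17]
```

The four steps cover insertion, replacement of an inner range, replacement
of a tail, and append, which is the range of cases an equivalent rdf:list
update would need to express.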
>>>
>>>> If we want a subset, we could define it purely by restricting the
>>>> grammar -- eg leaving out the stuff that does federation, negation,
>>>> aggregation, -- with no need to say anything about the semantics
>>>> except they are the same as SPARQL.   Until I hear what the problem is
>>>> with SPARQL, though, I don't want to start excluding stuff.
>>>>
>>> Am I the only one thinking that "no need to say anything about the
>>> semantics except they are the same as SPARQL" is just plain wrong?
>>>
>>> I mean, would we really tell implementers and users of the technology
>>> that they have to go learn SPARQL before they can start understanding
>>> what subset correctly applies to LD Patch? And how? And would they still
>>> need to carry this ResultSet semantics over while a lot of us would
>>> explicitly prefer avoiding it?
>>>
>>
>> I think the users who are writing PATCHes by hand will be familiar with
>> SPARQL.  And if they are not, there are lots of other reasons to learn it.
>>
>
> Except that LDP explicitly made a point to avoid SPARQL. Since the LDP
> model is all about interacting with resources by using their individual
> URIs, PATCH-ing resources through a SPARQL endpoint goes against the core
> LDP beliefs.
>
> -- Andrei
>
>
>>
>> Contrast that with LD-Patch, for which there is no other reason to learn it.
>>
>> You seem to think LD-Patch's syntax and semantics are easy.   I don't
>> think they are.   Maybe if you expanded the path syntax onto many rows it
>> would be more clear what it means.
>>
>> I can't help but regret again that we didn't choose to use TurtlePatch
>> (which I first wrote on your wall, the week after the workshop - even if I
>> didn't figure out how to handle bnodes until this year).
>> https://www.w3.org/2001/sw/wiki/TurtlePatch
>>
>>        -- Sandro
>>
>>
>>
>>> Alexandre
>>>
>>>         -- Sandro
>>>>
>>>>
>>>> [1] http://www.w3.org/TR/rif-dtb/#Functions_and_Predicates_on_RIF_Lists
>>>>
>>>>
>>>>
>>>>  Alexandre
>>>>>
>>>>>          -- Sandro
>>>>>>
>>>>>>
>>>>>>  All the best, Ashok
>>>>>>> On 7/26/2014 6:10 AM, Sandro Hawke wrote:
>>>>>>>
>>>>>>>> On July 25, 2014 2:48:28 PM EDT, Alexandre Bertails
>>>>>>>> <alexandre@bertails.org> wrote:
>>>>>>>>
>>>>>>>>> On Fri, Jul 25, 2014 at 11:51 AM, Ashok Malhotra
>>>>>>>>> <ashok.malhotra@oracle.com> wrote:
>>>>>>>>>
>>>>>>>>>> Alexandre:
>>>>>>>>>> The W3C held a RDF Validation Workshop last year.
>>>>>>>>>> One of the questions that immediately came up was
>>>>>>>>>> "We can use SPARQL to validate RDF".  The answer was
>>>>>>>>>> that SPARQL was too complex and too hard to learn.
>>>>>>>>>> So, we compromised and the workshop recommended
>>>>>>>>>> that a new RDF validation language should be developed
>>>>>>>>>> to cover the simple cases and SPARQL could be used when
>>>>>>>>>> things got complex.
>>>>>>>>>>
>>>>>>>>>> It seems to me that you can make a similar argument
>>>>>>>>>> for RDF Patch.
>>>>>>>>>>
>>>>>>>>> I totally agree with that.
>>>>>>>>>
>>>>>>>>>  Thanks for bringing this up, Ashok.    I'm going to use the same
>>>>>>>> situation to argue the opposite.
>>>>>>>>
>>>>>>>> It's relatively easy for a group of people, especially at a
>>>>>>>> face-to-face meeting, to come to the conclusion SPARQL is too hard
>>>>>>>> to learn and we should invent something else.    But when we took it
>>>>>>>> to the wider world, we got a reaction that's so strong it's hard not
>>>>>>>> to characterize as violent.
>>>>>>>>
>>>>>>>> You might want to read:
>>>>>>>>
>>>>>>>>
>>>>>>>> http://lists.w3.org/Archives/Public/public-rdf-shapes/2014Jul/thread.html
>>>>>>>>
>>>>>>>> Probably the most recent ones right now give a decent summary and
>>>>>>>> you don't have to read them all.
>>>>>>>>
>>>>>>>> I have lots of theories to explain the disparity.   Like: people who
>>>>>>>> have freely chosen to join an expedition are naturally more inclined
>>>>>>>> to go somewhere interesting.
>>>>>>>>
>>>>>>>> I'm not saying we can't invent something new, but be sure to
>>>>>>>> understand the battle to get it standardized may be harder than just
>>>>>>>> implementing SPARQL everywhere.
>>>>>>>>
>>>>>>>>         - Sandro
>>>>>>>>
>>>>>>>>  Alexandre
>>>>>>>>>
>>>>>>>>>  All the best, Ashok
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 7/25/2014 9:34 AM, Alexandre Bertails wrote:
>>>>>>>>>>
>>>>>>>>>>> On Fri, Jul 25, 2014 at 8:04 AM, John Arwe <johnarwe@us.ibm.com>
>>>>>>>>>>>
>>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>>>> Another problem is the support for rdf:list. I have just finished
>>>>>>>>>>>>> writing down the semantics for UpdateList and based on that
>>>>>>>>>>>>> experience, I know this is something I want to rely on as a
>>>>>>>>>>>>> user, because it is so easy to get it wrong, so I want native
>>>>>>>>>>>>> support for it. And I don't think it is possible to do something
>>>>>>>>>>>>> equivalent in SPARQL Update. That is a huge drawback as list
>>>>>>>>>>>>> manipulation (eg. in JSON-LD, or Turtle) is an everyday task.
>>>>>>>>>>>>
>>>>>>>>>>>> Is the semantics for UpdateList (that you wrote down) somewhere
>>>>>>>>>>>> that WG members can look at it, and satisfy themselves that they
>>>>>>>>>>>> agree with your conclusion?
>>>>>>>>>>>>
>>>>>>>>>>> You can find the semantics at [1]. Even if still written in Scala
>>>>>>>>>>> for now, this is written in a (purely functional) style, which is
>>>>>>>>>>> very close to the formalism that will be used for the operational
>>>>>>>>>>> semantics in the spec. Also, note that this is the most complex
>>>>>>>>>>> part of the entire semantics, all the rest being pretty simple,
>>>>>>>>>>> even Paths. And I spent a lot of time finding the general solution
>>>>>>>>>>> while breaking it in simpler sub-parts.
>>>>>>>>>>>
>>>>>>>>>>> In a nutshell, you have 3 steps: first you move to the left
>>>>>>>>>>> bound, then you gather triples to delete until the right bound,
>>>>>>>>>>> and you finally insert the new triples in the middle. It's really
>>>>>>>>>>> tricky because 1. you want to minimize the number of operations,
>>>>>>>>>>> even if this is only a spec 2. unlike usual linked lists with
>>>>>>>>>>> pointers, you manipulate triples, so the pointer in question is
>>>>>>>>>>> only the node in the object position in the triple, and you need
>>>>>>>>>>> to remember and carry the corresponding subject-predicate
>>>>>>>>>>> 3. interesting (ie. weird) things can happen at the limits of the
>>>>>>>>>>> list if you don't pay attention.
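
The three steps described above (move to the left bound, gather the triples
to delete up to the right bound, insert the new triples in the middle) can
be sketched in Python as follows.  This is an illustrative sketch under
stated assumptions, not the Scala semantics linked at [1]: triples are plain
tuples in a set, the names update_list, make_list, read_list and obj are
hypothetical, and new list cells get made-up "_:b<n>" labels that are
assumed not to clash with labels already in the graph.

```python
import itertools

FIRST, REST, NIL = "rdf:first", "rdf:rest", "rdf:nil"
_fresh = itertools.count()


def obj(graph, subject, predicate):
    """Return the single object of (subject, predicate, ?); fail otherwise."""
    (value,) = [o for (s, p, o) in graph if s == subject and p == predicate]
    return value


def update_list(graph, subject, predicate, start, end, new_items):
    """Replace slice [start:end) of the list at (subject, predicate).

    end=None means "to the end of the list". Returns a new set of triples.
    """
    # Step 1: walk to the left bound, carrying the (subject, predicate)
    # pair whose object is the current list node.
    s, p = subject, predicate
    node = obj(graph, s, p)
    index = 0
    while index < start and node != NIL:
        s, p = node, REST
        node = obj(graph, s, p)
        index += 1
    # Step 2: gather the triples to delete until the right bound.
    to_delete = {(s, p, node)}
    while node != NIL and (end is None or index < end):
        rest = obj(graph, node, REST)
        to_delete |= {(node, FIRST, obj(graph, node, FIRST)), (node, REST, rest)}
        node = rest
        index += 1
    # Step 3: insert the new cells between the left pointer and `node`.
    to_insert = set()
    for item in new_items:
        cell = "_:b%d" % next(_fresh)  # hypothetical blank-node label
        to_insert |= {(s, p, cell), (cell, FIRST, item)}
        s, p = cell, REST
    to_insert.add((s, p, node))
    return (set(graph) - to_delete) | to_insert


def make_list(subject, predicate, items):
    """Build the triples of an rdf:list hanging off (subject, predicate)."""
    graph, s, p = set(), subject, predicate
    for i, item in enumerate(items):
        cell = "_:n%d" % i
        graph |= {(s, p, cell), (cell, FIRST, item)}
        s, p = cell, REST
    graph.add((s, p, NIL))
    return graph


def read_list(graph, subject, predicate):
    """Read the list at (subject, predicate) back into a Python list."""
    items, node = [], obj(graph, subject, predicate)
    while node != NIL:
        items.append(obj(graph, node, FIRST))
        node = obj(graph, node, REST)
    return items
```

For example, update_list(g, "<s>", "<p>", 2, 7, [13, 14]) plays the role of
l[2:7] = [13, 14] on a Python list.  The sketch does not try to minimize the
number of operations, which is one of the difficulties named above.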
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>> https://github.com/betehess/banana-rdf/blob/ldpatch/patch/src/main/scala/Semantics.scala#L62
>>>>>>>>>
>>>>>>>>>>>> I'm not steeped enough in the intricacies of SPARQL Update to
>>>>>>>>>>>> have a horse in this race, but if this issue is the big-animal
>>>>>>>>>>>> difference then people with the necessary understanding are going
>>>>>>>>>>>> to want to see the details.  The IBM products I'm aware of eschew
>>>>>>>>>>>> rdf:List (and blank nodes generally, to first order), so I don't
>>>>>>>>>>>> know how much this one alone would sway me.
>>>>>>>>>
>>>>>>>>>> You _could_ generate a SPARQL Update query that would do something
>>>>>>>>>>> equivalent. But you'd have to match and remember the intermediate
>>>>>>>>>>> nodes/triples.
>>>>>>>>>>>
>>>>>>>>>>> JSON-LD users manipulate lists on a day-to-day basis. Without
>>>>>>>>>>> native
>>>>>>>>>>> support for rdf:list in LD Patch, I would turn to JSON PATCH to
>>>>>>>>>>> manipulate those lists.
>>>>>>>>>>>
>>>>>>>>>>>> It sounds like the other big-animal difference in your email is
>>>>>>>>>>>>
>>>>>>>>>>>>> we would have to refine the SPARQL semantics so that the order
>>>>>>>>>>>>> of the clauses matters (ie. no need to depend on a query
>>>>>>>>>>>>> optimiser). And we
>>>>>>>>>>>>
>>>>>>>>>>>> That sounds like a more general problem.  It might mean, in
>>>>>>>>>>>> effect, that no one would be able to use existing off-the-shelf
>>>>>>>>>>>> componentry (specs & code ... is that the implication, Those Who
>>>>>>>>>>>> Know S-U?) and that might well be a solid answer to "why not
>>>>>>>>>>>> [use S-U]?"
>>>>>>>>>>>>
>>>>>>>>>>> The fact that reordering the clauses doesn't change the
>>>>>>>>>>> semantics is a feature of SPARQL. It means that queries can be
>>>>>>>>>>> rearranged for optimisation purposes. But you never know if the
>>>>>>>>>>> execution plan will be the best one, and you can end up with huge
>>>>>>>>>>> intermediate result sets.
>>>>>>>>>>>
>>>>>>>>>>> In any case, if we ever go down the SPARQL Update way, I will ask
>>>>>>>>>>> that we specify that clauses are executed in order, or something
>>>>>>>>>>> like that. And I will ask for a semantics that doesn't rely on
>>>>>>>>>>> result sets if possible.
>>>>>>>>>>>> Were there any other big-animal issues you found, those two
>>>>>>>>>>>> aside?
>>>>>>>>>>>
>>>>>>>>>>> A big issue for me will be to correctly explain the subset of
>>>>>>>>>>> SPARQL we would be considering, and its limitations compared to
>>>>>>>>>>> its big brother.
>>>>>>>>>>>
>>>>>>>>>>> Also, if you don't implement it from scratch and want to rely on
>>>>>>>>>>> an existing implementation, you would still have to reject all the
>>>>>>>>>>> correct SPARQL queries that fall outside the subset, and that can
>>>>>>>>>>> be tricky too, because you have to inspect the query after it is
>>>>>>>>>>> parsed. Oh, and I will make sure there are tests rejecting such
>>>>>>>>>>> queries :-)
>>>>>>>>>>>
>>>>>>>>>>> Alexandre
>>>>>>>>>>>
>>>>>>>>>>>  Best Regards, John
>>>>>>>>>>>>
>>>>>>>>>>>> Voice US 845-435-9470  BluePages
>>>>>>>>>>>> Cloud and Smarter Infrastructure OSLC Lead
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>
>>
>>
>
Received on Sunday, 27 July 2014 14:30:36 UTC