SPARQL subset as a PATCH format for LDP from Alexandre Bertails on 2014-07-24 (public-ldp-wg@w3.org from July 2014)

From: Alexandre Bertails <alexandre@bertails.org>
Date: Thu, 24 Jul 2014 14:30:42 -0400
To: "public-ldp-wg@w3.org" <public-ldp-wg@w3.org>
Message-ID: <CANvn8kxnJTmMFtFrhQfJZTBh3BN+80EiOdqvh+pHRsqP69XPfg@mail.gmail.com>
All,

I have been thinking a lot about the SPARQL subset idea and I would
like to share some thoughts. As you might have expected from the last
call, I am not in favor of it, so I have taken the time to document my
issues with the approach.

First, let me remind you of the scope of LD Patch. It is a PATCH format
for partial updates of LDP-RS. So it's only about RDF graphs. It is not
intended for updating quad stores, nor named graphs. Also, it is not
meant to be a high-level language but rather an assembly one. For that
reason, the editors challenged themselves not to add higher-level
features.

Skolemization is not used. The assumption is that bnodes form tree
structures. The idea is that most of those trees (and the bnodes in
them) can be distinguished by filtering on sub-components of those
trees. I recommend [1] for a recent and thorough analysis confirming
those assumptions.

That is the very reason behind the LD Path (no 'c') algebra, which
shares some similarities with XPath. Path steps are applied
left-to-right, and recursively for path constraints. The semantics
formally specifies the order in which those operations must be
evaluated. So LDP application writers can rely on that semantics for
runtime characteristics, for example by restricting the node sets as
early as possible in the path, probably by starting from the leaves of
the tree and then moving up the tree until reaching the bnode.
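To make the left-to-right evaluation concrete, here is a minimal Python
sketch. It is not the LD Patch semantics itself: the triple encoding, the
function names (step_backward, keep_if) and the sample data are all
assumptions made for illustration. It only shows how each step narrows
the current node set before the next step runs.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical sample graph: two bnode events sharing the same schema:url.
GRAPH: Set[Triple] = {
    ("_:e1", "schema:url", "<http://conferences.ted.com/TED2009/>"),
    ("_:e1", "schema:startDate", "2009-02-04"),
    ("_:e2", "schema:url", "<http://conferences.ted.com/TED2009/>"),
    ("_:e2", "schema:startDate", "2010-02-09"),
}


def step_backward(graph: Set[Triple], nodes: Set[str], predicate: str) -> Set[str]:
    """Follow `predicate` in reverse: from objects in `nodes` to their subjects."""
    return {s for (s, p, o) in graph if p == predicate and o in nodes}


def keep_if(graph: Set[Triple], nodes: Set[str], predicate: str,
            value: Optional[str] = None) -> Set[str]:
    """Constraint: keep nodes having `predicate` (with `value`, if given)."""
    return {
        n for n in nodes
        if any(s == n and p == predicate and (value is None or o == value)
               for (s, p, o) in graph)
    }


# Evaluate the steps in order; the node set shrinks as we go.
start = {"<http://conferences.ted.com/TED2009/>"}
after_url = step_backward(GRAPH, start, "schema:url")
after_date = keep_if(GRAPH, after_url, "schema:startDate", "2009-02-04")
```

Here after_url holds both bnodes, and the date constraint then narrows
it to the single event. The cost of each step depends only on the node
set handed to it, which is the predictability argued for above.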

So, SPARQL. Yes, you can consider a subset with similar expressive
power. People seem to think that defining the concrete syntax would be
enough, and that it would be as easy as, if not easier than, LD Patch. I
disagree. First, the two concrete syntaxes would share a lot of the
production rules, basically all the ones borrowed from Turtle. The
additional ones are not an issue in either case.

Then, I have heard people saying that we wouldn't need to write down
the operational semantics, because we could say it's the same as
SPARQL Update, but for that subset of the syntax. I disagree. Because
as a developer and as a user, I would have to be sure I understand
the SPARQL semantics well, either to implement LD Patch (if I don't
want to depend on an existing SPARQL implementation) or to use it. So
I'd argue that the semantics _has_ to be written. And I'd have to
reject valid SPARQL Update queries which are not in the subset.

Another issue is that we will still need Basic Graph Patterns, the (S
P O .)-s in the WHERE clause, which rely on intermediate ResultSet-s
for their semantics.

For example:

Bind ?event <http://conferences.ted.com/TED2009/>
/-schema:url[/schema:startDate="2009-02-04"]/schema:location[/schema:name="Long Beach, California"][/schema:geo[/schema:latitude][/schema:longitude]]

would be equivalent to something like this:

WHERE {
  ?event schema:url <http://conferences.ted.com/TED2009/> .
  ?event schema:startDate "2009-02-04" .
  ?event schema:location ?loc .
  ?loc schema:name "Long Beach, California" .
  ?loc schema:geo ?geo .
  ?geo schema:latitude [] .
  ?geo schema:longitude [] .
}

If we want the same performance characteristics (mainly, predictability),
we would have to refine the SPARQL semantics so that the order of the
clauses matters (i.e. no need to depend on a query optimiser). And we
would need to do some static analysis on the query to make sure that
ResultSet-s are not needed. In any case, it goes beyond the idea of
using a subset of the syntax + a pointer to SPARQL Update semantics.
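The point about clause order and intermediate ResultSet-s can be
illustrated with a small Python sketch. This is a deliberately naive
evaluator of my own, not how any particular SPARQL engine works; the
names match_pattern and eval_bgp and the sample data are assumptions.
It evaluates triple patterns one at a time and records how many partial
solutions exist after each one.

```python
def match_pattern(graph, pattern, binding):
    """Yield extensions of `binding` under which `pattern` matches a triple."""
    for triple in graph:
        new = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Variable: bind it, or check it against an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                # Constant: must match exactly.
                ok = False
                break
        if ok:
            yield new


def eval_bgp(graph, patterns):
    """Evaluate patterns in the given order; return (solutions, sizes)."""
    solutions = [{}]
    sizes = []
    for pattern in patterns:
        solutions = [b for sol in solutions
                     for b in match_pattern(graph, pattern, sol)]
        sizes.append(len(solutions))
    return solutions, sizes


# Hypothetical data: three events with a location, one of which has the URL.
GRAPH = [
    ("_:e1", "schema:url", "<http://conferences.ted.com/TED2009/>"),
    ("_:e1", "schema:location", "_:l1"),
    ("_:e2", "schema:location", "_:l2"),
    ("_:e3", "schema:location", "_:l3"),
]
URL = ("?event", "schema:url", "<http://conferences.ted.com/TED2009/>")
LOC = ("?event", "schema:location", "?loc")

selective_first, sizes_selective = eval_bgp(GRAPH, [URL, LOC])
broad_first, sizes_broad = eval_bgp(GRAPH, [LOC, URL])
```

Both orders yield the same single solution, but the second one carries
three partial solutions after its first pattern. Under the plain SPARQL
semantics the answer is the same either way, so nothing in the spec lets
a user predict which cost they will pay.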

Another problem is the support for rdf:list. I have just finished
writing down the semantics for UpdateList and based on that
experience, I know this is something I want to rely on as a user: it
is so easy to get wrong that I want native support for it. And I don't
think it is possible to do something equivalent in SPARQL Update. That
is a huge drawback as list manipulation (e.g. in JSON-LD, or Turtle)
is an everyday task.
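As a reminder of why lists are easy to get wrong at the triple level,
here is a small Python sketch of the standard rdf:first/rdf:rest
encoding. It is not the UpdateList semantics; the helper names
(list_to_triples, triples_to_list), the bnode labels and the sample
values are assumptions, and the edit is done the coarse way, by decoding,
splicing and re-encoding the whole list.

```python
RDF_FIRST = "rdf:first"
RDF_REST = "rdf:rest"
RDF_NIL = "rdf:nil"


def list_to_triples(items, prefix="_:b"):
    """Encode items as a chain of list cells; return (head, triples)."""
    if not items:
        return RDF_NIL, set()
    cells = [f"{prefix}{i}" for i in range(len(items))]
    triples = set()
    for i, (cell, item) in enumerate(zip(cells, items)):
        rest = cells[i + 1] if i + 1 < len(cells) else RDF_NIL
        triples.add((cell, RDF_FIRST, item))
        triples.add((cell, RDF_REST, rest))
    return cells[0], triples


def triples_to_list(head, triples):
    """Walk the chain from `head` to rdf:nil and collect the items."""
    first = {s: o for (s, p, o) in triples if p == RDF_FIRST}
    rest = {s: o for (s, p, o) in triples if p == RDF_REST}
    items = []
    node = head
    while node != RDF_NIL:
        items.append(first[node])
        node = rest[node]
    return items


# A three-element list is already six triples linked through bnodes.
head, triples = list_to_triples(["en", "fr", "de"])

# Replace the slice 1..2 at the list level, then re-encode.
items = triples_to_list(head, triples)
items[1:2] = ["fr-CH"]
new_head, new_triples = list_to_triples(items, prefix="_:n")
```

Doing the same edit directly on the triples means deleting and inserting
the right rdf:first and rdf:rest statements while keeping the chain
intact up to rdf:nil, which is exactly the bookkeeping a user should not
have to do by hand.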

So to summarize my issues with the approach:

1. the semantics is not that easy to define
2. performance characteristics are not predictable without refining
   the SPARQL semantics
3. no native support for rdf:list
4. we need to explain to the user how it differs from existing SPARQL
   Update

SPARQL Update is good at doing what it was designed for, but there is
little interest in being syntax compatible with it.

Regards,

Alexandre

[1] http://www.websemanticsjournal.org/index.php/ps/article/view/365
Received on Thursday, 24 July 2014 18:31:09 UTC