Re: semsheets from Christian Chiarcos on 2022-02-25 (semantic-web@w3.org from February 2022)

From: Christian Chiarcos <christian.chiarcos@web.de>
Date: Fri, 25 Feb 2022 14:04:53 +0100
To: Hans-Jürgen Rennau <hjrennau@gmail.com>
Cc: Paul Tyson <phtyson@sbcglobal.net>, semantic-web@w3.org
Message-ID: <CAC1YGdg467WyRE_d-Ozjy2EiaJc4=8Dr08ciCmQ7O-_z453WaQ@mail.gmail.com>
>
> Speaking of ways of thinking, integration means among other things that a
> graceful transition between tree and graph representation is a natural
> thing for us, almost as natural as an arithmetic operation, or the
> validation of a document against a schema, or conducting a query. If there
> is an abundance of tools, this is alarming enough; even more alarming is
> the common view point that people may have to write custom code. For tasks
> of a fundamental nature we should have fundamental answers, which means
> declarative tools - *if* this is possible. With declarative I mean:
> allowing the user to describe the desired result and not care about how to
> achieve it.
>

Well, that's not quite true. You need to describe, at least partially,
three things: the desired result, the expected input and the relation
between them. It seems you want something stricter than declarative in the
traditional sense: a model that is not procedural, i.e., one that doesn't
require the handling of any internal state and is just a plain mapping.

XSLT <3.0 falls under this definition as long as you don't do recursion
over named templates (because it doesn't let you update the values of
variables) -- and for your use case you wouldn't need that.
Likewise, SPARQL CONSTRUCT does (but not SPARQL Update, because then you
could iterate), and I guess the same holds for most query languages.
However: It is generally considered a strength that both languages are
capable of doing iterations, and if these are in fact required by a use
case, a non-procedural formalization of the mapping would just not be
applicable anymore.
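
To make the distinction concrete, here is a minimal Python sketch (not
taken from any of the tools discussed; the data, the `ex:parent` predicate
and the function names are all hypothetical). The first function is a plain
mapping: every input item yields its output independently. The second one
has to update an internal state in a loop until nothing changes, which is
the kind of thing a stateless mapping cannot express.

```python
# Hypothetical example data: (child, parent) pairs.
EDGES = {("a", "b"), ("b", "c"), ("c", "d")}


def map_edges(edges):
    """Plain mapping: one triple per input pair, no internal state."""
    return {(child, "ex:parent", parent) for child, parent in edges}


def ancestors(edges):
    """Transitive closure: iterates and updates `closure` until a fixpoint."""
    closure = set(edges)
    while True:
        # Join the current pairs with themselves to find one more hop.
        new = {(x, z) for (x, y) in closure for (y2, z) in closure if y == y2}
        if new <= closure:  # nothing new was derived, so we are done
            return closure
        closure |= new  # state update, i.e. the procedural part


if __name__ == "__main__":
    print(sorted(map_edges(EDGES)))
    print(sorted(ancestors(EDGES)))
```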

>
> If well done and not incurring a significant loss of flexibility, the gain
> of efficiency and the increase of reliability is obvious. (Imagine people
> driving self-made cars.)
>

There are query languages that claim to be Turing-complete [e.g.,
https://info.tigergraph.com/gsql], and in the general sense of separating
computation from control flow (which is the entire point of a query
language), they are declarative, but as they provide unlimited capabilities
for iteration or recursion, they would not fall under your definition. The
lesson here is that there is a level of irreducible complexity as soon
as you need to iterate and/or update internal states. If you feel that
these are not necessary, you can *make* any query-based mapping language
compliant with your criteria if you just eliminate the parts of the
language that deal with iteration and the update (not the binding) of
variables.
That basically requires you to write your own validator to filter out
functionalities that you don't want to support, nothing more, as the rest
is handled by off-the-shelf technology. And for some languages (say,
SPARQL), already the choice of query operator (CONSTRUCT / SELECT) does
that for you.
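
Such a validator could be sketched roughly as follows. This is a crude,
keyword-based Python illustration under stated assumptions, not a real
SPARQL parser: the function names are hypothetical, the rejected keywords
are the SPARQL 1.1 Update operations, and the stripping of IRIs, string
literals and comments is only approximate. It accepts CONSTRUCT only, since
the target here is RDF. A production filter would rather inspect the parse
tree of an off-the-shelf parser.

```python
import re

# Assumption: operations to reject, taken from the SPARQL 1.1 Update keywords.
UPDATE_KEYWORDS = ("INSERT", "DELETE", "LOAD", "CLEAR", "CREATE",
                   "DROP", "COPY", "MOVE", "ADD")
QUERY_FORMS = ("SELECT", "CONSTRUCT", "ASK", "DESCRIBE")


def _keyword(words):
    """Match a bare keyword, but not a variable (?add) or a prefixed name (ex:add)."""
    return re.compile(r"(?<![:?$\w])(" + "|".join(words) + r")(?![\w:-])",
                      re.IGNORECASE)


def strip_noise(query):
    """Approximately remove IRIs, string literals and comments."""
    query = re.sub(r"<[^<>\s]*>", " ", query)          # IRIs such as <http://...#x>
    query = re.sub(r'"(?:[^"\\]|\\.)*"', " ", query)   # double-quoted literals
    query = re.sub(r"'(?:[^'\\]|\\.)*'", " ", query)   # single-quoted literals
    return re.sub(r"#.*", " ", query)                  # comments to end of line


def is_plain_construct(query):
    """True if the query is a CONSTRUCT and contains no update keyword."""
    text = strip_noise(query)
    if _keyword(UPDATE_KEYWORDS).search(text):
        return False
    form = _keyword(QUERY_FORMS).search(text)
    return form is not None and form.group(1).upper() == "CONSTRUCT"


if __name__ == "__main__":
    print(is_plain_construct("CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }"))
    print(is_plain_construct("DELETE { ?s ?p ?o } WHERE { ?s ?p ?o }"))
```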

So, a partial answer to your question seems to be: *Any query language*
(minus a few of its functionalities) would do. Beyond that, selection
criteria are no longer a matter of functionality but of verbosity and entry
bias, and the choice is up to the kind of source data you have. Overall,
there seem to be basically three types of transformations:

(a) source to RDF (specific to the source format[s], there *cannot* be a
one-stop solution because the sources are heterogeneous, we can only try to
aggregate -- GRDDL+XSL, R2RML, RML, JSON-LD contexts, YARRRML are of that
kind, and except for being physically integrated with the source data, RDFa
is, too)
(b) source to SPARQL variable bindings (+ SPARQL CONSTRUCT, as in TARQL;
this is a one-stop solution in the sense that you can apply one language to
*all* input formats, however, only those formats supported by your tool;
the difference from the first group is that the mapping language itself is
SPARQL, so it is probably more easily applicable for an occasional user of
SW/LD technology than any special-purpose or source-specific formalism)
(c) source to a raw RDF representation + SPARQL CONSTRUCT (this is an
extension of (a) and the idea behind some of the *software* solutions
suggested, but parts of the mapping effort are shifted from the
source-specific converters into the query, as in (b); this could also be
more portable than (a), as the format-/domain-specific parts can be covered
by generic converters/mappings)
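
The idea behind type (b) can be sketched in a few lines of Python. This is
only an illustration of the principle, not how TARQL or any other tool
works internally: the CSV data, the template notation with `{column}`
placeholders and the function names are hypothetical, and values are
inserted without any escaping. Each source row becomes one set of variable
bindings, and a CONSTRUCT-like template is instantiated once per row.

```python
import csv
import io

# Hypothetical source data; any tabular format would do.
CSV_TEXT = "id,name\n1,Alice\n2,Bob\n"

# Hypothetical CONSTRUCT-like template: triple patterns with {column} slots.
TEMPLATE = [
    ("<http://example.org/person/{id}>",
     "<http://xmlns.com/foaf/0.1/name>",
     '"{name}"'),
]


def bindings(text):
    """Turn each CSV row into a dict of variable bindings."""
    return list(csv.DictReader(io.StringIO(text)))


def construct(template, rows):
    """Instantiate every triple pattern once per row; no state is carried over."""
    return [tuple(part.format(**row) for part in pattern)
            for row in rows
            for pattern in template]


if __name__ == "__main__":
    for triple in construct(TEMPLATE, bindings(CSV_TEXT)):
        print(" ".join(triple), ".")
```

Swapping the input format then only means swapping `bindings`; the template
stays the same.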

A fourth type, (d) source to raw RDF + SPARQL Update, would fall outside
your classification -- but all of them would normally be considered
declarative.

Best,
Christian
Received on Friday, 25 February 2022 13:05:20 UTC