Re: semsheets from Hans-Jürgen Rennau on 2022-02-25 (semantic-web@w3.org from February 2022)

From: Hans-Jürgen Rennau <hjrennau@gmail.com>
Date: Fri, 25 Feb 2022 15:44:16 +0100
To: Christian Chiarcos <christian.chiarcos@web.de>
Cc: Paul Tyson <phtyson@sbcglobal.net>, semantic-web@w3.org
Message-ID: <CA+H2zTAMNSAyL-Hkm52=PP0rchvGNXWE31piO3Bf6-Xpx2ZFeQ@mail.gmail.com>
Thank you, Christian. Before responding, I have a couple of questions. You
write:

" (a) source to RDF (specific to the source format[s], there *cannot* be a
one-stop solution because the sources are heterogeneous, we can only try to
aggregate -- GRDDL+XSL, R2RML, RML, JSON-LD contexts, YARRRML are of that
kind, and except for being physically integrated with the source data, RDFa
is, too)"

I do not understand - YARRRML *is* a one-stop solution for all source
formats included in an extensible list of formats, already now including:
RDB, CSV, JSON, XML. So in principle it is a comprehensive solution for a
given heterogeneous set of data sources. Could you explain what you mean,
saying "there cannot be a one-stop solution"?

And a second question: what does "raw RDF representation" mean? I suppose
the result of a purely mechanical translation, derived from item names, but
I am not sure.
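
To make that guess concrete, here is a minimal sketch of such a purely mechanical translation. It assumes a flat record; the base IRI and the function name raw_triples are made up for illustration and do not come from any tool mentioned in this thread.

```python
# Hypothetical base IRI, chosen only for this illustration.
BASE = "http://example.org/raw/"


def raw_triples(record, subject):
    """Mechanically turn a flat dict into (subject, predicate, object) triples.

    Predicates are derived from the item names alone; no vocabulary
    alignment or other modelling decision is made here.
    """
    return [(subject, BASE + name, value) for name, value in record.items()]


triples = raw_triples({"name": "Alice", "city": "Berlin"}, BASE + "person/1")
for triple in triples:
    print(triple)
```

If this reading is right, any alignment with a real vocabulary would happen in a later step, not in the converter.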

Thank you in advance, kind regards - Hans-Jürgen

On Fri, 25 Feb 2022 at 14:05, Christian Chiarcos <christian.chiarcos@web.de> wrote:

>> Speaking of ways of thinking, integration means among other things that a
>> graceful transition between tree and graph representation is a natural
>> thing for us, almost as natural as an arithmetic operation, or the
>> validation of a document against a schema, or conducting a query. If there
>> is an abundance of tools, this is alarming enough; even more alarming is
>> the common viewpoint that people may have to write custom code. For tasks
>> of a fundamental nature we should have fundamental answers, which means
>> declarative tools - *if* this is possible. With declarative I mean:
>> allowing the user to describe the desired result and not care about how to
>> achieve it.
>>
>
> Well, that's not quite true. You need to describe, at least partially, three
> parts: the desired result, the expected input and the relation between
> them. It seems you want something more than declarative in the traditional
> sense: a model that is not procedural, i.e., one that doesn't require
> handling any internal state, just a plain mapping.
>
> XSLT <3.0 falls under this definition as long as you don't do recursion
> over named templates (because it didn't let you update the values of
> variables) -- and for your use case you wouldn't need that.
> Likewise, SPARQL CONSTRUCT does (SPARQL Update does not, because then you
> could iterate), and I guess the same holds for most query languages.
> However: It is generally considered a strength that both languages are
> capable of doing iterations, and if these are in fact required by a use
> case, a non-procedural formalization of the mapping would just not be
> applicable anymore.
>
>>
>> If well done and not incurring a significant loss of flexibility, the
>> gain in efficiency and the increase in reliability are obvious. (Imagine
>> people driving self-made cars.)
>>
>
> There are query languages that claim to be Turing-complete [e.g.,
> https://info.tigergraph.com/gsql], and in the general sense of separating
> computation from control flow (which is the entire point of a query
> language), they are declarative, but as they provide unlimited capabilities
> for iteration or recursion, they would not fall under your definition. The
> lesson here is that there is a level of irreducible complexity as soon
> as you need to iterate and/or update internal states. If you feel that
> these are not necessary, you can *make* any query-based mapping language
> compliant with your criteria if you just eliminate the parts of the language
> that deal with iteration and the update (not the binding) of variables.
> That basically requires you to write your own validator to filter out
> functionalities that you don't want to support, nothing more, as the rest
> is handled by off-the-shelf technology. And for some languages (say,
> SPARQL), already the choice of query operator (CONSTRUCT / SELECT) does
> that for you.
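
A minimal sketch of what such a validator might look like for SPARQL, assuming a plain keyword scan is acceptable. It is not a SPARQL parser: the function name is hypothetical, and the check is deliberately conservative, so it can reject a harmless query that merely contains one of the update keywords as a variable name or inside a literal or IRI.

```python
import re

# Query forms accepted by this sketch.
ALLOWED_FORMS = {"CONSTRUCT", "SELECT"}

# Keywords that introduce SPARQL 1.1 Update operations.
UPDATE_KEYWORDS = {"INSERT", "DELETE", "LOAD", "CLEAR", "CREATE", "DROP",
                   "COPY", "MOVE", "ADD"}


def is_read_only_query(query: str) -> bool:
    """Return True if the text uses an allowed query form and no update keyword.

    Conservative keyword scan over alphabetic words (hypothetical helper).
    """
    words = set(re.findall(r"[A-Za-z]+", query.upper()))
    return bool(words & ALLOWED_FORMS) and not (words & UPDATE_KEYWORDS)


print(is_read_only_query("CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }"))
print(is_read_only_query("DELETE { ?s ?p ?o } WHERE { ?s ?p ?o }"))
```

A production filter would presumably work on the parsed query rather than on its text.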
>
> So, a partial answer to your question seems to be: *Any query language*
> (minus a few of its functionalities) would do. Beyond that, selection
> criteria are no longer a matter of functionality but of verbosity and entry
> bias, and the choice is up to the kind of source data you have. Overall,
> there seem to be basically three types of transformations:
>
> (a) source to RDF (specific to the source format[s], there *cannot* be a
> one-stop solution because the sources are heterogeneous, we can only try to
> aggregate -- GRDDL+XSL, R2RML, RML, JSON-LD contexts, YARRRML are of that
> kind, and except for being physically integrated with the source data, RDFa
> is, too)
> (b) source to SPARQL variable bindings (+ SPARQL CONSTRUCT, as in TARQL;
> this is a one-stop solution in the sense that you can apply one language to
> *all* input formats, however, only those formats supported by your tool;
> the difference to the first group is that the mapping language itself is
> SPARQL, so it is probably more easily applicable for an occasional user of
> SW/LD technology than any special-purpose or source-specific formalism)
> (c) source to a raw RDF representation + SPARQL CONSTRUCT (this is an
> extension of (a) and the idea of some of the *software* solutions
> suggested, but parts of the mapping effort are shifted from the
> source-specific converters into the query, as in (b); this could also be
> more portable than (a) as the format-/domain-specific parts can be covered
> by generic converters/mappings)
>
> A fourth type, (d) source to raw RDF + SPARQL Update, would fall outside
> your classification -- but all of them would normally be considered
> declarative.
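
A rough sketch of type (c) under stated assumptions: a generic converter has already produced raw triples, and a stateless rewrite stands in for the SPARQL CONSTRUCT step. The raw namespace, the predicate map and the function name construct are invented for illustration; only the FOAF property IRI is a real one. A real setup would run an actual CONSTRUCT query against an RDF store or library.

```python
RAW = "http://example.org/raw/"                # hypothetical converter namespace
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"   # FOAF name property

# Step 1: output of a generic, format-specific converter (assumed).
raw = [
    (RAW + "person/1", RAW + "name", "Alice"),
    (RAW + "person/1", RAW + "city", "Berlin"),
]

# Step 2: the mapping proper, kept separate from the converter.
PREDICATE_MAP = {RAW + "name": FOAF_NAME}


def construct(triples, predicate_map):
    """Emit one target triple per raw triple whose predicate is mapped.

    No state is carried from one triple to the next, roughly like
    CONSTRUCT { ?s <target> ?o } WHERE { ?s <raw> ?o } per map entry.
    """
    return [(s, predicate_map[p], o) for s, p, o in triples if p in predicate_map]


result = construct(raw, PREDICATE_MAP)
print(result)
```

The point of the split is that only the predicate map is domain-specific; the converter stays generic.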
>
> Best,
> Christian
>
Received on Friday, 25 February 2022 14:44:40 UTC