- From: Martynas Jusevičius <martynas@atomgraph.com>
- Date: Sat, 26 Feb 2022 07:39:39 +0100
- To: Hans-Jürgen Rennau <hjrennau@gmail.com>
- Cc: Christian Chiarcos <christian.chiarcos@web.de>, Paul Tyson <phtyson@sbcglobal.net>, semantic-web@w3.org
- Message-ID: <CAE35Vmx8Sev=5_B9e4hGN6D8iVPdQqU38DS4zcL26LbiXtHntQ@mail.gmail.com>
On Sat, 26 Feb 2022 at 01:43, Hans-Jürgen Rennau <hjrennau@gmail.com> wrote:

> Absolutely agree - to me, that's the idea behind rml.io.
>
> Your concern about optimization is a puzzle to me. A declarative
> mapping language à la RML is not optimized for anything, except for a
> clean and clear statement of the intended result - the what, not the
> how. That's the art of it. Optimization comes later, and the clearer
> our thought, and the more well-structured and intuitive its capturing,
> the larger the scope for optimization behind the scenes. And the more
> sustainable and enduring the result of the time spent, because a
> mapping defined today with an investment of, say, eight hours may be
> processed slowly today, faster in a month, and much faster in a year,
> without me spending another minute on it. So a key benefit is the
> potential return on investment. And I mention in passing - how much
> cheaper is the *maintenance* of 100 simple artifacts (say, YARRRML
> documents) that do not bother with optimization and are backed by a
> sophisticated implementation, compared with 100 artifacts twice or ten
> times as complex in their quest for speed.
>
> Did I really understand you correctly - in spite of its amazing
> generality, the RML approach is not promising because it is not
> concerned with optimization?

I have no use for RML, because applying data model-specific
transformations (instead of one general language) has not been a problem
in my experience. SPARQL and XSLT implementations will be more explicit,
very likely more performant if done right, and, as I wrote before, far
better standardized and more widely supported.
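For the tabular case, a TARQL-style query (plain SPARQL CONSTRUCT applied
to CSV rows as variable bindings) is one sketch of what this can look
like; the column names (id, name) and the ex: vocabulary are invented for
illustration:

```sparql
# Hypothetical TARQL mapping, run e.g. as: tarql people.rq people.csv
# TARQL binds each CSV row's columns to variables named after the
# headers, here ?id and ?name.
PREFIX ex: <http://example.org/vocab#>

CONSTRUCT {
  ?person a ex:Person ;
          ex:name ?name .
}
WHERE {
  # Mint an IRI per row from the id column.
  BIND (IRI(CONCAT("http://example.org/person/", ?id)) AS ?person)
}
```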
> With kind regards - Hans-Jürgen
>
> On Fri, 25 Feb 2022 at 21:58, Martynas Jusevičius
> <martynas@atomgraph.com> wrote:
>
>> On Fri, 25 Feb 2022 at 21:49, Hans-Jürgen Rennau <hjrennau@gmail.com>
>> wrote:
>>
>>> Thank you, Christian, that's helpful!
>>>
>>> We must take care not to stumble over differences of terminology.
>>> When I say "formats", I mean syntax types: XML, JSON, HTML, CSV, TSV,
>>> ... So there are not two XML formats; XML is a format, JSON is a
>>> format, etc. It is not important here whether this is a fortunate
>>> choice, as long as we avoid a misunderstanding.
>>>
>>> What you call "formats" is something different - something I usually
>>> call "document types": a vocabulary or, in a narrower sense, an
>>> explicit or implicit model of structure, names and meanings. For
>>> example, two different web service messages, say FooRequest and
>>> BarResponse, are two document types. They certainly require two
>>> different mappings. We need not waste time discussing the fact that
>>> every document type requires its own custom mapping to RDF. It is
>>> obvious, and therefore I would not speak of the absence of a one-stop
>>> solution. We have to map non-RDF documents to RDF again and again,
>>> again and again having to deal with different document types. The
>>> permanent need to speak (to map) is the very reason one may ask for a
>>> language (a dedicated mapping language).
>>>
>>> To summarize my position: the necessity to perform custom mapping has
>>> nothing to do with the state of technology, but with the nature of
>>> things. It is from this perspective that the goal of a uniform
>>> mapping language, applicable to all or many formats (syntax types),
>>> becomes interesting. As we appreciate the benefits of a uniform
>>> styling language (CSS), a uniform document transformation language
>>> (XSLT and XQuery), a uniform modeling language (UML), a uniform
>>> locator model (URI), etc., we might also appreciate a uniform to-RDF
>>> mapping language. (Mind you - uniform does not mean that it is always
>>> the best choice, but often.)
>>>
>>> Concerning the raw RDF: perhaps an appropriate approach in some
>>> scenarios, but of little interest from the point of view of a unified
>>> mapping language, as one is thrown back on a generic transformation
>>> task - which is exactly the baseline from which to depart.
>>
>> I think that is the idea behind https://rml.io/.
>>
>>> Another misunderstanding concerns the term "declarative"; I'll return
>>> to that later.
>>>
>>> Kind regards, Hans-Jürgen
>>>
>>> PS: I wonder which crosses you would make:
>>> A uniform to-RDF mapping language?
>>> o Too unclear what it means
>>> o Not feasible
>>> o Not useful
>>> o Pointless because: ______
>>
>> I would cross all of the above. You can't have a mapping language that
>> is equally optimized for both tabular and tree data. For example,
>> streaming transformation of CSV is trivial, but streaming
>> transformation of XML is complex. You might succeed in creating a
>> general mapping language, but it would be very shallow and hard to
>> optimize.
>>
>>> On Fri, 25 Feb 2022 at 17:24, Christian Chiarcos
>>> <christian.chiarcos@web.de> wrote:
>>>
>>>> On Fri, 25 Feb 2022 at 15:44, Hans-Jürgen Rennau
>>>> <hjrennau@gmail.com> wrote:
>>>>
>>>>> Thank you, Christian. Before responding, I have a couple of
>>>>> questions. You write:
>>>>>
>>>>> "(a) source to RDF (specific to the source format[s], there
>>>>> *cannot* be a one-stop solution because the sources are
>>>>> heterogeneous, we can only try to aggregate -- GRDDL+XSL, R2RML,
>>>>> RML, JSON-LD contexts, YARRRML are of that kind, and except for
>>>>> being physically integrated with the source data, RDFa is, too)"
>>>>>
>>>>> I do not understand - YARRRML *is* a one-stop solution for all
>>>>> source formats included in an extensible list of formats, already
>>>>> now including: RDB, CSV, JSON, XML. So in principle it is a
>>>>> comprehensive solution for a given heterogeneous set of data
>>>>> sources. Could you explain what you mean, saying "there cannot be
>>>>> a one-stop solution"?
>>>>
>>>> It is, in fact, an example of aggregation. So, even if it provides
>>>> a common level of abstraction for different formats, the underlying
>>>> machinery has to be specific to these source formats. Coverage for
>>>> XML is great, of course, but "support for XML" doesn't necessarily
>>>> mean that all XML formats are covered. A generic XML converter is
>>>> not very helpful if your data requires keeping track of
>>>> dependencies between multiple XML files, for example. You can
>>>> convert/extract from DOCX documents with generic XML technology
>>>> (+ZIP), but it's a nightmare that -- if done right -- requires you
>>>> to understand hundreds of pages of documentation (including, but
>>>> not limited to,
>>>> https://interoperability.blob.core.windows.net/files/MS-DOCX/%5bMS-DOCX%5d.pdf).
>>>> As long as people keep on inventing formats, and as long as these
>>>> formats keep on evolving, any aggregation-based solution will be
>>>> incomplete.
>>>> Hence not a one-stop solution -- unless *all* format developers
>>>> *everywhere* decide to work on the same aggregator platform and
>>>> stop developing ad-hoc converters for ad-hoc formats (spoiler:
>>>> despite serious efforts and *some progress*, this is not what
>>>> happened in the past 50 years: in academia and among developers, we
>>>> see fragmentation and conventions widely used within their own
>>>> communities but not so much beyond them -- as in the SW; and in
>>>> industry, limited interoperability is actively used as a tool
>>>> against competitors).
>>>>
>>>>> And a second question: what does "raw RDF representation" mean? I
>>>>> suppose the result of a purely mechanical translation, derived
>>>>> from item names, but I am not sure.
>>>>
>>>> A raw RDF representation of, say, XML in RDF can just encode the
>>>> XML data structure in RDF, i.e., create a my:Element for every
>>>> element, a my:name for its name, a my:Attribute for every
>>>> attribute, a my:child for every child, a my:next for every sibling,
>>>> etc. (my: is an invented namespace; replace it with whatever prefix
>>>> you want for XML data structures.) That's trivial, can be done with
>>>> a few dozen lines of XSLT (e.g.,
>>>> https://github.com/acoli-repo/LLODifier/blob/master/tei/tei2llod.xsl),
>>>> and it will convert any XML document into RDF. But this is not a
>>>> meaningful representation, and it's too verbose to be practical,
>>>> because you easily create thousands of triples for pieces of
>>>> information that could be expressed in just a few, as you encode
>>>> the complete structure of the XML file. In order to get the
>>>> semantics out of the jungle of XML data you need to filter and
>>>> aggregate, and that can be (more or less) effectively done with
>>>> SPARQL (for example).
>>>>
>>>> Best,
>>>> Christian
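A minimal sketch of such a lifting query, assuming the invented my:
namespace above for the raw encoding and an invented ex: target
vocabulary (the raw input triples shown in the comment are likewise
hypothetical):

```sparql
# Raw input (one possible encoding of <person><name>Ada</name></person>):
#   _:e1 a my:Element ; my:name "person" ; my:child _:e2 .
#   _:e2 a my:Element ; my:name "name" ; my:text "Ada" .
PREFIX my: <http://example.org/xml#>
PREFIX ex: <http://example.org/vocab#>

CONSTRUCT {
  ?p a ex:Person ;
     ex:name ?name .
}
WHERE {
  # Filter the structural jungle down to the few triples that matter.
  ?p a my:Element ; my:name "person" ; my:child ?n .
  ?n a my:Element ; my:name "name" ; my:text ?name .
}
```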
>>>>> Thank you in advance, kind regards - Hans-Jürgen
>>>>>
>>>>> On Fri, 25 Feb 2022 at 14:05, Christian Chiarcos
>>>>> <christian.chiarcos@web.de> wrote:
>>>>>
>>>>>>> Speaking of ways of thinking, integration means among other
>>>>>>> things that a graceful transition between tree and graph
>>>>>>> representation is a natural thing for us, almost as natural as
>>>>>>> an arithmetic operation, or the validation of a document against
>>>>>>> a schema, or conducting a query. If there is an abundance of
>>>>>>> tools, this is alarming enough; even more alarming is the common
>>>>>>> viewpoint that people may have to write custom code. For tasks
>>>>>>> of a fundamental nature we should have fundamental answers,
>>>>>>> which means declarative tools - *if* this is possible. By
>>>>>>> declarative I mean: allowing the user to describe the desired
>>>>>>> result and not care about how to achieve it.
>>>>>>
>>>>>> Well, that's not quite true: you need to at least partially
>>>>>> describe three parts - the desired result, the expected input,
>>>>>> and the relation between them. It seems you want something more
>>>>>> than declarative in the traditional sense: a model that is not
>>>>>> procedural, i.e., that doesn't require the handling of any
>>>>>> internal state, but is a plain mapping.
>>>>>>
>>>>>> XSLT < 3.0 falls under this definition as long as you don't do
>>>>>> recursion over named templates (because it didn't let you update
>>>>>> the values of variables) -- and for your use case you wouldn't
>>>>>> need that. Likewise, SPARQL CONSTRUCT does (but not SPARQL
>>>>>> Update, because then you could iterate), and I guess the same
>>>>>> holds for most query languages. However, it is generally
>>>>>> considered a strength that both languages are capable of doing
>>>>>> iterations, and if these are in fact required by a use case, a
>>>>>> non-procedural formalization of the mapping would simply no
>>>>>> longer be applicable.
>>>>>>
>>>>>>> If well done and not incurring a significant loss of
>>>>>>> flexibility, the gain in efficiency and the increase in
>>>>>>> reliability are obvious. (Imagine people driving self-made
>>>>>>> cars.)
>>>>>>
>>>>>> There are query languages that claim to be Turing-complete (e.g.,
>>>>>> https://info.tigergraph.com/gsql), and in the general sense of
>>>>>> separating computation from control flow (which is the entire
>>>>>> point of a query language) they are declarative, but as they
>>>>>> provide unlimited capabilities for iteration or recursion, they
>>>>>> would not be under your definition. The lesson here is that there
>>>>>> is a level of irreducible complexity as soon as you need to
>>>>>> iterate and/or update internal states. If you feel that these are
>>>>>> not necessary, you can *make* any query-based mapping language
>>>>>> compliant with your criteria by simply eliminating the parts of
>>>>>> the language that deal with iteration and the update (not the
>>>>>> binding) of variables. That basically requires you to write your
>>>>>> own validator to filter out functionality that you don't want to
>>>>>> support, nothing more, as the rest is handled by off-the-shelf
>>>>>> technology. And for some languages (say, SPARQL), the choice of
>>>>>> query operator (CONSTRUCT / SELECT) already does that for you.
>>>>>>
>>>>>> So, a partial answer to your question seems to be: *any query
>>>>>> language* (minus a few of its functionalities) would do. Beyond
>>>>>> that, selection criteria are no longer a matter of functionality
>>>>>> but of verbosity and barriers to entry, and the choice depends on
>>>>>> the kind of source data you have. Overall, there seem to be
>>>>>> basically three types of transformations:
>>>>>>
>>>>>> (a) source to RDF (specific to the source format[s]; there
>>>>>> *cannot* be a one-stop solution because the sources are
>>>>>> heterogeneous, we can only try to aggregate -- GRDDL+XSL, R2RML,
>>>>>> RML, JSON-LD contexts and YARRRML are of that kind, and, except
>>>>>> for being physically integrated with the source data, RDFa is,
>>>>>> too)
>>>>>> (b) source to SPARQL variable bindings + SPARQL CONSTRUCT (as in
>>>>>> TARQL; this is a one-stop solution in the sense that you can
>>>>>> apply one language to *all* input formats, however, only those
>>>>>> formats supported by your tool; the difference to the first group
>>>>>> is that the mapping language itself is SPARQL, so it is probably
>>>>>> more easily applicable for an occasional user of SW/LD technology
>>>>>> than any special-purpose or source-specific formalism)
>>>>>> (c) source to a raw RDF representation + SPARQL CONSTRUCT (this
>>>>>> is an extension of (a) and the idea of some of the *software*
>>>>>> solutions suggested, but parts of the mapping effort are shifted
>>>>>> from the source-specific converters into the query, as in (b);
>>>>>> this could also be more portable than (a), as the
>>>>>> format-/domain-specific parts can be covered by generic
>>>>>> converters/mappings)
>>>>>>
>>>>>> A fourth type, (d) source to raw RDF + SPARQL Update, would fall
>>>>>> outside your classification -- but all of them would normally be
>>>>>> considered declarative.
>>>>>>
>>>>>> Best,
>>>>>> Christian
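A minimal sketch of why (d) is different, with an invented ex:
vocabulary: a CONSTRUCT reads one graph and emits another, so running it
twice over the same input yields the same output, whereas an Update like
the one below rewrites the store in place, so each application advances
its state:

```sparql
# Hypothetical example: each execution increments ex:step by one, i.e.
# the store carries state between runs. No single CONSTRUCT over the
# original input can express this, which is what moves (d) outside a
# "plain mapping" in the sense discussed above.
PREFIX ex: <http://example.org/vocab#>

DELETE { ex:job ex:step ?s }
INSERT { ex:job ex:step ?s2 }
WHERE {
  ex:job ex:step ?s .
  BIND (?s + 1 AS ?s2)
}
```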
Received on Saturday, 26 February 2022 06:40:07 UTC