Re: semsheets from Hans-Jürgen Rennau on 2022-02-25 (semantic-web@w3.org from February 2022)

From: Hans-Jürgen Rennau <hjrennau@gmail.com>
Date: Fri, 25 Feb 2022 08:41:06 +0100
To: Paul Tyson <phtyson@sbcglobal.net>
Cc: semantic-web@w3.org
Message-ID: <CA+H2zTCaSKyoh-pbsgRR8_nndqBoBubn8+HSchUYuCjqr1tSDg@mail.gmail.com>
Paul,

you wonder whether my interest is academic or practical, and you suppose it
must be academic, since there is no shortage of tools. To answer correctly
is easy; to answer meaningfully is more difficult. My daily work is
development and I am not interested in academic points of view. At the same
time, I feel a passionate urge to think and to understand. You point to the
abundance of tools and wonder what there is left to ask for, unless one is
an academic. From my point of view, the task is of key importance and of a
very simple nature. To put it in an image: I conclude that we should build
a bridge, in spite of an abundance of ferries.

Why key importance? I believe that the reconciliation of tree thinking and
graph thinking is the grail of data modeling. Both perspectives have unique
strengths and are profoundly complementary. To my disappointment, I have not
yet met a person who has a desire to understand both perspectives deeply.
(I am an expert on trees, and I long for cooperation with graph experts as
keen on learning about trees as I am keen on learning about graphs.) But be
that as it may, the important question here is whether such an integration
is really possible and what it would mean.

Speaking of ways of thinking, integration means among other things that a
graceful transition between tree and graph representation is a natural
thing for us, almost as natural as an arithmetic operation, or the
validation of a document against a schema, or running a query. If there
is an abundance of tools, this is alarming enough; even more alarming is
the common view that people may have to write custom code. For tasks
of a fundamental nature we should have fundamental answers, which means
declarative tools - *if* this is possible. By declarative I mean:
allowing the user to describe the desired result without caring about how
to achieve it. If this is done well and does not incur a significant loss
of flexibility, the gain in efficiency and the increase in reliability are
obvious. (Imagine people driving self-made cars.)

Let me give a practical example. The approach of R2RML / RML / YARRRML is
obviously declarative. The way you have to say what you want to achieve
presupposes a certain structuring of your intention which appears to me
natural, intuitive and close to the utmost simplicity possible, given the
irreducible complexity of the information required just to define the
intention. (In other words: stating in plain words what you want would be
hardly less verbose, and not at all more readable, than the mapping you
write in this declarative language in order to get it.) I am convinced that
a good path has been taken here, although perhaps we may go even further
and more fully unfold the potential already tapped.
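
To make the notion of "declarative" concrete, here is a minimal sketch in
Python. It is not RML or YARRRML syntax and not how any existing tool is
implemented; the mapping layout, the function name apply_mapping and the
example.org URIs are all hypothetical, chosen only for illustration. The
point is merely that the mapping is pure data describing the desired
triples, while one generic engine decides how to produce them.

```python
# Hypothetical mini-mapping: NOT RML/YARRRML syntax, only the same idea.
# The mapping says WHAT triples are wanted; it contains no control flow.
MAPPING = {
    "subject": "http://example.org/person/{id}",   # URI template (assumed)
    "type": "http://xmlns.com/foaf/0.1/Person",
    "properties": {                                # predicate -> source field
        "http://xmlns.com/foaf/0.1/name": "name",
        "http://xmlns.com/foaf/0.1/mbox": "email",
    },
}

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def apply_mapping(mapping, records):
    """Generic engine: turn each flat record into (s, p, o) triples."""
    triples = []
    for record in records:
        subject = mapping["subject"].format(**record)
        triples.append((subject, RDF_TYPE, mapping["type"]))
        for predicate, field in mapping["properties"].items():
            if field in record:  # skip fields missing from the record
                triples.append((subject, predicate, record[field]))
    return triples


# Tree-shaped input, e.g. parsed from JSON (sample data, made up).
RECORDS = [
    {"id": 1, "name": "Alice", "email": "alice@example.org"},
    {"id": 2, "name": "Bob"},
]

TRIPLES = apply_mapping(MAPPING, RECORDS)
for triple in TRIPLES:
    print(triple)
```

Changing the desired output means editing MAPPING only; the engine stays
untouched, which is the property I value in the declarative languages
mentioned above.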

If you adopt this perspective of mine for a moment, you may find it
natural that I want to learn more and to understand the pros and cons of
other approaches *not* based on the principles of RML etc. Let us put it
all on the table and struggle together for a clear picture which lets the
basic principles emerge as such - recognize the contours. Let us cooperate
towards a goal - to create the means that make the transition between tree
and graph data as natural and simple as we can imagine, thus making the
transition between tree and graph thinking frequent and natural.

Kind regards,
Hans-Jürgen


On Thu, 24 Feb 2022 at 05:37, Paul Tyson <phtyson@sbcglobal.net> wrote:

> Hans-Jürgen,
>
> You don't say whether your interests are academic or practical. I would
> guess academic, because there is no shortage of tools and standards for
> converting non-RDF data to RDF, as indicated by the responses to your
> original post. I don't know what your criteria for "declarative" mapping
> are, but for practical purposes I would settle for functional (or at least
> non-procedural) mappings, such as XSLT or SPARQL. If you want to go more
> declarative (and I have), you can resort to a generic rule language such as
> RIF[1]. But, to be honest, translating one surface syntax to another is not
> the really interesting problem--at least not to me after doing it for 3
> decades.
>
> But...you are interested in translating the "semantics" hidden beneath the
> surface syntax. In reality, this is asking to spin straw into gold, and
> yes, if anyone could do that it would be groundbreaking. But the best way
> toward this was pointed 25 years ago by the DSSSL ISO spec[2], "Document
> Style Semantics and Specification Language". Few on this list will remember
> the *annus mirabilis* 1997, when HyTime 2nd edition (ISO/IEC
> 10744:1997)[3] was published, which was the capstone of a family of text
> processing specifications that included DSSSL (ISO/IEC 10179:1996), and
> SGML (ISO 8879:1986). HyTime and DSSSL died aborning; SGML was quickly
> pushed aside by XML. Simplify, simplify, simplify! was the battle cry. RDF
> in 1997 was just being formalized as a simple metadata format for HTML
> pages. Much of what was good in those ISO specs was reincarnated in the XML
> family of W3C recommendations. Much of what was better was left behind, or
> imperfectly reinvented in fragments.
>
> One of the better features was the "grove" concept and formalization, used
> by both DSSSL and HyTime. DSSSL recognized that what should be styled (or
> transformed) of a document is not the surface syntax, but a collection of
> properties extracted from the surface syntax. These collections of
> properties were called "groves" (anecdotally, Graph Representation Of
> Values). The specification of groves turned out to be terribly complex,
> couched in ISO spec-ese, and burdened with antique SGML formalisms, which
> no doubt contributed to its nearly universal neglect. But perhaps it was
> just ahead of its time and constrained by the need to conform to SGML. I
> believe any "breakthrough" in unlocking intellectual assets from arbitrary
> bytestreams for transformation or styling will use something that looks a
> lot like groves--maybe even groves expressed in RDF.
>
> Regards,
>
> --Paul
>
> [1] http://www.w3.org/TR/rif-primer/
>
> [2] http://www.jclark.com/dsssl/
>
> [3] https://hytime.org
>
> On 2/23/22 18:00, Hans-Jürgen Rennau wrote:
>
> My cordial thanks for this wealth of responses which I had not dared to
> hope for! It will take me time to look at all these projects and products,
> which ideally would find their places in a single and coherent picture.
> Sometimes I shall ask a question concerning a particular approach or
> statement.
>
> A focus of mine will be on the question which approaches may be classified
> as "declarative mapping languages", borrowing the term from [1] (slide 33).
> I am sure that tools *not* qualifying as such may in particular scenarios
> be a superior choice, but on the large scale I think it is declarative
> mapping languages where the highest potential, perhaps even groundbreaking
> success, is to be expected.
>
> For example, "resource specific XSLT scripts" (mentioned by Christian) may
> be very efficient (as pointed out by Martynas), but they are not
> declarative. And I suppose, the same applies to TARQL, but I may be
> mistaken and will try to check. Beneath the variability of external
> characteristics there may also be basic differences of perspective, as
> hinted at by a quote from Enrico [2]: "Proposals focus on either
> engineering content transformations or accessing non-RDF resources with
> SPARQL. ... we explore an alternative solution and contribute a
> general-purpose meta-model for converting non-RDF resources into RDF:
> Facade-X."
>
> With kind regards,
> Hans-Jürgen
>
>
> [1] Maria-Esther Vidal, Tutorial on "Challenges for Efficiently Creating
> and Maintaining Knowledge Graphs".
> https://service.tib.eu/ldmservice/dataset/sdmkgc
> [2] https://doi.org/10.3233/ssw210035
>
> On Wed, 23 Feb 2022 at 08:10, Hans-Jürgen Rennau <hjrennau@gmail.com> wrote:
>
>> Hello,
>>
>> I am interested in the transformation of non-RDF data into RDF data and I
>> am puzzled, nay, haunted by a simple analogy. We have stylesheets for
>> defining visual representation of data in a convenient, standardized way.
>> Could we not have "semsheets" for defining semantic representation of data
>> in a convenient, standardized way?
>>
>> I admit the oversimplification: CSS stylesheets are designed to work with
>> HTML, a scope sufficient for practical purposes. Whereas "non-RDF data" is
>> by definition a broad spectrum of media types, so the uniformity of a
>> single "semsheet language" may not be attainable. But how about approaching
>> the goal, based on an appropriate partitioning of data sources? For example:
>>
>> (1) Relational data
>> (2) Tree-structured data
>> (3) Other
>>
>> Tree-structured data comprises most structured data except for graph data
>> - JSON, XML, HTML, CSV, .... And concerning "other", what comes to my mind
>> is (i) unstructured text and (ii) non-RDF graph data.
>>
>> So keeping this partitioning in mind, how about standards, frameworks,
>> tools enabling customized mapping of data to RDF?
>>
>> What I am aware of is very little:
>>
>> (1) relational data: R2RML [1], ?
>> (2) tree-structured data: RML [2], ?
>> (3) other: ?
>>
>> Note that I did not mention RDFa, as it is about embedding, rather than
>> writing mapping documents, nor GRDDL, as it is about finding a mapping
>> document, not its content.
>>
>> I am convinced that there are quite a few other standards, frameworks and
>> tools which should be listed above, replacing the "?".
>>
>> Can you help me to find them? Any links, thoughts, comments highly
>> appreciated. (And should you think the partitioning is faulty, please share
>> your criticism. The same applies to the very quest for common, standardized
>> mapping languages.)
>>
>> Thank you! With kind regards,
>> Hans-Jürgen Rennau
>>
>> [1] https://www.w3.org/TR/r2rml/
>> [2] https://rml.io/specs/rml/
>>
>
Received on Friday, 25 February 2022 07:42:31 UTC