Re: semsheets from Christian Chiarcos on 2022-02-23 (semantic-web@w3.org from February 2022)

From: Christian Chiarcos <christian.chiarcos@web.de>
Date: Wed, 23 Feb 2022 12:17:33 +0100
To: Hans-Jürgen Rennau <hjrennau@gmail.com>
Cc: semantic-web@w3.org
Message-ID: <CAC1YGdgf+8xapAg9-G5-HGrTF4KRoJpwqqK_z4yQfuNs4QMdng@mail.gmail.com>
Hi,

as far as CSV data is concerned, TARQL (https://tarql.github.io/) is a
great tool, as it allows you to do transformations with SPARQL, and whatever
relational data you have can be trivially exported to CSV. In that
case, no specific standard (other than SPARQL) is needed.
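To make the idea concrete, here is a minimal sketch of a TARQL-style mapping in plain Python. It is not TARQL's implementation or API. TARQL itself expresses the template as a SPARQL CONSTRUCT query whose variables are named after the CSV header columns. The sketch assumes a hypothetical CSV with `id`, `name` and `email` columns and hypothetical `http://example.org/` IRIs, and it emits N-Triples strings.

```python
import csv
import io

# Hypothetical IRIs chosen for this illustration only (FOAF is a real vocabulary).
EX = "http://example.org/"
FOAF = "http://xmlns.com/foaf/0.1/"

def iri(value):
    return f"<{value}>"

def literal(value):
    # Escape backslashes, quotes and newlines for an N-Triples string literal.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

def construct(row):
    """CONSTRUCT-like template: each CSV column acts as a variable binding."""
    subject = iri(f"{EX}person/{row['id']}")
    return [
        f"{subject} {iri(FOAF + 'name')} {literal(row['name'])} .",
        f"{subject} {iri(FOAF + 'mbox')} {iri('mailto:' + row['email'])} .",
    ]

def csv_to_ntriples(text):
    # The header row supplies the "variable" names, one binding set per data row.
    reader = csv.DictReader(io.StringIO(text))
    triples = []
    for row in reader:
        triples.extend(construct(row))
    return triples

data = "id,name,email\n1,Alice,alice@example.org\n2,Bob,bob@example.org\n"
for triple in csv_to_ntriples(data):
    print(triple)
```

The mapping lives entirely in the template, so the same loop works for any CSV export of a relational table; only `construct` changes.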

For tree/XML, I guess most people just resort to XSL. It is possible, of
course, to use a generic XSL template that simply encodes the XML data model
in RDF and then run SPARQL updates over that. But this isn't ideal, because
such a generic RDF dump is too raw, so I guess we won't have a fully generic
alternative to resource-specific XSL scripts any time soon.
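Why a generic dump is "too raw" can be seen in a minimal sketch of such an encoding, written here in Python rather than XSL. The `http://example.org/xml#` vocabulary is hypothetical, each element becomes a blank node, and tail text and namespaces are ignored for brevity. The output records tags, attributes, text and nesting, but says nothing about what a `book` or `title` means; that mapping would still have to be written afterwards, e.g. as SPARQL updates.

```python
import itertools
import xml.etree.ElementTree as ET

# Hypothetical vocabulary for a generic, schema-agnostic encoding of the XML tree.
X = "http://example.org/xml#"

def xml_to_triples(xml_text):
    root = ET.fromstring(xml_text)
    counter = itertools.count()
    triples = []

    def visit(elem):
        node = f"_:n{next(counter)}"  # one blank node per element
        triples.append((node, X + "name", elem.tag))
        for key, value in elem.attrib.items():
            triples.append((node, X + "attr/" + key, value))
        text = (elem.text or "").strip()  # leading text only; tail text is ignored
        if text:
            triples.append((node, X + "text", text))
        for position, child in enumerate(elem):
            child_node = visit(child)
            triples.append((node, X + "child", child_node))
            triples.append((child_node, X + "position", str(position)))
        return node

    visit(root)
    return triples

xml = '<book id="b1"><title>Semsheets</title><author>A. Author</author></book>'
for triple in xml_to_triples(xml):
    print(triple)
```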

For tree/JSON, JSON-LD contexts are more or less what you're asking for.
As for XML, you can also convert it to JSON and then provide the
contexts.
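As a toy illustration of why a context acts like a "semsheet": the JSON below stays plain JSON, and only the `@context` says which IRI each key stands for. The expansion function is a hypothetical sketch for a single flat object with string values; it is not a conforming JSON-LD processor, which also handles nesting, `@vocab`, lists, language tags and much more. The example IRIs under `example.org` are assumptions.

```python
import json

# A JSON-LD-style document: the @context maps plain JSON keys to IRIs.
doc = json.loads("""
{
  "@context": {
    "name": "http://schema.org/name",
    "homepage": {"@id": "http://schema.org/url", "@type": "@id"}
  },
  "@id": "http://example.org/people/alice",
  "name": "Alice",
  "homepage": "http://example.org/~alice"
}
""")

def flat_expand(document):
    """Toy expansion for one flat object with string values; NOT a JSON-LD processor."""
    context = document.get("@context", {})
    subject = document["@id"]
    triples = []
    for key, value in document.items():
        if key.startswith("@") or key not in context:
            continue  # keywords are skipped; keys without a mapping are dropped
        term = context[key]
        if isinstance(term, str):
            predicate, is_iri = term, False
        else:
            predicate, is_iri = term["@id"], term.get("@type") == "@id"
        # "@type": "@id" means the value is an IRI rather than a string literal.
        obj = f"<{value}>" if is_iri else json.dumps(value)
        triples.append(f"<{subject}> <{predicate}> {obj} .")
    return triples

for triple in flat_expand(doc):
    print(triple)
```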

For plain text, there are some extractor frameworks, but a simple stylesheet
isn't feasible, as you need to configure language-specific processing
modules.

NB: We are currently bundling a number of converter frameworks and
subsequent SPARQL transformations into compact workflows,
see https://github.com/Pret-a-LLOD/Fintan (still in progress, final release
by the end of June this year). Our specific goal is to apply this to data in
NLP, but general-purpose converters are included as well, so you can at
least run the XSL+SPARQL and TARQL/CSV+SPARQL transformations.

Best,
Christian

On Wed, 23 Feb 2022 at 08:10, Hans-Jürgen Rennau <hjrennau@gmail.com> wrote:

> Hello,
>
> I am interested in the transformation of non-RDF data into RDF data and I
> am puzzled, nay, haunted by a simple analogy. We have stylesheets for
> defining visual representation of data in a convenient, standardized way.
> Could we not have "semsheets" for defining semantic representation of data
> in a convenient, standardized way?
>
> I admit the oversimplification: CSS stylesheets are designed to work with
> HTML, a scope sufficient for practical purposes, whereas "non-RDF data" is
> by definition a broad spectrum of media types, so the uniformity of a
> single "semsheet language" may not be attainable. But how about approaching
> the goal based on an appropriate partitioning of data sources? For example:
>
> (1) Relational data
> (2) Tree-structured data
> (3) Other
>
> Tree-structured data comprises most structured data except for graph data:
> JSON, XML, HTML, CSV, etc. As for "other", what comes to my mind
> is (i) unstructured text and (ii) non-RDF graph data.
>
> With this partitioning in mind, which standards, frameworks and
> tools enable customized mapping of data to RDF?
>
> What I am aware of is very little:
>
> (1) relational data: R2RML [1], ?
> (2) tree-structured data: RML [2], ?
> (3) other: ?
>
> Note that I did not mention RDFa, as it is about embedding, rather than
> writing mapping documents, nor GRDDL, as it is about finding a mapping
> document, not its content.
>
> I am convinced that there are quite a few other standards, frameworks and
> tools which should be listed above, replacing the "?".
>
> Can you help me to find them? Any links, thoughts, comments highly
> appreciated. (And should you think the partitioning is faulty, please share
> your criticism. The same applies to the very quest for common, standardized
> mapping languages.)
>
> Thank you! With kind regards,
> Hans-Jürgen Rennau
>
> [1] https://www.w3.org/TR/r2rml/
> [2] https://rml.io/specs/rml/
>
Received on Wednesday, 23 February 2022 11:17:58 UTC