Re: semsheets from Christian Chiarcos on 2022-02-23 (semantic-web@w3.org from February 2022)

From: Christian Chiarcos <christian.chiarcos@web.de>
Date: Wed, 23 Feb 2022 12:18:29 +0100
To: Hans-Jürgen Rennau <hjrennau@gmail.com>
Cc: semantic-web@w3.org
Message-ID: <CAC1YGdjjFh-k-hShW89mYSoUR-CWTBFh5DcLooVC3DPf58D3VQ@mail.gmail.com>
On Wed, 23 Feb 2022 at 12:17, Christian Chiarcos <
christian.chiarcos@web.de> wrote:

> Hi,
>
> as far as CSV data is concerned, TARQL (https://tarql.github.io/) is a
> great tool as it allows you to do transformations with SPARQL, and whatever
> relational data you have, it can be trivially exported to CSV. In that
> case, no specific standard (other than SPARQL) is needed.
>
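
To make the row-wise idea concrete, here is a minimal sketch in plain Python. It is not TARQL and does not use SPARQL; it only imitates the general pattern of treating each CSV row as a set of column bindings and constructing triples from them. The namespace, the column names and the sample data are assumptions made up for illustration, and column names are assumed to be safe to use inside an IRI.

```python
import csv
import io

# Hypothetical namespace, chosen only for illustration.
BASE = "http://example.org/"


def csv_to_ntriples(text):
    """Return one N-Triples line per non-empty cell, one subject per row."""
    lines = []
    reader = csv.DictReader(io.StringIO(text))
    for row_number, row in enumerate(reader, start=1):
        subject = f"<{BASE}row/{row_number}>"
        for column, value in row.items():
            # Skip surplus cells (key None) and empty or missing cells.
            if column is None or not value:
                continue
            predicate = f"<{BASE}{column}>"
            # Escape backslashes and quotes for the N-Triples literal.
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} {predicate} "{literal}" .')
    return lines


# Made-up sample data: the second row has an empty "city" cell.
sample = "name,city\nAda,London\nGrace,\n"
triples = csv_to_ntriples(sample)
for line in triples:
    print(line)
```

A real mapping would usually mint subjects from a key column and pick predicates from an existing vocabulary, which is the part a SPARQL CONSTRUCT template expresses.
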
> For tree/XML, I guess most people just resort to XSL. It is possible, of
> course, to use a generic XSL template to just encode the XML data model in
> RDF and then run SPARQL updates over that. But this isn't ideal because the
> raw RDF dump is too raw,
>

too verbose, I mean ;)


> so I guess we won't have a fully generic alternative to resource-specific
> XSL scripts any time soon.
>
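
As an illustration of why a generic encoding gets verbose, here is a small Python sketch that walks an XML tree and emits one triple per structural fact (element name, parent/child link, attribute, text). It stands in for the generic XSL template mentioned above and is not that template; the vocabulary IRI and the sample document are assumptions invented for the example, and sibling order and mixed content are ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary for describing XML structure.
X = "http://example.org/xml#"


def xml_to_triples(xml_text):
    """Encode the XML tree structure as (subject, predicate, object) tuples."""
    root = ET.fromstring(xml_text)
    triples = []
    counter = 0

    def visit(element, parent_id):
        nonlocal counter
        counter += 1
        node_id = f"_:n{counter}"  # one blank node per element
        triples.append((node_id, X + "name", element.tag))
        if parent_id is not None:
            triples.append((parent_id, X + "child", node_id))
        for attr, value in element.attrib.items():
            triples.append((node_id, X + "attr/" + attr, value))
        text = (element.text or "").strip()
        if text:
            triples.append((node_id, X + "text", text))
        for child in element:
            visit(child, node_id)

    visit(root, None)
    return triples


# Made-up sample: a domain-specific mapping would likely need a single
# triple here (the person's name), the generic encoding needs several.
sample = '<person id="p1"><name>Ada</name></person>'
triples = xml_to_triples(sample)
print(len(triples))
```

The SPARQL updates would then have to pattern-match over these structural triples to recover the domain-level statements, which is where the overhead comes from.
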
> For tree/JSON, JSON-LD contexts are more or less what you're asking for.
> Wrt. XML conversion, you can also convert XML to JSON and then provide the
> contexts.
>
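
To show what a context contributes, here is a deliberately simplified Python sketch: it reads a flat JSON document, looks each key up in its "@context", and produces subject/predicate/object tuples. This is not a JSON-LD processor; it assumes a flat document, plain string term definitions and a document-level "@id", and it treats every value as a plain string. The sample document and IRIs are made up for the example.

```python
import json

# Made-up sample document with an inline context.
doc = json.loads("""
{
  "@context": {
    "name": "http://xmlns.com/foaf/0.1/name",
    "homepage": "http://xmlns.com/foaf/0.1/homepage"
  },
  "@id": "http://example.org/people/ada",
  "name": "Ada",
  "homepage": "http://example.org/ada",
  "internalNote": "not mapped"
}
""")


def flat_triples(document):
    """Map each key to the IRI given in the context; skip unmapped keys."""
    context = document["@context"]
    subject = document["@id"]
    out = []
    for key, value in document.items():
        if key.startswith("@"):
            continue  # keywords such as @context and @id are not properties
        if key not in context:
            continue  # keys without a term definition are dropped
        out.append((subject, context[key], value))
    return out


for triple in flat_triples(doc):
    print(triple)
```

The point of the analogy is that the data itself stays untouched and the context alone carries the semantic mapping, much like a stylesheet kept apart from the content.
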
> For plain text, there are some extractor frameworks, but an easy
> stylesheet isn't feasible, as you need to configure language-specific
> processing modules.
>
> NB: We are currently in the process of bundling a number of converter
> frameworks and subsequent SPARQL transformations into compact workflows,
> see https://github.com/Pret-a-LLOD/Fintan (still in progress, final
> release by end of June this year). Our specific goal is to apply this to
> data in NLP, but general-purpose converters are included as well, so you
> can at least run the XSL+SPARQL and TARQL/CSV+SPARQL transformations.
>
> Best,
> Christian
>
> On Wed, 23 Feb 2022 at 08:10, Hans-Jürgen Rennau <
> hjrennau@gmail.com> wrote:
>
>> Hello,
>>
>> I am interested in the transformation of non-RDF data into RDF data and I
>> am puzzled, nay, haunted by a simple analogy. We have stylesheets for
>> defining visual representation of data in a convenient, standardized way.
>> Could we not have "semsheets" for defining semantic representation of data
>> in a convenient, standardized way?
>>
>> I admit the oversimplification: CSS stylesheets are designed to work with
>> HTML, a scope sufficient for practical purposes. Whereas "non-RDF data" is
>> by definition a broad spectrum of media types, so the uniformity of a
>> single "semsheet language" may not be attainable. But how about approaching
>> the goal, based on an appropriate partitioning of data sources? For example:
>>
>> (1) Relational data
>> (2) Tree-structured data
>> (3) Other
>>
>> Tree-structured data (JSON, XML, HTML, CSV, ...) comprises most structured
>> data except for graph data. And concerning "other", what comes to my mind
>> is (i) unstructured text and (ii) non-RDF graph data.
>>
>> So keeping this partitioning in mind, how about standards, frameworks,
>> tools enabling customized mapping of data to RDF?
>>
>> What I am aware of is very little:
>>
>> (1) relational data: R2RML [1], ?
>> (2) tree-structured data: RML [2], ?
>> (3) other: ?
>>
>> Note that I did not mention RDFa, as it is about embedding, rather than
>> writing mapping documents, nor GRDDL, as it is about finding a mapping
>> document, not its content.
>>
>> I am convinced that there are quite a few other standards, frameworks and
>> tools which should be listed above, replacing the "?".
>>
>> Can you help me to find them? Any links, thoughts, comments highly
>> appreciated. (And should you think the partitioning is faulty, please share
>> your criticism. The same applies to the very quest for common, standardized
>> mapping languages.)
>>
>> Thank you! With kind regards,
>> Hans-Jürgen Rennau
>>
>> [1] https://www.w3.org/TR/r2rml/
>> [2] https://rml.io/specs/rml/
>>
>
Received on Wednesday, 23 February 2022 11:18:55 UTC