RE: semsheets from Armando Stellato on 2022-02-23 (semantic-web@w3.org from February 2022)

From: Armando Stellato <stellato@uniroma2.it>
Date: Wed, 23 Feb 2022 15:42:03 +0000
To: David Chaves <dchaves@fi.upm.es>, Martynas Jusevičius <martynas@atomgraph.com>
CC: Christian Chiarcos <christian.chiarcos@web.de>, Hans-Jürgen Rennau <hjrennau@gmail.com>, Semantic Web <semantic-web@w3.org>
Message-ID: <AS8PR09MB4982E82A61C238854B50C9FBC73C9@AS8PR09MB4982.eurprd09.prod.outlook.com>
Dear all,

I would also like to report on another, layered solution.

CODA [1] (Computer-aided Ontology Development Architecture) is an architecture and an associated Java framework for the RDF triplification of UIMA [2] results from the analysis of unstructured content.
The purpose of CODA is to support the entire process, from data extraction and transformation through identity resolution to feeding semantic repositories with knowledge extracted from unstructured content. The motivation behind CODA lies in the large effort and the design issues involved in developing RDF-compliant knowledge acquisition systems on top of well-established content analytics frameworks such as UIMA (http://uima.apache.org/) and GATE (https://gate.ac.uk/). Therefore, CODA extends UIMA with facilities and a powerful language, PEARL [3], for the projection and transformation of UIMA-annotated content into RDF (http://www.w3.org/RDF/).

By relying on UIMA, and thus on the conversion of Feature Structures to RDF, CODA covers any general knowledge acquisition process from any kind of source (structured, semi-structured, or unstructured, such as NLP processors over text) where data can first be extracted and organized into simple extraction patterns and then converted into any target RDF vocabulary, with no restriction on the complexity of the target model.

A byproduct of CODA is Sheet2RDF [4]. In Sheet2RDF, spreadsheets are statically converted into a predefined UIMA feature structure and then, by writing a PEARL transformation, it is possible to transform the content into any RDF graph.
There are, though, shortcuts to writing a PEARL transformation:

  *   A UI Wizard facilitating the creation of the transformation. The step from the UI Wizard to PEARL is not invertible, but the state of the Wizard process can also be persisted, so it is still possible to recover the work done within the Wizard and modify it.
  *   Convention over configuration: it is possible to use conventions in the spreadsheet headers (e.g. trivially, names of predicates, even using qnames, and other conventions for indicating, e.g., the language of the literals to be created) so as to pre-fill the wizard, possibly completely, with no PEARL code to write and no wizard action to take. To enhance tool interoperability at the data level, these conventions have been aligned with those of SKOS Play, the tool mentioned in a previous email by Thomas Francart.
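
To make the header-convention idea concrete, here is a minimal sketch in Python. It is not Sheet2RDF code and does not reproduce its actual conventions: the "prefix:local@lang" header syntax, the prefix table, the rule that the first column holds the subject, and all function names are assumptions made for illustration only.

```python
import csv
import io

# Hypothetical prefix table; a real tool would read this from its configuration.
PREFIXES = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "ex": "http://example.org/",
}

def expand(qname):
    """Expand a qname such as 'skos:prefLabel' into a full IRI."""
    prefix, local = qname.split(":", 1)
    return PREFIXES[prefix] + local

def parse_header(header):
    """Split an assumed 'prefix:local@lang' header into (predicate IRI, language or None)."""
    qname, _, lang = header.partition("@")
    return expand(qname), (lang or None)

def sheet_to_ntriples(text):
    """Turn CSV text into N-Triples lines; the first column is assumed to hold the subject qname."""
    rows = list(csv.reader(io.StringIO(text)))
    headers, body = rows[0], rows[1:]
    columns = [parse_header(h) for h in headers[1:]]
    triples = []
    for row in body:
        subject = expand(row[0])
        for (predicate, lang), cell in zip(columns, row[1:]):
            if not cell:
                continue  # empty cells produce no triple
            if lang:
                escaped = cell.replace("\\", "\\\\").replace('"', '\\"')
                obj = '"%s"@%s' % (escaped, lang)
            else:
                obj = "<%s>" % expand(cell)  # untagged columns are assumed to hold qnames
            triples.append("<%s> <%s> %s ." % (subject, predicate, obj))
    return triples

SHEET = """subject,skos:prefLabel@en,skos:prefLabel@it,skos:broader
ex:cat,cat,gatto,ex:animal
ex:animal,animal,animale,
"""

for line in sheet_to_ntriples(SHEET):
    print(line)
```

With headers carrying both the predicate and the language, the sheet alone determines the output, which is the sense in which no transformation code needs to be written.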

Sheet2RDF is available both as a command-line tool and integrated into the UI of VocBench [5] (the above-mentioned wizard applies to this case), a collaborative editing environment for OWL ontologies, SKOS(/XL) thesauri, Ontolex lexicons and RDF datasets in general.

One more thing: a future version of Sheet2RDF will address multiple sheets, for those formats allowing for them (e.g. Excel and Open Spreadsheet formats) and connection to databases. ETA: May.


About the more general vision of a transformation model for anything to RDF: as I mentioned before, feature structures (FS) can act as a good model-in-the-middle when moving from any source to RDF. It is fairly straightforward to define static transformations from many sources (extraction templates, or even data formats such as XML, JSON etc.) into FS and then perform more complex and dynamic (e.g. depending on the target vocabulary) transformations from FS to RDF, for which a transformation language is needed.
A few years ago I considered the idea of creating a W3C community group about it (name: FS2RDF?), possibly informing the initial work with our experience with PEARL (but not necessarily committing to any of it). Then, due to other projects and research activities, I had no time to take it forward. If anybody thinks it is an interesting direction, maybe it’s time to do it :-)
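
The two-step, model-in-the-middle idea can be sketched roughly as follows. This is a toy illustration only: it uses neither the UIMA API nor PEARL syntax, a feature structure is simply assumed to be a nested Python dict, and the rule format and all names (json_to_fs, project, RULES) are hypothetical.

```python
# Assumption: a feature structure is a plain dict with a "type" and a "features" map.

def json_to_fs(record, type_name):
    """Static step: wrap a source record in a generic, vocabulary-neutral feature structure."""
    return {"type": type_name, "features": dict(record)}

FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Dynamic step: hypothetical projection rules, one per feature structure type.
# Each rule names the target class and lists (predicate IRI, feature name) pairs,
# so switching target vocabulary only means switching this table.
RULES = {
    "Person": {
        "class": FOAF + "Person",
        "properties": [(FOAF + "name", "name"), (FOAF + "age", "age")],
    }
}

def project(fs, subject_iri, rules=RULES):
    """Project one feature structure onto RDF triples using the rule for its type."""
    rule = rules[fs["type"]]
    triples = [(subject_iri, RDF_TYPE, rule["class"])]
    for predicate, feature in rule["properties"]:
        value = fs["features"].get(feature)
        if value is not None:
            # Values are kept as plain Python values; literal typing is out of scope here.
            triples.append((subject_iri, predicate, value))
    return triples

fs = json_to_fs({"name": "Ada", "age": 36, "ignored": True}, "Person")
for triple in project(fs, "http://example.org/ada"):
    print(triple)
```

The first step knows nothing about RDF and the second knows nothing about the source format, which is the separation a dedicated FS-to-RDF language would build on.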

Kind Regards,

Armando

[1] http://art.uniroma2.it/coda/

[2] https://uima.apache.org/

[3] http://art.uniroma2.it/coda/documentation/pearl.jsf (note: it’s not PERL, it’s PEARL, another language for model transformation, in particular, FS->RDF)
[4] http://art.uniroma2.it/sheet2rdf/

[5] http://vocbench.uniroma2.it/


P.S. some papers about the above platforms and tools:


  *   M. Fiorelli, M.T. Pazienza, A. Stellato and A. Turbati, CODA: Computer-aided ontology development architecture, IBM Journal of Research and Development, doi:10.1147/JRD.2014.2307518, 58, 2, 1-12, March, 2014
  *   Ferrucci, D. et al. (2009) Unstructured Information Management Architecture (UIMA) Version 1.0. OASIS Standard, March 2009
  *   Maria Teresa Pazienza, Armando Stellato and Andrea Turbati, PEARL: ProjEction of Annotations Rule Language, a Language for Projecting (UIMA) Annotations over RDF Knowledge Bases, International Conference on Language Resources and Evaluation (LREC 2012)
  *   Manuel Fiorelli, Tiziano Lorenzetti, Maria Teresa Pazienza, Armando Stellato, Andrea Turbati, Sheet2RDF: a Flexible and Dynamic Spreadsheet Import&Lifting Framework for RDF, Current Approaches in Applied Artificial Intelligence, doi:10.1007/978-3-319-19066-2_13, (Ali, Moonis and Kwon, Young Sig and Lee, Chang-Hwan and Kim, Juntae and Kim, Yongdai eds.), Lecture Notes in Computer Science, 9101, 131-140, Springer International Publishing, 2015
  *   Armando Stellato, Manuel Fiorelli, Andrea Turbati, Tiziano Lorenzetti, Willem Gemert, Denis Dechandon, Christine Laaboudi-Spoiden, Anikó Gerencsér, Anne Waniart, Eugeniu Costetchi and Johannes Keizer, VocBench 3: A collaborative Semantic Web editor for ontologies, thesauri and lexicons, Semantic Web, doi:10.3233/SW-200370, 1-27, 05, 2020





From: David Chaves <dchaves@fi.upm.es>
Sent: Wednesday, February 23, 2022 1:07 PM
To: Martynas Jusevičius <martynas@atomgraph.com>
Cc: Christian Chiarcos <christian.chiarcos@web.de>; Hans-Jürgen Rennau <hjrennau@gmail.com>; Semantic Web <semantic-web@w3.org>
Subject: Re: semsheets

Hi,

Under the W3C Community Group on Knowledge Graph Construction (https://w3id.org/kg-construct), we are working to collect all the tools, language specifications and resources for performing these kinds of tasks (currently the collection is mainly based on R2RML and its extensions, but of course it is open to any other tool).

You can visit https://github.com/kg-construct/awesome-kgc-tools for an overview of all of them, and we are open to receiving updates/PRs from anyone! Additionally, for more detailed information on each resource, you may also take a look at https://github.com/kg-construct/resources


Best regards,
David


David Chaves-Fraga
Postdoctoral Researcher
Escuela Técnica Superior de Ingenieros Informáticos
Ontology Engineering Group
Calle de Los Ciruelos S/N.
28660, Boadilla del Monte, Madrid SPAIN
✉ david.chaves@upm.es
✆ +34 627 31 72 15


On 23 Feb 2022, at 12:27, Martynas Jusevičius <martynas@atomgraph.com> wrote:

Hi,

For CSV, in addition to TARQL there's also CSV2RDF which uses a
slightly different processing model and query forms:
https://github.com/AtomGraph/CSV2RDF


For XML, definitely XSLT, version 3.0 if it's available to you. With
it you can even do streaming transformations, which is pretty much
impossible otherwise.

Martynas
atomgraph.com

On Wed, Feb 23, 2022 at 12:22 PM Christian Chiarcos
<christian.chiarcos@web.de> wrote:


Hi,

as far as CSV data is concerned, TARQL (https://tarql.github.io/) is a great tool as it allows you to do transformations with SPARQL, and whatever relational data you have, it can be trivially exported to CSV. In that case, no specific standard (other than SPARQL) needed.
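
To illustrate the underlying idea (each CSV row becomes one set of variable bindings, which is then pushed through a CONSTRUCT-like template), here is a small Python sketch. It is not TARQL and does not use SPARQL; the template mini-syntax with "?var" and "{column}" placeholders, the example IRIs and the function names are all made up for this illustration.

```python
import csv
import io

# A template of triple patterns. Terms starting with "?" are variables bound per row;
# "{column}" builds an IRI from a cell, which a SPARQL query would do with BIND and IRI().
TEMPLATE = [
    ("<http://example.org/city/{id}>", "<http://www.w3.org/2000/01/rdf-schema#label>", "?name"),
    ("<http://example.org/city/{id}>", "<http://example.org/population>", "?population"),
]

def instantiate(term, row):
    """Resolve one template term against a row; return None for an unbound variable."""
    if term.startswith("?"):
        value = row.get(term[1:])
        if not value:
            return None  # empty cells are treated as unbound in this sketch
        return '"%s"' % value.replace("\\", "\\\\").replace('"', '\\"')
    return term.format(**row)

def construct(csv_text, template=TEMPLATE):
    """Instantiate the template once per CSV row and collect N-Triples-like lines."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for pattern in template:
            terms = [instantiate(term, row) for term in pattern]
            if None in terms:
                continue  # a pattern with an unbound variable yields no triple
            triples.append(" ".join(terms) + " .")
    return triples

CSV_TEXT = "id,name,population\nrome,Rome,2800000\nmadrid,Madrid,\n"

for line in construct(CSV_TEXT):
    print(line)
```

The second row has no population cell, so only its label triple is produced, mirroring how a CONSTRUCT template skips triples whose variables are unbound.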

For tree/XML, I guess most people just resort to XSL. It is possible, of course, to use a generic XSL template to just encode the XML data model in RDF and then run SPARQL updates over that. But this isn't ideal because the raw RDF dump is too raw, so I guess we won't have a fully generic alternative to resource-specific XSL scripts any time soon.
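
To show what such a generic encoding looks like, and why the result is so raw, here is a sketch in Python rather than XSL. The vocabulary under http://example.org/xml-model# and the function name are hypothetical, and namespaces, mixed content and document order are ignored.

```python
import itertools
import xml.etree.ElementTree as ET

XMLV = "http://example.org/xml-model#"  # hypothetical vocabulary for the raw dump

def xml_to_raw_triples(xml_text):
    """Encode the XML tree itself (elements, attributes, text) as triples over blank nodes."""
    counter = itertools.count(1)
    triples = []

    def visit(element, parent=None):
        node = "_:n%d" % next(counter)
        triples.append((node, XMLV + "name", element.tag))
        if parent is not None:
            triples.append((parent, XMLV + "child", node))
        for attribute, value in element.attrib.items():
            triples.append((node, XMLV + "attr/" + attribute, value))
        text = (element.text or "").strip()
        if text:
            triples.append((node, XMLV + "text", text))
        for child in element:
            visit(child, node)

    visit(ET.fromstring(xml_text))
    return triples

XML_TEXT = '<book id="b1"><title>RDF Primer</title></book>'

for triple in xml_to_raw_triples(XML_TEXT):
    print(triple)
```

Even this one-element document yields five triples that say nothing about books or titles in any domain vocabulary; that meaning has to be added afterwards, for example with SPARQL updates.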

For tree/JSON, JSON-LD contexts are more or less what you're asking for. Wrt. XML conversion, you can also convert XML to JSON and then provide the contexts.
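
As a small illustration of what a context adds to plain JSON, the Python sketch below attaches a context and reads off triples for a flat document. The context itself is ordinary JSON-LD, but the to_triples function is a hand-rolled simplification written for this example, not a JSON-LD processor: it assumes a flat document and a caller-supplied subject, and ignores nesting, @vocab, datatypes and everything else in the specification.

```python
import json

# A JSON-LD context mapping plain JSON keys to IRIs; "homepage" values are IRIs, not strings.
CONTEXT = {
    "name": "http://xmlns.com/foaf/0.1/name",
    "homepage": {"@id": "http://xmlns.com/foaf/0.1/homepage", "@type": "@id"},
}

def with_context(document, context=CONTEXT):
    """Attach a context to plain JSON without touching its existing keys."""
    return {"@context": context, **document}

def to_triples(document, subject):
    """Simplified reading of a flat document; real processing needs a JSON-LD library."""
    context = document["@context"]
    triples = []
    for key, value in document.items():
        if key.startswith("@") or key not in context:
            continue  # keywords and keys the context does not map yield no triples
        definition = context[key]
        if isinstance(definition, str):
            triples.append((subject, definition, json.dumps(value)))
        elif definition.get("@type") == "@id":
            triples.append((subject, definition["@id"], "<%s>" % value))
        else:
            triples.append((subject, definition["@id"], json.dumps(value)))
    return triples

plain = {"name": "Ada", "homepage": "http://example.org/ada", "shoe_size": 38}
document = with_context(plain)
print(json.dumps(document, indent=2))
for triple in to_triples(document, "http://example.org/ada#me"):
    print(triple)
```

The source JSON is left untouched and the mapping lives entirely in the context, which is what makes it resemble a stylesheet for semantics.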

For plain text, there are some extractor frameworks, but an easy stylesheet isn't feasible, as you need to configure language-specific processing modules.

NB: We are currently in the process of bundling a number of converter frameworks and subsequent SPARQL transformations into compact workflows, see https://github.com/Pret-a-LLOD/Fintan (still in progress, final release by end of June this year). Our specific goal is to apply this to data in NLP, but general-purpose converters are included as well, so you can at least run the XSL+SPARQL and TARQL/CSV+SPARQL transformations.

Best,
Christian

On Wed, 23 Feb 2022 at 08:10, Hans-Jürgen Rennau <hjrennau@gmail.com> wrote:


Hello,

I am interested in the transformation of non-RDF data into RDF data and I am puzzled, nay, haunted by a simple analogy. We have stylesheets for defining visual representation of data in a convenient, standardized way. Could we not have "semsheets" for defining semantic representation of data in a convenient, standardized way?

I admit the oversimplification: CSS stylesheets are designed to work with HTML, a scope sufficient for practical purposes. Whereas "non-RDF data" is by definition a broad spectrum of media types, so the uniformity of a single "semsheet language" may not be attainable. But how about approaching the goal, based on an appropriate partitioning of data sources? For example:

(1) Relational data
(2) Tree-structured data
(3) Other

Tree-structured data comprises most structured data except for graph data - JSON, XML, HTML, CSV, .... And concerning "other", what comes to my mind is (i) unstructured text and (ii) non-RDF graph data.

So keeping this partitioning in mind, how about standards, frameworks, tools enabling customized mapping of data to RDF?

What I am aware of is very little:

(1) relational data: R2RML [1], ?
(2) tree-structured data: RML [2], ?
(3) other: ?

Note that I did not mention RDFa, as it is about embedding, rather than writing mapping documents, nor GRDDL, as it is about finding a mapping document, not its content.

I am convinced that there are quite a few other standards, frameworks and tools which should be listed above, replacing the "?".

Can you help me to find them? Any links, thoughts, comments highly appreciated. (And should you think the partitioning is faulty, please share your criticism. The same applies to the very quest for common, standardized mapping languages.)

Thank you! With kind regards,
Hans-Jürgen Rennau

[1] https://www.w3.org/TR/r2rml/
[2] https://rml.io/specs/rml/
Received on Wednesday, 23 February 2022 15:42:20 UTC