
Converting XML-encoded texts to RDF and back [via RDF and XML Interoperability Community Group]

From: W3C Community Development Team <team-community-process@w3.org>
Date: Thu, 30 Mar 2017 16:51:45 +0000
To: public-rax@w3.org
Message-ID: <65e9d6869ea1239b681eec43029d6e4e@www.w3.org>
We created a generic research platform called Knora, intended primarily for use
in the humanities. Knora internally uses an RDF triplestore and offers its users
a RESTful API for all necessary operations (reading, creating, updating, and
deleting data). The Knora base ontology provides basic value types designed to
represent qualitative data, including support for versioning and permissions.

An important part of humanities data consists of marked-up texts (e.g., for
digital editions). Once such texts are imported into Knora, our goal is to
represent them adequately in RDF and to export them again if the user wishes.
At the moment, we support importing XML-encoded texts into Knora and exporting
them as XML. The export delivers an XML document that is equivalent to the
imported one (equivalent, but not necessarily identical at the character-stream
level).
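One way to make "equivalent, but not necessarily identical" concrete is to compare documents after canonicalization. The sketch below uses Python's standard-library Canonical XML 2.0 support (`xml.etree.ElementTree.canonicalize`) as an assumed notion of equivalence; the post does not say which criterion Knora itself applies, and the function name `equivalent` and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def equivalent(xml_a: str, xml_b: str) -> bool:
    """Return True if two XML strings are equal after C14N 2.0 normalization.

    Canonicalization sorts attributes, normalizes attribute quoting and
    writes empty elements as start/end tag pairs, so documents that differ
    only in such serialization details compare equal.
    """
    return ET.canonicalize(xml_data=xml_a) == ET.canonicalize(xml_data=xml_b)


imported = '<text><p lang="en" n="1">Hello <b>world</b><br/></p></text>'
exported = "<text><p n='1' lang='en'>Hello <b>world</b><br></br></p></text>"

print(imported == exported)            # False: the character streams differ
print(equivalent(imported, exported))  # True: same content after C14N
```

A stricter or looser check (for example, also ignoring insignificant whitespace via the `strip_text=True` option) may be more appropriate depending on what a given mapping preserves.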

Before an XML document representing a text can be imported, a mapping has to be
provided. A mapping expresses the relations between XML elements and attributes
and their corresponding entities defined in ontologies (classes and properties).
Once a mapping is provided, XML documents can be converted to RDF and stored in
Knora's triplestore. During the conversion, markup and content are separated,
since we use a so-called standoff-based approach (markup refers to positions or
ranges of the text via the index positions of individual characters). The text
is stored as a string, and the markup is represented as RDF triples, which makes
it queryable with SPARQL.
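The separation of text and markup can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Knora's implementation: the element-to-class `MAPPING`, the `example.org` IRIs, the `hasStart`/`hasEnd` properties and the subject IRI scheme are all hypothetical placeholders. It walks an XML tree, appends character data to a plain string, records each mapped element as a `[start, end)` character range, and then describes those ranges as (subject, predicate, object) triples.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical mapping from XML element names to ontology classes.
# These example.org IRIs are placeholders, not Knora's actual IRIs.
ONTO = "http://example.org/onto#"
MAPPING = {
    "text": ONTO + "StandoffRootTag",
    "p": ONTO + "StandoffParagraphTag",
    "b": ONTO + "StandoffBoldTag",
    "i": ONTO + "StandoffItalicTag",
}
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class StandoffTag:
    cls: str    # ontology class the XML element is mapped to
    start: int  # index of the first character the tag covers
    end: int    # index one past the last character it covers
    attrs: dict = field(default_factory=dict)


def xml_to_standoff(xml: str) -> tuple[str, list[StandoffTag]]:
    """Separate an XML document into plain text and standoff ranges."""
    chars: list[str] = []
    tags: list[StandoffTag] = []

    def walk(elem: ET.Element) -> None:
        # Without a mapping entry the element cannot be converted.
        if elem.tag not in MAPPING:
            raise ValueError(f"no mapping for element <{elem.tag}>")
        start = len(chars)
        chars.extend(elem.text or "")
        for child in elem:
            walk(child)
            chars.extend(child.tail or "")  # text after the child's end tag
        tags.append(StandoffTag(MAPPING[elem.tag], start, len(chars), dict(elem.attrib)))

    walk(ET.fromstring(xml))
    return "".join(chars), tags


def to_triples(tags: list[StandoffTag], text_iri: str) -> list[tuple]:
    """Describe each tag as (subject, predicate, object) triples."""
    triples = []
    for n, tag in enumerate(tags):
        subject = f"{text_iri}/standoff/{n}"  # hypothetical IRI scheme
        triples.append((subject, RDF_TYPE, tag.cls))
        triples.append((subject, ONTO + "hasStart", tag.start))
        triples.append((subject, ONTO + "hasEnd", tag.end))
    return triples


text, tags = xml_to_standoff("<text><p>Hello <b>bold</b> and <i>italic</i>.</p></text>")
print(text)  # Hello bold and italic.
for tag in tags:
    print(tag.cls.split("#")[1], tag.start, tag.end, repr(text[tag.start:tag.end]))
```

Because each tag only stores offsets into the string, the markup can be queried independently of the text, and the XML can in principle be rebuilt by re-inserting tags at their recorded positions.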

Our goal is to develop an editor that allows texts to be created and edited
directly in a native standoff format. For now, we still use embedded markup
(e.g., HTML in a browser-based GUI) that is converted to RDF and back, which
limits the advantages of the standoff approach. One of the main advantages of
standoff is the ability to add layers of annotations to a text without
interfering with the existing markup (unlike embedded markup such as XML, where
elements must nest properly and overlapping structures cause problems). Our
approach is inspired by Desmond Schmidt's work:
http://ecdosis.net/papers/schmidt.d.2016.pdf
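The overlap problem can be illustrated with plain character ranges. In this minimal sketch the `Annotation` type, the layer names and the sample ranges are hypothetical; it only shows that two annotations from different layers may partially overlap, which a single well-formed XML tree cannot express as nested elements, while standoff simply stores both ranges side by side.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    layer: str  # annotation layer (hypothetical layer names below)
    name: str
    start: int  # index of the first character (inclusive)
    end: int    # index one past the last character (exclusive)


def overlaps(a: Annotation, b: Annotation) -> bool:
    """True if the two ranges partially overlap (neither contains the other).

    Such a pair cannot be written as properly nested elements in one XML
    tree, but as standoff ranges the two simply coexist.
    """
    return a.start < b.start < a.end < b.end or b.start < a.start < b.end < a.end


text = "The quick brown fox jumps"
phrase = Annotation("syntax", "phrase", 0, 15)            # "The quick brown"
word = Annotation("syntax", "word", 4, 9)                 # "quick"
highlight = Annotation("editorial", "highlight", 10, 19)  # "brown fox"

annotations = [phrase, word, highlight]
crossing = [(a.name, b.name)
            for i, a in enumerate(annotations)
            for b in annotations[i + 1:]
            if overlaps(a, b)]
print(crossing)  # [('phrase', 'highlight')]
```

Nested or adjacent ranges (such as `word` inside `phrase`) could also be embedded as XML; only crossing pairs like `phrase` and `highlight` force workarounds such as milestone elements when markup is embedded.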

You will find more information about the creation and handling of standoff
markup in Knora here:

- mapping XML to RDF: http://www.knora.org/documentation/manual/rst/knora-api-server/api_v1/create-a-mapping.html
- standoff entities defined in the Knora base ontology: http://www.knora.org/documentation/manual/rst/knora-ontologies/knora-base.html#text-with-standoff-markup
- tests that illustrate the use of the XML to standoff conversion: https://github.com/dhlab-basel/Knora/blob/develop/webapi/src/test/scala/org/knora/webapi/e2e/v1/StandoffV1R2RSpec.scala


----------

This post sent on RDF and XML Interoperability Community Group



'Converting XML-encoded texts to RDF and back'

https://www.w3.org/community/rax/2017/03/30/converting-xml-encoded-texts-to-rdf-and-back/



Learn more about the RDF and XML Interoperability Community Group: 

https://www.w3.org/community/rax
Received on Thursday, 30 March 2017 16:51:52 UTC