WG: RAX - requirements and tools from Dirschl, Christian on 2016-07-07 (public-rax@w3.org from July 2016)

From: Dirschl, Christian <Christian.Dirschl@wolterskluwer.com>
Date: Thu, 7 Jul 2016 08:51:26 +0000
To: "public-rax@w3.org" <public-rax@w3.org>
Message-ID: <BY2PR06MB011425F325A9A24D3A9D2E5F73B0@BY2PR06MB011.namprd06.prod.outlook.com>
Dear group,

As agreed, in preparation for our next call tomorrow, I want to share with everybody the two detailed inputs we have received so far. Here is number one, from Bernát.

I will also send out an agenda later on today.

Best
Christian

From: Bernát Kalló [mailto:bernat.kallo@z-bible.org]
Sent: Tuesday, 21 June 2016 16:14
To: Dirschl, Christian <Christian.Dirschl@wolterskluwer.com>; phil.ritchie@vistatec.com
Subject: RAX - requirements and tools

Hello,

I recently joined the RAX group and have seen what happened during the first conference call. Thanks for taking on the responsibility of chairing the group. I'm now sending you my requirements and results on the topic of RDF-XML conversion.

At z-Bible.org, we are developing a digital Bible, which belongs to a new kind of book format that we call z-book. You navigate a z-book by zooming and panning instead of turning pages, similarly to a zoomable map. z-Bible consists of the whole text of the Bible laid out on an infinite zoomable canvas, together with many types of additional materials: commentaries, notes, headings, background information, maps, cross-references, illustrations etc. All these pieces of information are assembled from modules, which are individually replaceable (e.g. you can change the set of illustrations to another one, or choose another commentary to display alongside the scripture). We need tooling to seamlessly assemble these modules at run time into a single dataset that describes the appearance of the zoomable canvas.

Until now, we only used XML files for storing the modules' contents, in a separate XML format for each type of module. We wrote code for reading these files (separate code for each type) and for assembling them. The assembling code also reads some configuration from an XML file (called the Layout), which describes how to visually display information from the modules and how to visually arrange them.

However, this system is not good enough, because now, introducing a new module type requires modifying the app code. We would need a solution where introducing a new type of module would only involve configuration and not coding (or at most coding in a DSL).

So we would need
1. a user-friendly format that can describe the contents of the modules
2. a way to combine related information from different modules
3. a user-friendly format that can describe how to lay out this info on the zoomable canvas

Two weeks ago I realized that RDF may be the best technology for combining the pieces of information in the modules. No. 1 could be some XML format plus a converter from XML to RDF (because the Turtle format is not comfortable enough for us), No. 2 could be SPARQL, and No. 3 could be a converter from SPARQL result sets to our XML format for describing the map.

So I have been hunting for converters from XML to RDF and vice versa, but I found very few. The best I found for XML -> RDF was Gloze, and the best for RDF -> XML was SPARQL Web Pages. Both of them are fairly straightforward and user-friendly, but not good enough for me. So currently I'm planning to use a custom mapping for each direction, until hopefully the RAX group can come up with something better and standard.

So, about my actual formats.

XML -> RDF:
While Gloze uses an XML schema for mapping from XML to RDF (and backwards too), I'm using the XML file only, with some conventions. I first convert it to JSON-LD and from that to RDF. A module that connects the Bible with Wikipedia articles would look like this (OSIS is a standard for referencing portions of the Bible):

<relatedInfo
  xmlns="http://z-bible.org/"
  xmlns:osis="http://z-bible.org/osis/">
  <resource id="http://en.wikipedia.org/wiki/Bible">
    <relatedTo ref="osis:Gen-Rev"/>
    <title>Bible</title>
  </resource>

  <resource id="http://en.wikipedia.org/wiki/Book_of_Genesis">
    <relatedTo ref="osis:Gen"/>
    <title>Book of Genesis</title>
  </resource>

</relatedInfo>

The following conventions apply:
- XML namespaces are equivalent to JSON-LD terms
- Tag names are predicates.
- Elements with child elements represent objects. These automagically receive an rdf:type from the tag name (unless a type attribute is provided). This is good because then we can write a more user-friendly { ?r a :resource } instead of { [] :resource ?r }.
- Objects automatically receive an index, so that the order of elements is persisted. This seemed to be better than many other RDF collection formats, because then a simple { ?x :index ?index } ORDER BY ?index can bring them back in the correct order.
- Attribute id is analogous to the JSON-LD @id
- Attribute ref can be used to specify predicates whose object is an IRI. Elements with this attribute must be empty.
- Attribute type is analogous to the JSON-LD @type. This works for simple values and objects too, in the latter case it overrides the type inferred from the tag name.
- No other attributes are allowed (xmlns and xmlns:* prefix definitions are allowed only on the root node).
- Embedded XHTML is allowed (with proper xmlns) and is converted to its string representation, with some rdf:type different from string.

So this is mapped to the following JSON-LD:

{
    "resource": [{
        "@id": "http://en.wikipedia.org/wiki/Bible",
        "relatedTo": "osis:Gen-Rev",
        "title": "Bible",
        "@type": "resource",
        "index": 1
    }, {
        "@id": "http://en.wikipedia.org/wiki/Book_of_Genesis",
        "relatedTo": "osis:Gen",
        "title": "Book of Genesis",
        "@type": "resource",
        "index": 2
    }],
    "@context": {
        "resource": "http://z-bible.org/resource",
        "relatedTo": {
            "@id": "http://z-bible.org/relatedTo",
            "@type": "@id"
        },
        "title": "http://z-bible.org/title",
        "index": "http://z-bible.org/index",
        "osis": "http://z-bible.org/osis/",
        "@base": "http://z-bible.org/"
    },
    "@id": "relatedInfo"
}
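
The conventions above could be sketched as a small converter. This is only an illustration under stated assumptions, not the actual z-Bible code: the function names (xml_to_jsonld, convert_object and the helpers) are made up for this example, the index is assumed to count object children per parent starting at 1, the root element is assumed to become the top-level node with its tag name as @id (as in the listing above), and embedded XHTML is not handled.

```python
import io
import xml.etree.ElementTree as ET

ALLOWED_ATTRS = {"id", "ref", "type"}


def split_tag(tag):
    """Split ElementTree's '{namespace}local' tag into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


def add_value(node, key, value):
    """Store value under key, switching to a list when the key repeats."""
    if key not in node:
        node[key] = value
    elif isinstance(node[key], list):
        node[key].append(value)
    else:
        node[key] = [node[key], value]


def convert_object(elem, context, base):
    """Convert an element that has child elements into a JSON-LD node."""
    node = {}
    if "id" in elem.attrib:
        node["@id"] = elem.attrib["id"]
    object_count = 0  # assumption: index counts object children per parent
    for child in elem:
        extra = set(child.attrib) - ALLOWED_ATTRS
        if extra:
            raise ValueError(f"attributes not allowed: {sorted(extra)}")
        namespace, name = split_tag(child.tag)
        if "ref" in child.attrib:
            # ref: the predicate's object is an IRI; the element must be empty
            if len(child) or (child.text or "").strip():
                raise ValueError(f"<{name}> has ref, so it must be empty")
            context[name] = {"@id": namespace + name, "@type": "@id"}
            value = child.attrib["ref"]
        elif len(child):
            # nested object: type comes from the tag name unless overridden
            context.setdefault(name, namespace + name)
            context.setdefault("index", base + "index")
            object_count += 1
            value = convert_object(child, context, base)
            value["@type"] = child.attrib.get("type", name)
            value["index"] = object_count
        else:
            # simple value, optionally typed through the type attribute
            context.setdefault(name, namespace + name)
            text = (child.text or "").strip()
            declared = child.attrib.get("type")
            value = {"@value": text, "@type": declared} if declared else text
        add_value(node, name, value)
    return node


def xml_to_jsonld(xml_text):
    """Convert a module in the XML conventions above to a JSON-LD dict."""
    namespaces, root = {}, None
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = payload
            namespaces[prefix] = uri
        elif root is None:
            root = payload  # the first start event is the root element
    context = {}
    doc = convert_object(root, context, namespaces.get("", ""))
    for prefix, uri in namespaces.items():
        context["@base" if prefix == "" else prefix] = uri
    doc["@context"] = context
    doc.setdefault("@id", split_tag(root.tag)[1])
    return doc
```

Calling xml_to_jsonld on the relatedInfo module and passing the result to json.dumps should give the same terms and values as the JSON-LD listing above, though the key order may differ.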


RDF -> XML:

What I actually need is an XML templating language that can execute SPARQL queries. I thought about something very 'dry' and XML-based, just like XSLT (no special characters, only XML tags):

<ul>
  <rx:for-each where="[:name ?x] a :person">
    <li><rx:value-of select="?x" /></li>
  </rx:for-each>
</ul>

This is very clean but the <rx:value-of select="?x"/> would be too complicated for a non-programmer author.
Or I thought about something like Handlebars/Mustache:

<ul>
  {{#each [:name ?x] a :person}}
    <li>{{?x}}</li>
  {{/each}}
</ul>

This is very concise but it is not XML. And the nested {{# }}s and <>s can confuse non-programmers. But anyways, it is nice. However, to implement this, I need to modify the Handlebars parser, which is not the cheapest solution.

There is also SPARQL Web Pages, which is XML but allows embedding values in {= }:

<ul>
  <swp:for-each resultSet="{# SELECT ?x WHERE { [:name ?x] a :person } }">
    <li>{=?x}</li>
  </swp:for-each>
</ul>

This is fairly readable, but it has a bit too much unnecessary code (resultSet, and the many punctuation characters after it). In the end, I would go for a combination of the first and the last one:

<ul>
  <rx:for-each where="[:name ?x] a :person">
    <li>{=?x}</li>
  </rx:for-each>
</ul>

Of the four, this format fits my purposes best. However, I think it is not in line with the W3C style for formats, because it requires special strings (the {= }, which would then somehow need to be escaped, for which there is no universal support, etc.). So I believe that my first (XSLT-based) example could be more suitable for a standard format.
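
To make the combined format more concrete, here is a minimal sketch of how such a template could be expanded. It is an illustration under several assumptions rather than an existing tool: the rx namespace IRI is invented, the function names (render, expand, fill) are hypothetical, and run_query is a caller-supplied callback standing in for a real SPARQL engine, taking the text of the where attribute and returning one dict of variable bindings per solution. Escaping of {= } and text placed directly inside rx:for-each are not handled.

```python
import re
import xml.etree.ElementTree as ET

RX_NS = "http://example.org/rx"  # hypothetical namespace for the rx: prefix
FOR_EACH = "{" + RX_NS + "}for-each"
PLACEHOLDER = re.compile(r"\{=\?(\w+)\}")  # matches placeholders such as {=?x}


def fill(text, binding):
    """Replace every {=?var} in text with the bound value of var."""
    if text is None:
        return None
    return PLACEHOLDER.sub(lambda m: str(binding[m.group(1)]), text)


def expand(elem, binding, run_query):
    """Return the list of output elements produced by one template element."""
    if elem.tag == FOR_EACH:
        produced = []
        # Repeat the children once per solution of the where pattern.
        for row in run_query(elem.attrib["where"]):
            scope = {**binding, **row}
            for child in elem:
                produced.extend(expand(child, scope, run_query))
        return produced
    # Ordinary element: copy it, filling placeholders in attributes and text.
    copy = ET.Element(elem.tag, {k: fill(v, binding) for k, v in elem.attrib.items()})
    copy.text = fill(elem.text, binding)
    copy.tail = fill(elem.tail, binding)
    for child in elem:
        copy.extend(expand(child, binding, run_query))
    return [copy]


def render(template_xml, run_query):
    """Parse a template string and return the expanded root element."""
    root = ET.fromstring(template_xml)
    return expand(root, {}, run_query)[0]
```

The template has to declare the rx prefix on its root element (for example xmlns:rx="http://example.org/rx" on the ul) so that it parses as XML. The result of render can be serialized with ET.tostring.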

So this was my not-so-short introduction to my current work on the topic. Thanks for reading it, and thanks for doing this work for us.

Kind regards,
Bernát
Received on Thursday, 7 July 2016 08:52:09 UTC