Re: Mapping XML to RDF from adasal on 2013-04-13 (semantic-web@w3.org from April 2013)

From: adasal <adam.saltiel@gmail.com>
Date: Sat, 13 Apr 2013 18:03:03 +0100
To: Tim Berners-Lee <timbl@w3.org>
Cc: Anastasia Dimou <natadimou@gmail.com>, "semantic-web@w3.org" <semantic-web@w3.org>
Message-ID: <CANJ1O4ofwBSaBvkT4wV7454kgjZuVaa4748XZ4_y-sCRQuv5Wg@mail.gmail.com>
I wonder how that may work with larger, deeply nested and arbitrary XML.
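
The suggestion quoted further down (convert the XML syntactically into
JSON, then write an @context for it) can be sketched in a few lines. This
is only a rough illustration under assumptions: the sample XML, the
conversion convention (attributes and child elements both become keys,
leftover text goes under "#text") and the example.org vocabulary IRI are
invented here, and a real JSON-LD processor would still be needed to turn
the result into triples.

```python
import json
import xml.etree.ElementTree as ET

# Assumed sample input; not taken from any real dataset.
SAMPLE_XML = """
<person id="http://example.org/people/tim">
  <name>Tim Berners-Lee</name>
  <affiliation><name>W3C</name></affiliation>
</person>
"""

# Assumed @context: "id" becomes the node identifier, the other keys are
# mapped to predicate IRIs (the example.org one is hypothetical).
CONTEXT = {
    "id": "@id",
    "name": "http://xmlns.com/foaf/0.1/name",
    "affiliation": "http://example.org/vocab#affiliation",
}


def element_to_json(elem):
    """Convert one element to plain JSON data, recursively."""
    node = dict(elem.attrib)
    for child in elem:
        value = element_to_json(child)
        if child.tag in node:
            # A repeated key becomes a list of values.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if not node:
        return text  # leaf element: just its text
    if text:
        node["#text"] = text  # text next to attributes or children
    return node


def xml_to_jsonld(xml_text, context):
    """Parse XML, convert it syntactically, and attach the @context."""
    body = element_to_json(ET.fromstring(xml_text.strip()))
    if not isinstance(body, dict):
        body = {"#text": body}
    return {"@context": context, **body}


if __name__ == "__main__":
    print(json.dumps(xml_to_jsonld(SAMPLE_XML, CONTEXT), indent=2))
```

Nesting depth is handled by the recursion; keys that the @context does
not mention are simply left unmapped.
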
I am working with
rdfstore-js : https://github.com/antoniogarrote/rdfstore-js

[JSON-LD]: http://json-ld.org/
[json-ld.org]: https://github.com/json-ld/json-ld.org
which is bundled as a dependency in rdfstore-js

so we have from README.md

> rdfstore-js is a pure Javascript implementation of a RDF graph store with
> support for the SPARQL query and data manipulation language.
>     var rdfstore = require('rdfstore');
>
>     rdfstore.create(function(store) {
>       store.execute('LOAD <http://dbpedia.org/resource/Tim_Berners-Lee> INTO GRAPH <http://example.org/people>', function()

...

which is delightfully amusing, if you think about it :-)

It saves directly into MongoDB and everything can be set up locally using
npm. Once set up, it is a very good place to experiment with queries and
data formats.

I do need the extra help of a tool to validate my SPARQL, though. I will
use TopLink or Protege.

rdfstore-js accepts and serves JSON (which is JavaScript, so very useful),
N3 and Turtle.

Output is not necessarily the same as input, as it is non-deterministic,
and there may be some information loss.
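
One way to tell a harmless change of statement order from real loss is to
compare what went in and what came out as sets of triples. A minimal
sketch, assuming both sides are available as N-Triples text (one triple
per line) and contain no blank nodes, whose labels may legitimately
differ; the sample data and the function names are made up for
illustration.

```python
def triple_set(ntriples_text):
    """Return the set of triple lines, ignoring blank lines and comments."""
    return {
        line.strip()
        for line in ntriples_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }


def compare(loaded, served):
    """Report triples that went missing or appeared after a round trip."""
    before, after = triple_set(loaded), triple_set(served)
    return {"missing": before - after, "extra": after - before}


# Assumed sample data using example.org IRIs.
LOADED = """
<http://example.org/a> <http://example.org/p> "1" .
<http://example.org/a> <http://example.org/q> "2" .
<http://example.org/b> <http://example.org/p> "3" .
"""

# Same triples in a different order, with one of them dropped.
SERVED = """
<http://example.org/b> <http://example.org/p> "3" .
<http://example.org/a> <http://example.org/p> "1" .
"""

if __name__ == "__main__":
    print(compare(LOADED, SERVED))
```

An empty "missing" set would mean that only the order changed.
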

I have no idea how this would play out for "normal" chaotic XML with
various, often inconsistent or idiosyncratic attribute usages.

The real problem with large XML datasets is that their format can change
from time to time, or even in the midst of a single stream, which may have
been composed from XML built at different times when different usages
pertained.

In this case there is nothing for it but to build your own (SAX?) event
parser or use XPath with a lot of rules, or some such.

And it may very well be that exceptions have to go into a bag, as no one
can anticipate them.
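
A rough sketch of that idea, using Python's standard xml.sax module: a
rule table maps the element names one expects to predicates, and anything
a rule does not cover is kept in a bag instead of being dropped. The rule
table, the sample record, the subject IRI and the class and function
names are all assumptions made for illustration; real rules would be far
more numerous.

```python
import xml.sax

# Hypothetical rules: element name -> predicate IRI.
RULES = {
    "title": "http://purl.org/dc/terms/title",
    "creator": "http://purl.org/dc/terms/creator",
}

# Assumed sample input with one unexpected attribute and one unknown element.
SAMPLE = b"""<record>
  <title>Example title</title>
  <creator>A. Author</creator>
  <creator role="editor">B. Editor</creator>
  <shelfmark>X-42</shelfmark>
</record>"""


class RuleHandler(xml.sax.ContentHandler):
    """Emit triples for known elements; collect everything else in a bag."""

    def __init__(self, subject, rules):
        super().__init__()
        self.subject = subject
        self.rules = rules
        self.triples = []
        self.bag = []      # (path, attributes, text) that no rule covers
        self._names = []   # names of the currently open elements
        self._attrs = []   # attributes of the currently open elements
        self._text = []    # one text buffer per open element

    def startElement(self, name, attrs):
        self._names.append(name)
        self._attrs.append(dict(attrs.items()))
        self._text.append([])

    def characters(self, content):
        if self._text:
            self._text[-1].append(content)

    def endElement(self, name):
        path = "/".join(self._names)
        self._names.pop()
        attrs = self._attrs.pop()
        text = "".join(self._text.pop()).strip()
        if name in self.rules and not attrs:
            self.triples.append((self.subject, self.rules[name], text))
        elif text or attrs:
            # Unknown element, or known element used in an unexpected way.
            self.bag.append((path, attrs, text))


def lift(xml_bytes, subject):
    """Run the handler over the input and return (triples, bag)."""
    handler = RuleHandler(subject, RULES)
    xml.sax.parseString(xml_bytes, handler)
    return handler.triples, handler.bag


if __name__ == "__main__":
    triples, bag = lift(SAMPLE, "http://example.org/record/1")
    print(triples)
    print(bag)
```

The bag keeps the element path alongside the attributes and text, so the
exceptions can be reviewed later and turned into new rules.
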

I believe all of this argues in favour of making the effort to lift XML
into RDF.


Best,

Adam


On 13 April 2013 10:48, Tim Berners-Lee <timbl@w3.org> wrote:

> Well RDB and CSV are fundamentally tables and XML is a tree. In fact the
> job of mapping XML is equivalent to the job of mapping JSON.  Look at using
> JSON LD -- it provides a way of notating how each JSON element should be
> mapped to RDF, recursively.  Suppose you syntactically convert your XML
> into JSON and then make a @context definition for it. See whether it
> provides what you need.  Or write a JSON LD processor which takes XML.
>
>  JSON is the new XML :-)
>
> Tim
>
> Sent from my portable device.
>
> On Apr 12, 2013, at 3:39, Anastasia Dimou <natadimou@gmail.com> wrote:
>
> Thank you all for your replies regarding different tools.
> Correct me if I am wrong, but those tools do not use R2RML in order to map
> XML to RDF.
> So, let me put it in a different way: since R2RML can be used to map RDB
> to RDF and it can also fit the CSV to RDF conversion needs, then why not
> extend R2RML to fit XML (and, in the long term, different file formats)
> to RDF mappings, too? I do not ignore that there are standards for XML
> transformations, but I wonder if extending R2RML to map XML to RDF was
> considered.
>
> Best regards,
> Anastasia
>
>
>
> On Wed, Apr 10, 2013 at 6:32 PM, Silvio Peroni <essepuntato@cs.unibo.it> wrote:
>
>> Dear Anastasia,
>>
>> I am currently investigating a way to convert XML files to RDF.
>> Using XSLT, XPath and XQuery seems to be the straightforward solution.
>> But I was wondering if there is already an implementation that uses R2RML
>> (adjusted/extended to XML needs) to convert XML to RDF.
>>
>>
>> I developed a tool called XML2EARMARK [1] that translates XML sources
>> into OWL ontologies that conform to the EARMARK ontology [2].
>>
>> EARMARK [3] is "a meta-syntax for non-embedded markup that can be used
>> for stand-off annotations of textual content with fully
>> W3C-compliant technologies. EARMARK is based on an ontologically precise
>> definition of markup that instantiates the markup of a text document as
>> an independent OWL document outside of the text strings it annotates, and
>> through appropriate OWL and SWRL characterizations it can define structures
>> such as trees or graphs and can be used to generate validity constraints
>> (including co-constraints currently unavailable in most validation
>> languages)."
>>
>> You can use it directly within a Java application by means of the EARMARK
>> API [4]. For more information, please visit [5].
>>
>> Please don't hesitate to contact me for any additional
>> question/information/doubt.
>>
>> Have a nice day :-)
>>
>> S.
>>
>>
>>
>> ### References ###
>> [1] - XML2EARMARK: http://www.essepuntato.it/xml2earmark
>> [2] - EARMARK ontology: http://www.essepuntato.it/2008/12/earmark
>> [3] - Di Iorio, A., Peroni, S., Vitali, F. (2011). A Semantic Web
>> Approach To Everyday Overlapping Markup. In Journal of the American Society
>> for Information Science and Technology, 62 (9): 1696-1716. Hoboken, New
>> Jersey, USA: John Wiley & Sons, Inc. DOI: 10.1002/asi.21591
>> http://speroni.web.cs.unibo.it/publications/di-iorio-2011-semantic-approach-everyday.pdf
>> [4] - EARMARK API: http://earmark.sourceforge.net/
>> [5] - EARMARK information page: http://palindrom.es/phd/research/earmark/
>>
>>
>>
>>
>>
>> ----------------------------------------------------------------------------
>> Silvio Peroni, Ph.D.
>> Department of Computer Science and Engineering
>> University of Bologna, Bologna (Italy)
>> Tel: +39 051 2094871
>> E-mail: essepuntato@cs.unibo.it
>> Web: http://www.essepuntato.it
>> Blog: http://palindrom.es/phd
>> Twitter: essepuntato
>>
>>
>
>
> --
> Anastasia Dimou
>
>
Received on Saturday, 13 April 2013 17:03:35 UTC