RE: Implementing GRDDL in XProc from vojtech.toman@emc.com on 2010-11-22 (public-xml-processing-model-wg@w3.org from November 2010)

From: <vojtech.toman@emc.com>
Date: Mon, 22 Nov 2010 03:47:54 -0500
To: <public-xml-processing-model-wg@w3.org>
Message-ID: <997C307BEB90984EBE935699389EC41C02BC2544@CORPUSMX70C.corp.emc.com>

> >  To have a complete XProc GRDDL implementation, you
> > currently need to use an extension step that takes a sequence of RDF
> > graphs and returns the merged result.

> Could be done in XQuery or even XSLT though, no?  Or are you
> expressing a need for a SPARQL step?

That is the part I don't know about. I don't really know enough about
RDF (yet) to say whether the merge can be implemented in XQuery or
XSLT. But it certainly looks more complicated than just using
p:wrap-sequence + p:unwrap as Norm suggested:

"""
A merge of a set of RDF graphs is defined as follows. If the graphs in
the set have no blank nodes in common, then the union of the graphs is a
merge; if they do share blank nodes, then it is the union of a set of
graphs that is obtained by replacing the graphs in the set by equivalent
graphs that share no blank nodes. This is often described by saying that
the blank nodes have been 'standardized apart'. It is easy to see that
any two merges are equivalent, so we will refer to the merge, following
the convention on equivalent graphs. Using the convention on equivalent
graphs and identity, any graph in the original set is considered to be a
subgraph of the merge.

One does not, in general, obtain the merge of a set of graphs by
concatenating their corresponding N-Triples documents and constructing
the graph described by the merged document. If some of the documents use
the same node identifiers, the merged document will describe a graph in
which some of the blank nodes have been 'accidentally' identified. To
merge N-Triples documents it is necessary to check if the same nodeID is
used in two or more documents, and to replace it with a distinct nodeID
in each of them, before merging the documents. Similar cautions apply to
merging graphs described by RDF/XML documents which contain nodeIDs, see
RDF/XML Syntax Specification (Revised) [RDF-SYNTAX].

"""

At the moment, I ended up with a custom step (implemented using the JRDF
library) that takes a sequence of RDF documents and merges the
corresponding graphs into one.

I don't think a SPARQL step is necessary for implementing GRDDL in
XProc.

---

Btw, when working on the pipeline, it quickly became clear to me that
this is something that I really wouldn't want to implement as an atomic
p:grddl step. It really is a pipeline with a fair amount of conditional
logic, queries over the input data and requests for additional
resources. Implementing this in Java (or any other language) is
possible, but it seems unnatural to me when you have all the machinery
of XProc at hand.




--
Vojtech Toman
Consultant Software Engineer
EMC | Information Intelligence Group
vojtech.toman@emc.com
http://developer.emc.com/xmltech

Received on Monday, 22 November 2010 08:49:10 UTC