Re: Transforming RDF into (non-binary!) trees from Paul Tyson on 2014-07-06 (semantic-web@w3.org from July 2014)

From: Paul Tyson <phtyson@sbcglobal.net>
Date: Sun, 06 Jul 2014 14:27:18 -0500
To: Tim Berners-Lee <timbl@w3.org>
Cc: Victor Porton <porton@narod.ru>, SW-forum Web <semantic-web@w3.org>
Message-ID: <1404674838.2488.115.camel@aquinas.attlocal.net>
On Sun, 2014-07-06 at 17:14 +0100, Tim Berners-Lee wrote:
> On 2014-07 -06, at 16:46, Paul Tyson <phtyson@sbcglobal.net> wrote:
> 
> > On Sat, 2014-07-05 at 22:35 +0300, Victor Porton wrote:
> >> I think we should write some code which would transform RDF into a tree
> >> (not necessarily binary! utilize nameless nodes as nodes with N
> >> childs) and also check the number of branches of a certain kind
> >> (usually 0..1 or 1..1).
> >> 
> >> Has anyone done a similar job?
> > 
> > I have not done that in RDF, but recently I had to generate optimal
> > spanning trees [1] from a directed acyclic graph (DAG). It occurred to
> > me that a similar technique could be applied to RDF if you first omitted
> > cycles from the RDF graph (perhaps by introducing blank nodes).
> > 
> > One approach would be to put the spanning tree (however you choose to
> > define it) in one named graph, and all the other "non-tree" triples in
> > another named graph.
> > 
> > This would make it easier to apply conventional block-and-line layout
> > styles (using XSL or CSS) to the spanning tree, and use the non-tree
> > links to "decorate" the format (e.g. using hyperlinks or other
> > interactive behavior).
> > 
> > Your use case might be quite different than mine. I am motivated by the
> > problem of applying formatting style to RDF graphs. Since conventional
> > layout techniques for screen and paper have a tree-based target model
> > (pages/screens,blocks,lines,characters), somewhere in the process you
> > must find or make a tree from your graph-based data. By specifying how
> > to construct one or more useful (i.e., "meaningful for formatting")
> > spanning trees from a given RDF graph, you achieve greater flexibility
> > and transparency in the process.
> 
> Any serializer to turtle, etc, produces a tree in the process.
> 

I assumed the original poster wanted a spanning tree of the RDF graph,
not just a tree-like serialization of the RDF graph. This would require
omitting all but one triple from each set of triples that have the same
object.

> For example, the serializer in rdflib.js uses the same algorithm for
>  serializing turtle/N3, rdf/xml and also a form of graphical HTML
>  layout the tabulator project uses for a "data view" of rdf resource.
>  This latter also represents quoted graphs of N3 as rounded-corner
>  bubbles around the graph, and is useful for vizualising at rule files.
> https://github.com/linkeddata/rdflib.js and specifically
>  https://github.com/linkeddata/rdflib.js/blob/master/serialize.js for
>  the serializer and
>  https://github.com/linkeddata/tabulator/blob/master/js/panes/dataContentPane.js
> for the code which generates the graphical view.
> 
> 

Since it is trivial to construe any DAG as a tree, I did not think that
is what the original question was about. Rather, I took the question as:
"of all the possible spanning trees implicit in an RDF graph, are some
more useful (e.g., more 'meaningful') than others, and if so how best to
specify and construct them?". (It is quite likely I did not get the
question right.)

I interpreted the question thus because a problem that is looming in my
work is how to tame the "great blooming, buzzing confusion" that comes
at you from any nontrivial RDF query. Solutions such as Tabulator tame
the confusion by presenting the graph as linked hierarchical views of
property lists, which is fine for data geeks but not attractive or
optimal for many business uses. Custom queries and transformations can
provide effective interfaces but are tedious to build and maintain, and
can limit users' interaction with the data. By introducing the ability
to specify a meaningful spanning tree into the query-transform process
we get another control point with which to enrich and style the raw RDF
data for particular business purposes. We will also have provided a
declarative bridging mechanism between the web of data and the web of
documents (to the extent that our specified spanning trees are
"document-like").

Regards,
--Paul

> In general, a graph may have disconnected parts and so may have to be serialized to more than one tree.
> 
> (Note that if you allow N3's  reverse arc syntax  (   <#a> is :child of
>   <#b> ) the you can serialize any acyclic graph to turtle without
>  having to generate arbitrary identifiers for blank nodes, just using
>  the turtle [ ]  syntax.   That is one reason why it was a shame that
>  the reverse syntax was omitted from Turtle.    The serializer above
>  does not use the reverse link syntax in its output, so it generates a
>  tree of forward links.  This goes against a  maxim of mine that
>  forward links are not treated special over backward links in RDF...
>  but I digress.)
> 
> 
> 
> > 
> > I suppose such a system could be implemented with SPARQL, but it would
> > be nice to have a non-SPARQL declarative syntax for specifying the
> > spanning tree. RIF might work.
> > 
> > Regards,
> > --Paul
> > 
> > [1] http://en.wikipedia.org/wiki/Spanning_tree
> > 
> >> 
> >> I am working for bindings librdf for Ada2012. I could write such code
> >> directly in Ada (so it may be easier), but better would be to make C
> >> interface for this. I may write in Ada and leave TODO note "port it to
> >> C".
> >> 
> >> Any response?
> >> 
> >> --
> >> Victor Porton - http://portonvictor.org
> >> 
> > 
> > 
> > 
> > 
>
Received on Sunday, 6 July 2014 19:28:08 UTC