Re: RDF Triples in XML, named graphs from Patrick Stickler on 2004-02-12 (www-rdf-interest@w3.org from February 2004)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Thu, 12 Feb 2004 13:54:05 +0200
To: "ext Eric Jain" <Eric.Jain@isb-sib.ch>
Cc: "Jeremy Carroll" <jjc@hplb.hpl.hp.com>, "rdf-interest" <www-rdf-interest@w3.org>
Message-Id: <285F735A-5D52-11D8-B830-000A95EAFCEA@nokia.com>

On Feb 12, 2004, at 13:22, ext Eric Jain wrote:

>
>> http://www-uk.hpl.hp.com/people/jjc/tmp/trix.pdf
>
> Interesting approach. I definitely agree there is a problem with the
> current syntax. However, the main issue I see is that there are too
> many strategies mixed into a single syntax. This is bound to be used by
> everyone in quite different ways (see Perl, TMTOWTDI).
>
> A consequence of the flexible syntax is that current parsers are way too
> slow to be of any use when dealing with large amounts of data. By
> restricting myself to a subset of the complete syntax, I was able to
> write a parser that is a full order of magnitude faster than ARP
> (distributed with Jena). Others may be able to do even better, provided
> they don't reconsider and decide that the technology is neither
> suitable nor worth the effort.
>
> In our case it is important that the files we distribute can also be
> used by people who are familiar with XML, but completely clueless about
> RDF (the majority, today).

Certainly one of the target groups that TriX is meant for.

> Therefore, I'd rather not introduce terms
> such as 'graph', 'triple' and 'literal' into the syntax.

The vocabulary of TriX was specifically intended to reflect the
official terminology used to describe the RDF graph syntax. One
of the challenges to folks learning RDF is that the vocabulary
used for RDF/XML does not sync with the abstract model of the
graph.

> Grouping of
> statements into what you call 'graphs' on the other hand is very useful
> for people trying to map the data to objects.

They are called 'graphs' because that's precisely what they are: RDF
graphs. TriX reflects the underlying graph model of RDF in as true
a fashion as possible, just as N-Triples does. Hence "TriX" -> Triples
in XML.
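
As a rough illustration of the "triples in XML" idea, here is a minimal
Python sketch that writes a list of triples as a TriX-style document. The
element names (TriX, graph, triple, uri, plainLiteral) follow the TriX
proposal as commonly described; the namespace declaration is omitted, the
function name triples_to_trix and the example.org URIs are hypothetical,
and the angle-bracket convention for URI objects is a simplification made
up for this sketch.

```python
import xml.etree.ElementTree as ET

def triples_to_trix(triples, graph_name=None):
    """Serialize (subject, predicate, object) tuples as TriX-style XML.

    Subjects and predicates are treated as URIs. An object wrapped in
    angle brackets is treated as a URI, anything else as a plain literal
    (an assumption of this sketch, not part of TriX).
    """
    root = ET.Element("TriX")
    graph = ET.SubElement(root, "graph")
    if graph_name is not None:
        # An optional leading <uri> names the graph.
        ET.SubElement(graph, "uri").text = graph_name
    for s, p, o in triples:
        triple = ET.SubElement(graph, "triple")
        ET.SubElement(triple, "uri").text = s
        ET.SubElement(triple, "uri").text = p
        if o.startswith("<") and o.endswith(">"):
            ET.SubElement(triple, "uri").text = o[1:-1]
        else:
            ET.SubElement(triple, "plainLiteral").text = o
    return ET.tostring(root, encoding="unicode")

doc = triples_to_trix(
    [("http://example.org/doc",
      "http://purl.org/dc/elements/1.1/title",
      "TriX notes")],
    graph_name="http://example.org/g1",
)
print(doc)
```

Every statement comes out as one triple element with exactly three
children, which is the one-to-one correspondence with the abstract graph
that the paragraph above describes.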

> This task can however also
> be simplified by requiring logical sets of statements to occur in
> sequence, rather than being scattered throughout the file.
>
> Interestingly, when presented with the choice of working with an XML or
> an RDF/XML representation of the same data, our developers (somewhat
> familiar with XML, not RDF) choose to use the RDF version (to my great
> relief :-). The data is relatively complex, with lots of
> cross-referencing, which the RDF/XML syntax can handle in a simple and
> consistent way. See below.
>
> Another issue is size. The RDF/XML data is currently not more than 20%
> larger than plain XML. Using a syntax such as TriX on the other hand I
> fear would increase the size by a factor of at least two, more than is
> acceptable.
>

I think you will find that introducing mechanisms for compression
of the expression of statements will result in either (a) variability
in representation, reducing the utility for tools such as XQuery,
and/or (b) complexity in parsing/output, reducing the utility for
tools such as XSLT, SAX, etc.
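
Point (a) can be seen with plain RDF/XML. The Python sketch below is only
an illustration under assumed data: the two documents and the example.org
URI are made up, and a simple element search stands in for an XQuery or
XPath expression. Both documents encode the same single triple, once with
a property element and once with the abbreviated property-attribute form,
yet a path written against the first form finds nothing in the second.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

# Same triple, written with a property element.
FORM_A = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}">
  <rdf:Description rdf:about="http://example.org/doc">
    <dc:title>TriX notes</dc:title>
  </rdf:Description>
</rdf:RDF>"""

# Same triple, written with the abbreviated property-attribute form.
FORM_B = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}">
  <rdf:Description rdf:about="http://example.org/doc" dc:title="TriX notes"/>
</rdf:RDF>"""

def titles_via_child_elements(xml_text):
    """Collect dc:title values, assuming they appear as child elements."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(f"{{{DC}}}title")]

hits_a = titles_via_child_elements(FORM_A)  # finds the title
hits_b = titles_via_child_elements(FORM_B)  # finds nothing
print(hits_a, hits_b)
```

A query author has to anticipate every such abbreviation, which is the
variability that a single fixed form per statement avoids.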

TriX is not intended to be used by humans. Nor is it necessary to
explicitly serialize graphs as TriX; it is enough to provide a
virtual interface to a knowledge base that allows generic XML tools
to view/search/manipulate the graph in terms of the TriX syntax.

In this way, the same XQuery could be executed against a knowledge
base and/or an actual TriX instance. One could then think of TriX
as a means of integrating RDF and XQuery.
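
A minimal Python sketch of that idea, under stated assumptions: a set of
tuples stands in for the knowledge base, a small function stands in for
the XQuery, and the names STORE, store_as_trix and objects_of are
hypothetical. The TriX namespace is omitted for brevity. The point is
only that one query, written once against the TriX shape, runs unchanged
over a parsed TriX instance and over a view generated on the fly.

```python
import xml.etree.ElementTree as ET

DC_TITLE = "http://purl.org/dc/elements/1.1/title"
DOC = "http://example.org/doc"

# An actual TriX-style instance, e.g. as read from a file.
TRIX_TEXT = f"""<TriX><graph>
  <triple>
    <uri>{DOC}</uri>
    <uri>{DC_TITLE}</uri>
    <plainLiteral>TriX notes</plainLiteral>
  </triple>
</graph></TriX>"""

# Stand-in knowledge base: (subject, predicate, literal) tuples in memory.
STORE = {(DOC, DC_TITLE, "TriX notes")}

def store_as_trix(store):
    """Expose the store as a TriX-shaped element tree without a file."""
    root = ET.Element("TriX")
    graph = ET.SubElement(root, "graph")
    for s, p, o in sorted(store):
        triple = ET.SubElement(graph, "triple")
        ET.SubElement(triple, "uri").text = s
        ET.SubElement(triple, "uri").text = p
        ET.SubElement(triple, "plainLiteral").text = o
    return root

def objects_of(root, subject, predicate):
    """The 'query': object values of triples matching subject and predicate."""
    hits = []
    for triple in root.iter("triple"):
        s, p, o = list(triple)  # each triple has exactly three children
        if s.text == subject and p.text == predicate:
            hits.append(o.text)
    return hits

from_instance = objects_of(ET.fromstring(TRIX_TEXT), DOC, DC_TITLE)
from_store = objects_of(store_as_trix(STORE), DOC, DC_TITLE)
print(from_instance, from_store)
```

In a real system the view would presumably be produced lazily rather than
built in full, but the query side would look the same either way.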

Patrick

--

Patrick Stickler
Nokia, Finland
patrick.stickler@nokia.com

Received on Thursday, 12 February 2004 06:54:01 UTC