Re: RDF Triples in XML, named graphs from Eric Jain on 2004-02-12 (www-rdf-interest@w3.org from February 2004)

From: Eric Jain <Eric.Jain@isb-sib.ch>
Date: Thu, 12 Feb 2004 12:22:21 +0100
To: "rdf-interest" <www-rdf-interest@w3.org>
Cc: "Jeremy Carroll" <jjc@hplb.hpl.hp.com>
Message-ID: <000b01c3f15a$7b4891d0$c300000a@caliente>

> http://www-uk.hpl.hp.com/people/jjc/tmp/trix.pdf

Interesting approach. I definitely agree there is a problem with the
current syntax. However, the main issue I see is that there are too many
strategies mixed into a single syntax. This is bound to be used by
everyone in quite different ways (see Perl, TMTOWTDI).

A consequence of the flexible syntax is that current parsers are way too
slow to be of any use when dealing with large amounts of data. By
restricting myself to a subset of the complete syntax, I was able to
write a parser that is a full order of magnitude faster than ARP
(distributed with Jena). Others may be able to do even better, provided
they don't reconsider and decide that the technology is neither suitable
nor worth the effort.
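
To illustrate the subset idea, here is a minimal Python sketch. It is not
the parser mentioned above and makes no claim about its speed. The subset
it assumes (top-level rdf:Description elements with rdf:about, whose
children carry either rdf:resource or plain text) and the names
iter_triples and expand are illustrative assumptions only.

```python
import io
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def expand(tag):
    # ElementTree spells qualified names as "{namespace}local".
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
        return ns + local
    return tag


def iter_triples(source):
    """Yield (subject, predicate, object) tuples from the assumed subset.

    Objects are returned as plain strings; this sketch does not record
    whether an object was a resource reference or a literal.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != f"{{{RDF}}}Description":
            continue
        subject = elem.get(f"{{{RDF}}}about")
        for prop in elem:
            resource = prop.get(f"{{{RDF}}}resource")
            obj = resource if resource is not None else (prop.text or "")
            yield subject, expand(prop.tag), obj
        # Drop the children of the finished element to limit memory use.
        elem.clear()


SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://expasy.org/rdf-syntax-ns#">
  <rdf:Description rdf:about="urn:lsid:expasy.org:uniref:C50-Q10466">
    <rdfs:label>Titin, heart isoform N2-B related cluster</rdfs:label>
    <similarity>0.5</similarity>
    <member rdf:resource="urn:lsid:expasy.org:uniprot:Q10466"/>
  </rdf:Description>
</rdf:RDF>
"""

triples = list(iter_triples(io.BytesIO(SAMPLE)))
for triple in triples:
    print(triple)
```

Anything outside the assumed subset (nested descriptions, rdf:parseType,
typed nodes and so on) is simply not handled, which is the trade-off
being argued for.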

In our case it is important that the files we distribute can also be
used by people who are familiar with XML, but completely clueless about
RDF (the majority, today). Therefore, I'd rather not introduce terms
such as 'graph', 'triple' and 'literal' into the syntax. Grouping of
statements into what you call 'graphs' on the other hand is very useful
for people trying to map the data to objects. This task can however also
be simplified by requiring logical sets of statements to occur in
sequence, rather than being scattered throughout the file.
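
A rough Python sketch of why the ordering requirement helps: if all
statements about a subject are contiguous, a consumer can build one
object at a time without indexing the whole file. The function name
iter_objects and the shortened predicate names are hypothetical, chosen
only for illustration.

```python
from itertools import groupby
from operator import itemgetter


def iter_objects(triples):
    """Group consecutive triples that share a subject into one dict each.

    Relies on the assumption that statements about a subject occur in
    sequence; a subject that reappears later yields a second, separate dict.
    """
    for subject, group in groupby(triples, key=itemgetter(0)):
        properties = {}
        for _subject, predicate, obj in group:
            properties.setdefault(predicate, []).append(obj)
        yield subject, properties


# Hypothetical triples with abbreviated predicate names.
triples = [
    ("urn:lsid:expasy.org:uniref:C50-Q10466", "member",
     "urn:lsid:expasy.org:uniprot:Q10466"),
    ("urn:lsid:expasy.org:uniref:C50-Q10466", "member",
     "urn:lsid:expasy.org:uniprot:Q8TCG8"),
    ("#_1", "label", "BRCA"),
]

objects = list(iter_objects(triples))
for subject, properties in objects:
    print(subject, properties)
```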

Interestingly, when presented with the choice of working with an XML or
an RDF/XML representation of the same data, our developers (somewhat
familiar with XML, not RDF) chose to use the RDF version (to my great
relief :-). The data is relatively complex, with lots of
cross-referencing, which the RDF/XML syntax can handle in a simple and
consistent way. See below.

Another issue is size. The RDF/XML data is currently not more than 20%
larger than plain XML. Using a syntax such as TriX, on the other hand,
would I fear increase the size by a factor of at least two, which is
more than is acceptable.

In conclusion, what we need, I believe, is not a new syntax, but rather
something along the lines of Simon St. Laurent's 'Common XML'
[http://www.simonstl.com/articles/cxmlspec.txt]; let's call it 'Common
RDF'...


<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://expasy.org/rdf-syntax-ns#"
>
  <rdf:Description rdf:about="urn:lsid:expasy.org:uniref:C50-Q10466">
    <rdf:type rdf:resource="&exp;Cluster"/>
    <rdfs:label>Titin, heart isoform N2-B related cluster</rdfs:label>
    <similarity>0.5</similarity>
    <gene rdf:ID="_2" rdf:resource="#_1"/>
    <member rdf:resource="urn:lsid:expasy.org:uniprot:Q10466"/>
    <member rdf:resource="urn:lsid:expasy.org:uniprot:Q8TCG8"/>
    <member rdf:resource="urn:lsid:expasy.org:uniprot:Q15598"/>
    ...
  </rdf:Description>

  <rdf:Description rdf:about="#_1">
    <rdf:type rdf:resource="&exp;Gene"/>
    <rdfs:label>BRCA</rdfs:label>
    ...
  </rdf:Description>

  <rdf:Description rdf:about="#_2">
    <rdf:type rdf:resource="&exp;ExtendedStatement"/>
    <updated>2004-02-01</updated>
    ...
  </rdf:Description>

  <rdf:Description rdf:about="urn:lsid:expasy.org:uniref:C50-Q10467">
    <rdf:type rdf:resource="&exp;Cluster"/>
    ...
  </rdf:Description>

  ...

</rdf:RDF>

Received on Thursday, 12 February 2004 06:24:18 UTC