RE: Uniprot RDF in RDF Gateway from Geoff Chappell on 2005-05-12 (public-semweb-lifesci@w3.org from May 2005)

From: Geoff Chappell <gchappell@intellidimension.com>
Date: Thu, 12 May 2005 11:07:03 -0400
To: "'Eric Jain'" <Eric.Jain@isb-sib.ch>
Cc: <public-semweb-lifesci@w3.org>
Message-ID: <005a01c55704$41f8d780$6401a8c0@gsclaptop>

> -----Original Message-----
> From: Eric Jain [mailto:Eric.Jain@isb-sib.ch]
> Sent: Thursday, May 12, 2005 10:08 AM
> To: Geoff Chappell
> Cc: public-semweb-lifesci@w3.org
> Subject: Re: Uniprot RDF in RDF Gateway
> 
> Geoff Chappell wrote:
> > Will do. In the meantime, you're welcome to pass any queries along to me and
> > I'll add them as examples.
> 
> To start with, have a look at the queries listed here:
> 
> http://www.isb-sib.ch/~ejain/expasy4j-webng/query.html

I'll duplicate those and add them to examples.

> If your system can handle GROUP BY, this opens up a lot of possibilities,
> e.g.:

Yes, we support aggregate functions (count, sum, avg, min, max, listof) and
groupby.
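
To make the semantics concrete, a count grouped by organism amounts to roughly the following. This is a Python sketch rather than RDFQL, and the triples and the ex: names are made up for illustration; they are not UniProt data.

```python
from collections import Counter

# Hypothetical (subject, predicate, object) triples; not real UniProt data.
triples = [
    ("ex:P1", "ex:organism", "ex:human"),
    ("ex:P2", "ex:organism", "ex:human"),
    ("ex:P3", "ex:organism", "ex:mouse"),
    ("ex:P1", "rdfs:label", "protein one"),
]

def count_by_object(triples, predicate):
    """Group triples with the given predicate by object and count each group."""
    return Counter(o for _, p, o in triples if p == predicate)

proteins_per_organism = count_by_object(triples, "ex:organism")
```

The other aggregates (sum, avg, min, max) follow the same pattern of grouping first and reducing each group afterwards.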

> - Number of proteins by organism.
> - What is the most frequently cited paper?
> - Who is the most frequently cited author?
> - What databases do we reference, and how often?

I'll take a look at these also.

> Some quick questions:
> 
> - Do you support boolean operators apart from AND?

Yes - OR and NOT (negation as failure).

> - Can you do optional matching (e.g. return the rdfs:label value, if
> present)?

Yes. We have a construct that can be used for optionals, defaults, etc. -
e.g.:

Select ?c ?l using mydata where {[rdf:type] ?c [rdfs:Class]}
	and switch(?c)(
		case {[rdfs:label] ?c ?label} and lang(?label)='en':
			?l=?label
		case {[rdfs:label] ?c ?label}:
			?l=?label
		default:
			?l="n/a"
	)

Of course you can also simulate an optional (in the SPARQL sense) with X OR
NOT X.
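
For anyone not used to the switch construct, the fallback logic in the query above can be sketched in Python. The label data and ex: names are hypothetical, and the sketch is simplified to return a single label per class.

```python
# Hypothetical label data: class -> list of (label text, language tag).
labels = {
    "ex:Protein": [("Protein", "en"), ("Eiweiss", "de")],
    "ex:Gene": [("Gen", "de")],
}
classes = ["ex:Protein", "ex:Gene", "ex:Taxon"]

def pick_label(cls):
    """Mirror the switch: English label, else any label, else "n/a"."""
    candidates = labels.get(cls, [])
    for text, lang in candidates:
        if lang == "en":
            return text
    if candidates:
        return candidates[0][0]
    return "n/a"

result = {c: pick_label(c) for c in classes}
```

Every class gets a row in the result, which is the point of an optional: a missing label does not remove the class from the answer.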

> - Do you support '<' and '>'? On dates?

Yes, though currently you must use the between operator to make use of the
index (we'll likely push >, < down into index lookups as well in the future).
The version of our software you can download now only supports these ops on
our native datatypes, but the soon-to-be-released version supports them on
all appropriate xmlschema datatypes as well.
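
The general difference between an indexed range lookup and an un-indexed comparison can be sketched as follows. This is a generic Python illustration with made-up data, not a description of RDF Gateway's internals.

```python
from bisect import bisect_left, bisect_right

# Hypothetical sorted index of (value, subject) pairs, e.g. on a date property.
index = sorted([
    ("2004-11-30", "ex:P3"),
    ("2005-01-15", "ex:P1"),
    ("2005-03-02", "ex:P2"),
    ("2005-05-12", "ex:P4"),
])
keys = [value for value, _ in index]

def between(lo, hi):
    """Inclusive range lookup: two binary searches, then a contiguous slice."""
    start = bisect_left(keys, lo)
    end = bisect_right(keys, hi)
    return [subject for _, subject in index[start:end]]

def greater_than_scan(lo):
    """What an un-indexed '>' amounts to: test every entry."""
    return [subject for value, subject in index if value > lo]
```

Pushing > and < down into the index would mean treating them as a range with one open end, so the same two-search approach applies.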

> - Transitivity is really important. How is this implemented? Do you
> actually generate and store the inferred triples?

They're generated on the fly according to rules included in the query (e.g.
see example 5) as necessary to answer the query. You may choose to actually
insert those inferred triples into the db, of course. At the core of RDF
Gateway is a deductive database with rules-based inference. We provide
standard rulebases for owl and rdfs - though they'd need some customization
to work efficiently on a database of this size (e.g. certain general rules
could be compiled into specific rules for a particular ontology in an effort
to keep reasoning local, rather than global).
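
The difference between deriving transitive facts at query time and storing them can be sketched like this. It is a minimal Python illustration with hypothetical ex: classes, not the rule engine itself.

```python
from collections import deque

# Hypothetical direct subclass facts (child, parent); only these are stored.
sub_class_of = [
    ("ex:Kinase", "ex:Enzyme"),
    ("ex:Enzyme", "ex:Protein"),
    ("ex:Protein", "ex:Molecule"),
]

def ancestors(start, facts):
    """Derive all transitive parents of `start` at query time (breadth-first)."""
    parents = {}
    for child, parent in facts:
        parents.setdefault(child, []).append(parent)
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

inferred = ancestors("ex:Kinase", sub_class_of)
# Optionally materialize: store the derived facts alongside the asserted ones.
materialized = set(sub_class_of) | {("ex:Kinase", p) for p in inferred}
```

Deriving on demand only touches the part of the hierarchy the query needs; materializing trades storage and load time for faster lookups later.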

> I'm also interested if you found any aspects of our data difficult to deal
> with. (Apart from the size :-)

One surprise was that though the ontology specifies the range of the
properties to be xmlschema datatypes, in the files the values are actually
plain literals. That forced a reload once I realized it (because I wanted
them to be typed so they'd be indexed as such). I've also discovered that
some of the dates didn't parse coming in (when treated as xsd:date values) -
I'm not sure yet if that's your issue or mine. I'll resolve that (possibly by
making my date parser less strict) and reload. Otherwise so far so good :-)
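
As an illustration of what "less strict" could mean, here is a Python sketch of a date parser with a strict and a lenient mode. The unpadded-field case is purely an assumption chosen for the example, since I haven't yet pinned down which values fail, and the strict pattern ignores the optional timezone that xsd:date allows.

```python
import re
from datetime import date

STRICT = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
# Assumption: a lenient parser might also accept unpadded month/day fields.
LENIENT = re.compile(r"(\d{4})-(\d{1,2})-(\d{1,2})")

def parse_date(text, strict=True):
    """Return a date, or None if the text does not match the chosen pattern."""
    match = (STRICT if strict else LENIENT).fullmatch(text.strip())
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:  # e.g. month 13 or day 32
        return None
```

Returning None instead of raising lets a loader fall back to storing the value as a plain literal when it cannot be typed.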

-Geoff

Received on Thursday, 12 May 2005 15:07:43 UTC