Re: ANNOUNCE: W3C Workshop on Semantic Web for Life Sciences from Eric Jain on 2004-08-20 (public-semweb-lifesci@w3.org from August 2004)

From: Eric Jain <Eric.Jain@isb-sib.ch>
Date: Fri, 20 Aug 2004 12:32:50 +0200
To: Massimo Marchiori <massimo@w3.org>
CC: public-semweb-lifesci@w3.org
Message-ID: <4125D352.7010605@isb-sib.ch>

Massimo Marchiori wrote:
> 1. Let me rephrase what I think you are saying:
> writing a parser for the XML serialization syntax
> is quite hard. Is this it, or there is more, like
> eg user readability?

My main concern is not user readability (not a bad thing, of course), 
but I feel that things would be easier if there weren't several 
different ways of representing everything.

Consider:

<rdf:RDF>
   <rdf:Description rdf:about="#F">
     <rdf:type rdf:resource="#Foo"/>
   </rdf:Description>
</rdf:RDF>

Versus:

<rdf:RDF>
   <Foo rdf:ID="F"/>
</rdf:RDF>

The second version is arguably more readable, but can't handle as many 
cases as the first version, as it restricts the kind of identifiers that 
can be used. Therefore a parser must support both versions, and this is 
just one example. The more alternatives, the more complex the 
programming. And people must remember all the different rules, too. I 
therefore wonder if it would make sense to have a "Common RDF", 
following the philosophy of [http://www.simonstl.com/articles/cxmlspec.txt].
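
To make the point concrete, here is a minimal sketch (not a real RDF/XML parser) that reads both forms above and shows they produce the same triple. The base URI http://example.org/doc and the helper names are assumptions made up for the illustration, and the namespace declarations have been added so that the snippets parse. Even in this toy, the second form needs its own branch.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BASE = "http://example.org/doc"  # hypothetical base URI, not from the examples

LONG_FORM = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns="{BASE}#">
  <rdf:Description rdf:about="#F">
    <rdf:type rdf:resource="#Foo"/>
  </rdf:Description>
</rdf:RDF>
"""

SHORT_FORM = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns="{BASE}#">
  <Foo rdf:ID="F"/>
</rdf:RDF>
"""


def resolve(ref):
    # Only same-document references such as "#F" are handled in this sketch.
    return BASE + ref if ref.startswith("#") else ref


def subject_of(node):
    # rdf:about carries a URI reference, rdf:ID a bare fragment name.
    about = node.get(f"{{{RDF}}}about")
    if about is not None:
        return resolve(about)
    return resolve("#" + node.get(f"{{{RDF}}}ID"))


def type_triples(xml_text):
    """Return the set of rdf:type triples found in either syntax form."""
    triples = set()
    for node in ET.fromstring(xml_text):
        subject = subject_of(node)
        if node.tag != f"{{{RDF}}}Description":
            # Typed node element: the element name itself is the type.
            namespace, local = node.tag[1:].split("}")
            triples.add((subject, RDF + "type", namespace + local))
        for prop in node:
            if prop.tag == f"{{{RDF}}}type":
                target = resolve(prop.get(f"{{{RDF}}}resource"))
                triples.add((subject, RDF + "type", target))
    return triples


print(type_triples(LONG_FORM) == type_triples(SHORT_FORM))
```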


> 2. Clarification: do you mean collections/lists
> should not be in RDF, or just that the way they
> are represented now is not nice/effective? Either way,
> could you provide some motivation/example?

Being able to explicitly express that the order of several items is 
relevant doesn't seem like a bad idea. On the other hand, considering 
the tricks required to map collections/lists to triples, I wonder if it 
wouldn't be better to leave it up to applications whether they 
treat the ordering of certain items as relevant.

As an example I mention query engines. Few query engines have direct 
support for collections/lists [see 
http://www.aifb.uni-karlsruhe.de/WBS/pha/rdf-query/], so instead of 
"selecting all publications that have an author x", for example, you may 
have to "select all publications that have a list that contains an 
author x". Of course, sometimes you may want to "return the titles and 
the first author for all publications cited by y", for example. But 
should this functionality be limited to explicitly defined lists?
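
A small sketch of the difference, using plain tuples as triples. The resource names (pub1, pub2, the "author" and "authors" properties) are invented for the illustration, and rdf:first / rdf:rest / rdf:nil are written as prefixed strings rather than full URIs. The simple pattern match finds nothing once the authors have been moved into a list; the query has to walk the list structure instead.

```python
# Hypothetical data: pub1 uses a plain repeated property, pub2 an RDF list.
direct = {
    ("pub1", "author", "x"),
    ("pub1", "author", "y"),
}
listed = {
    ("pub2", "authors", "_:l1"),
    ("_:l1", "rdf:first", "y"),
    ("_:l1", "rdf:rest", "_:l2"),
    ("_:l2", "rdf:first", "x"),
    ("_:l2", "rdf:rest", "rdf:nil"),
}


def objects(triples, subject, predicate):
    return [o for (s, p, o) in triples if s == subject and p == predicate]


def publications_with_author(triples, who):
    # "select all publications that have an author x"
    return sorted(s for (s, p, o) in triples if p == "author" and o == who)


def list_items(triples, head):
    # Follow rdf:rest links from the head node until rdf:nil is reached.
    items = []
    while head != "rdf:nil":
        items.extend(objects(triples, head, "rdf:first"))
        rest = objects(triples, head, "rdf:rest")
        if not rest:
            break
        head = rest[0]
    return items


def publications_with_listed_author(triples, who):
    # "select all publications that have a list that contains an author x"
    return sorted(
        s for (s, p, o) in triples
        if p == "authors" and who in list_items(triples, o)
    )
```

With this data, publications_with_author(listed, "x") comes back empty, while publications_with_listed_author(listed, "x") finds pub2.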

A further argument: Let's say we have a resource with a property "name". 
If we one day decide that there may in fact be several names, but don't 
want to lose the ordering, any applications that were previously 
looking for a "name" property will be broken, because the property has 
been moved into a list.


> 3. Interesting: do your use cases have absolute needs
> for reification, or it's just a convenience? Is
> your only use just use case 6 (provenance)?

Consider this example: A protein may occur in one or more organisms. We 
may need to indicate who observed this protein in a specific organism, 
and cite a relevant publication etc. This information obviously can't be 
attached to either the protein or the taxon resource. We could create 
intermediary resources for connecting proteins to taxa, but this seems 
unnatural and is impractical, because the same procedure would have to 
be repeated for many other properties. Also, no application should break 
because one day we decide to provide some provenance data for something 
that previously never had any.

In addition to provenance data (currently only available in our internal 
version, but likely to be made public at some point) we plan to allow 
our database curators to attach "post-it notes" in many of the same 
places (for internal use only).


> And,
> when you talk about quads, do you mean the RDF model
> should be changed to allow context, or did you mean
> just that efficiency of reification handling is
> still an open issue and so suitable pseudo-forms
> could be developed to provide better tool processing?

By quads I meant (perhaps misusing the terminology) that when parsing 
something like

<rdf:Description rdf:about="P12345">
   <name rdf:ID="S1">Foo</name>
</rdf:Description>

with a statement-by-statement callback mechanism, most parsers will return:

P12345 name 'Foo'
S1 rdf:type rdf:Statement
S1 rdf:subject P12345
S1 rdf:predicate name
S1 rdf:object 'Foo'

Rather than:

S1: P12345 name 'Foo'

Which would be much simpler and more efficient to process, in my opinion.
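
As a rough sketch of what an application currently has to do itself, the following folds the five callbacks back into a single identified statement. The function name and the tuple layout (id, subject, predicate, object) are my own assumptions, not an existing parser API, and the sketch buffers all triples first, which is exactly the overhead a parser-level quad callback would avoid.

```python
REIFICATION = {"rdf:subject", "rdf:predicate", "rdf:object"}


def fold_reification(triples):
    """Collapse reification triples into (id, subject, predicate, object)."""
    triples = list(triples)

    # Collect the subject/predicate/object parts of each reified statement.
    parts = {}
    for s, p, o in triples:
        if p in REIFICATION:
            parts.setdefault(s, {})[p] = o

    # Map each completely described statement back to its identifier.
    ids = {}
    for statement_id, d in parts.items():
        if len(d) == 3:
            key = (d["rdf:subject"], d["rdf:predicate"], d["rdf:object"])
            ids[key] = statement_id
    reified = set(ids.values())

    quads = []
    for s, p, o in triples:
        is_bookkeeping = p in REIFICATION or (p, o) == ("rdf:type", "rdf:Statement")
        if s in reified and is_bookkeeping:
            continue  # drop the reification triples themselves
        # Statements without an identifier get None in the first slot.
        # Note: a statement that is reified but never asserted is lost here.
        quads.append((ids.get((s, p, o)), s, p, o))
    return quads


callbacks = [
    ("P12345", "name", "Foo"),
    ("S1", "rdf:type", "rdf:Statement"),
    ("S1", "rdf:subject", "P12345"),
    ("S1", "rdf:predicate", "name"),
    ("S1", "rdf:object", "Foo"),
]
print(fold_reification(callbacks))
```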


> 4. yes, you're right. No negation, at least for
> the moment... (for a reason, as treatment of negation
> can be quite hairy...)

I remember hearing about this before. Perhaps someone with insight could 
explain where the hairiness arises?


> 5. Could you expand more on this? An example would
> shed some light.

For many applications it is useful (N3, RDQL, storage in relational 
database) or even required (Protege) that URIs (or some of them) can be 
separated into a namespace part and a QName.

The following URIs work well:

http://uniprot.org/uniprot/P12345
urn:lsid:uniprot.org:uniprot:P12345

But this one doesn't (9606 is not a valid QName, since a local name cannot start with a digit):

urn:lsid:uniprot.org:taxonomy:9606

This is an irritation, though I'm not sure whom to blame :-)
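
To illustrate, a minimal sketch of the kind of split these tools need. It assumes the split happens after the last '#', '/' or ':' and uses a simplified, ASCII-only approximation of the XML name rules; real tools apply the full rules and may choose the split point differently.

```python
import re

# Simplified ASCII-only approximation of an XML local name (an assumption):
# it must start with a letter or underscore, never with a digit.
LOCAL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def split_uri(uri):
    """Split a URI into (namespace, local part, local part is usable)."""
    cut = max(uri.rfind(c) for c in "#/:") + 1
    namespace, local = uri[:cut], uri[cut:]
    return namespace, local, LOCAL_NAME.fullmatch(local) is not None


for uri in (
    "http://uniprot.org/uniprot/P12345",
    "urn:lsid:uniprot.org:uniprot:P12345",
    "urn:lsid:uniprot.org:taxonomy:9606",
):
    print(split_uri(uri))
```

Only the last URI is reported as unusable, because its local part is purely numeric.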


> 6. So, I guess the critique isn't much to RDF per se
> but to Web Services, right? Or, are there specific
> features of RDF that you were thinking about?

Yes, this is more of a critique of web services in general and the 
difficulty of creating web services that send and receive RDF in particular.


> 7. Is it really that bad to use plain URI (ramping
> debate here, but I'm interested in your opinion.
> if it's for your point 11, then 11 could be solved
> and therefore using URIs... ;) ?

The problem is that currently most life science resources are only 
accessible via URLs such as 
http://www.bioinf.man.ac.uk/cgi-bin/dbbrowser/sprint/searchprintss.cgi?display_opts=Prints&category=None&queryform=false&prints_accn=PR01168. 

So in any case you first need to set up some kind of resolution 
mechanism that maps a URI containing both a database name and an 
identifier to a particular location.
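
A minimal sketch of such a resolution step, assuming LSID-style identifiers of the form urn:lsid:authority:database:identifier. The registry, the function name and the example.org authority are hypothetical; the only real data is the PRINTS URL pattern quoted above.

```python
from urllib.parse import quote

# Hypothetical registry: database name -> URL template.
# The PRINTS template is derived from the URL quoted above.
URL_TEMPLATES = {
    "prints": (
        "http://www.bioinf.man.ac.uk/cgi-bin/dbbrowser/sprint/"
        "searchprintss.cgi?display_opts=Prints&category=None"
        "&queryform=false&prints_accn={id}"
    ),
}


def resolve(uri):
    """Map urn:lsid:<authority>:<database>:<identifier> to a location."""
    parts = uri.split(":")
    if len(parts) != 5 or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not a five-part LSID: {uri!r}")
    database, identifier = parts[3], parts[4]
    template = URL_TEMPLATES.get(database)
    if template is None:
        raise KeyError(f"no URL template registered for {database!r}")
    # Escape the identifier so it is safe inside a query string.
    return template.format(id=quote(identifier, safe=""))


# The authority example.org is made up for this example.
print(resolve("urn:lsid:example.org:prints:PR01168"))
```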

I'm still undecided whether these URIs should be URNs (LSIDs) or real 
URLs. If most databases had reasonable URLs (or could be convinced to 
provide such - good luck), I would be in favor of simply generating 
artificial URLs for the rest. But since you have to create URIs 
for virtually every database anyway...


> 8. Example? (in particular, it's unclear formally what
> "inline" means here)

By inline I mean references to resources from within a piece of text.


> 9. Again, example...?

What I refer to as "grouping of statements" has been called "context", 
"graph" or "model" by others. Does this clarify what I mean?


> 10. Clarification on "rapid data entry": manual
> assembly? Support for some specific features..?

What I have in mind is a text editor that supports features available in 
most code editors such as auto-completion, validation, syntax coloring, 
user-defined templates etc. Would probably make use of an N3-like 
syntax. This would (hopefully) allow data to be created and modified 
much faster than with most of the tools I have seen so far.

Received on Friday, 20 August 2004 10:32:52 UTC