- From: Eric Jain <Eric.Jain@isb-sib.ch>
- Date: Fri, 20 Aug 2004 12:32:50 +0200
- To: Massimo Marchiori <massimo@w3.org>
- CC: public-semweb-lifesci@w3.org
Massimo Marchiori wrote:
> 1. Let me rephrase what I think you are saying:
> writing a parser for the XML serialization syntax
> is quite hard. Is this it, or is there more, like
> e.g. user readability?
My main concern is not user readability (not a bad thing, of course),
but I feel that things would be easier if there weren't several
different ways of representing everything.
Consider:
<rdf:RDF>
<rdf:Description rdf:about="#F">
<rdf:type rdf:resource="#Foo"/>
</rdf:Description>
</rdf:RDF>
Versus:
<rdf:RDF>
<Foo rdf:ID="F"/>
</rdf:RDF>
The second version is arguably more readable, but it can't handle as many
cases as the first version, as it restricts the kind of identifiers that
can be used (the type URI, for example, must be expressible as an XML
element name). Therefore a parser must support both versions, and this is
just one example. The more alternatives there are, the more complex the
programming. And people must remember all the different rules, too. I
therefore wonder if it would make sense to have a "Common RDF",
following the philosophy of [http://www.simonstl.com/articles/cxmlspec.txt].
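To make the point concrete, here is a minimal Python sketch (standard library only) showing that the two snippets above carry the same triple. It assumes namespace declarations that the snippets omit, a hypothetical base URI http://example.org/doc, and that Foo lives in the namespace http://example.org/doc#. It handles only these two forms and only same-document references such as "#F". A real RDF/XML parser has to cope with many more (property attributes, rdf:parseType, and so on), which is exactly the burden described above.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BASE = "http://example.org/doc"  # hypothetical base URI of the document

# Clark-notation names ("{namespace}local"), as ElementTree reports them.
ABOUT, ID, RESOURCE = (f"{{{RDF}}}{n}" for n in ("about", "ID", "resource"))
DESCRIPTION, TYPE = f"{{{RDF}}}Description", f"{{{RDF}}}type"


def parse(xml_text):
    """Return the rdf:type triples expressed by either of the two forms."""
    triples = set()
    for node in ET.fromstring(xml_text):
        # rdf:about="#F" and rdf:ID="F" both name BASE#F (same-document refs only).
        if ABOUT in node.attrib:
            subject = BASE + node.attrib[ABOUT]
        else:
            subject = BASE + "#" + node.attrib[ID]
        if node.tag == DESCRIPTION:
            # Form 1: an explicit rdf:type property element.
            for prop in node:
                if prop.tag == TYPE:
                    triples.add((subject, RDF + "type", BASE + prop.attrib[RESOURCE]))
        else:
            # Form 2: a typed node element; the element name is the type.
            namespace, local = node.tag[1:].split("}")
            triples.add((subject, RDF + "type", namespace + local))
    return triples


LONG_FORM = f"""<rdf:RDF xmlns:rdf="{RDF}">
  <rdf:Description rdf:about="#F">
    <rdf:type rdf:resource="#Foo"/>
  </rdf:Description>
</rdf:RDF>"""

SHORT_FORM = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns="{BASE}#">
  <Foo rdf:ID="F"/>
</rdf:RDF>"""

print(parse(LONG_FORM) == parse(SHORT_FORM))  # True
```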
> 2. Clarification: do you mean collections/lists
> should not be in RDF, or just that the way they
> are represented now is not nice/effective? Either way,
> could you provide some motivation/example?
Being able to explicitly express that the order of several items is
relevant doesn't seem like a bad idea. On the other hand, considering
the tricks required to map collections/lists to triples, I wonder if it
wouldn't be better to leave it up to applications to decide whether they
treat the ordering of certain items as relevant.
Query engines are one example. Few query engines have direct
support for collections/lists [see
http://www.aifb.uni-karlsruhe.de/WBS/pha/rdf-query/], so instead of
"selecting all publications that have an author x", for example, you may
have to "select all publications that have a list that contains an
author x". Of course, sometimes you may want to "return the titles and
the first author for all publications cited by y", for example. But
should this functionality be limited to explicitly defined lists?
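For readers unfamiliar with the encoding, here is a rough sketch of the "tricks": an RDF collection becomes a chain of blank nodes linked by rdf:first and rdf:rest and terminated by rdf:nil, so a membership question turns into a walk over that chain. Triples are plain Python tuples, and the predicates authors/author and the resources paper1, paper2, alice and bob are made up for illustration.

```python
import itertools

RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"
_ids = itertools.count()


def add_list(store, items):
    """Encode a Python list as an RDF collection and return its head node."""
    head = RDF_NIL
    for item in reversed(items):
        node = f"_:b{next(_ids)}"  # fresh blank node for each list cell
        store.add((node, RDF_FIRST, item))
        store.add((node, RDF_REST, head))
        head = node
    return head


def list_members(store, head):
    """Walk an rdf:first/rdf:rest chain and return the members in order."""
    members = []
    while head != RDF_NIL:
        members.append(next(o for s, p, o in store if s == head and p == RDF_FIRST))
        head = next(o for s, p, o in store if s == head and p == RDF_REST)
    return members


store = set()
# paper1 records its authors as an ordered collection; paper2 uses a plain triple.
store.add(("paper1", "authors", add_list(store, ["alice", "bob"])))
store.add(("paper2", "author", "bob"))

# "Publications that have an author x", with plain triples: one pattern match.
direct = {s for s, p, o in store if p == "author" and o == "bob"}
# The same question against the list encoding has to traverse the chain.
via_list = {s for s, p, o in store
            if p == "authors" and "bob" in list_members(store, o)}
# Ordering is only available where a list was used explicitly.
first_author = list_members(
    store, next(o for s, p, o in store if s == "paper1" and p == "authors"))[0]
```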
A further argument: Let's say we have a resource with a property "name".
If one day we decide that there may in fact be several names, but don't
want to lose the ordering, any applications that were previously
looking for a "name" property will be broken, because the property has
been moved into a list.
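A small sketch of the breakage, assuming (hypothetically) that the remodelled data keeps the name property but points it at a collection instead of a literal:

```python
def names(store, subject):
    """What an existing application does: read every 'name' value directly."""
    return [o for s, p, o in store if s == subject and p == "name"]


before = {("P12345", "name", "Foo")}

# Hypothetical remodelling: 'name' now points at an ordered collection.
after = {
    ("P12345", "name", "_:l1"),
    ("_:l1", "rdf:first", "Foo"), ("_:l1", "rdf:rest", "_:l2"),
    ("_:l2", "rdf:first", "Bar"), ("_:l2", "rdf:rest", "rdf:nil"),
}

print(names(before, "P12345"))  # ['Foo']
print(names(after, "P12345"))   # ['_:l1'], a blank node instead of a name
```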
> 3. Interesting: do your use cases have absolute needs
> for reification, or is it just a convenience? Is
> your only use just use case 6 (provenance)?
Consider this example: A protein may occur in one or more organisms. We
may need to indicate who observed this protein in a specific organism,
and cite a relevant publication etc. This information obviously can't be
attached to either the protein or the taxon resource. We could create
intermediary resources for connecting proteins to taxa, but this seems
unnatural and is impractical, because the same procedure would have to
be repeated for many other properties. Also, no application should break
because one day we decide to provide some provenance data for something
that previously never had any.
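Roughly, the provenance case looks like this with reification. This is a sketch over plain Python tuples; the property names organism, observedBy and citation and the ex: identifiers are hypothetical. Like rdf:ID on a property element, the helper both asserts the triple and describes it, so an application that ignores provenance still finds the original statement.

```python
def reify(store, stmt_id, s, p, o):
    """Assert (s, p, o) and also describe it as the rdf:Statement stmt_id."""
    store.add((s, p, o))
    store.add((stmt_id, "rdf:type", "rdf:Statement"))
    store.add((stmt_id, "rdf:subject", s))
    store.add((stmt_id, "rdf:predicate", p))
    store.add((stmt_id, "rdf:object", o))


store = set()
reify(store, "_:obs1", "P12345", "organism", "taxon:9606")

# Provenance is attached to the statement, not to the protein or the taxon.
store.add(("_:obs1", "observedBy", "ex:curator1"))
store.add(("_:obs1", "citation", "ex:publication1"))

# An application that knows nothing about provenance is unaffected.
organisms = [o for s, p, o in store if s == "P12345" and p == "organism"]
print(organisms)  # ['taxon:9606']
```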
In addition to provenance data (currently only available in our internal
version, but likely to be made public at some point) we plan to allow
our database curators to attach "post-it notes" in many of the same
places (for internal use only).
> And,
> when you talk about quads, do you mean the RDF model
> should be changed to allow context, or did you mean
> just that efficiency of reification handling is
> still an open issue and so suitable pseudo-forms
> could be developed to provide better tool processing?
By quads I meant (perhaps misusing the terminology) that when parsing
something like
<rdf:Description rdf:about="P12345">
<name rdf:ID="S1">Foo</name>
</rdf:Description>
with a statement-by-statement callback mechanism, most parsers will return:
P12345 name 'Foo'
S1 rdf:type rdf:Statement
S1 rdf:subject P12345
S1 rdf:predicate name
S1 rdf:object 'Foo'
Rather than:
S1: P12345 name 'Foo'
Which would be much simpler and more efficient to process, in my opinion.
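The collapsing step itself is easy to sketch. The Python below folds the four bookkeeping triples back into a single (statement id, subject, predicate, object) quad, using the parser output listed above as input. It illustrates the idea only; it is not how any particular parser works.

```python
from collections import defaultdict

REIFICATION = {"rdf:subject": 0, "rdf:predicate": 1, "rdf:object": 2}


def to_quads(triples):
    """Fold complete rdf:Statement reifications into (id, s, p, o) quads.

    Asserted triples that were not reified get the id None.
    """
    parts = defaultdict(dict)
    for s, p, o in triples:
        if p in REIFICATION:
            parts[s][REIFICATION[p]] = o
    # Only statement nodes with subject, predicate and object count.
    stmts = {sid: (d[0], d[1], d[2]) for sid, d in parts.items() if len(d) == 3}
    ids_for = defaultdict(set)
    for sid, triple in stmts.items():
        ids_for[triple].add(sid)

    quads = set()
    for s, p, o in triples:
        if s in stmts and (p in REIFICATION or (p, o) == ("rdf:type", "rdf:Statement")):
            continue  # bookkeeping triple, now represented by the quad
        for sid in ids_for.get((s, p, o), {None}):
            quads.add((sid, s, p, o))
    return quads


parsed = [
    ("P12345", "name", "Foo"),
    ("S1", "rdf:type", "rdf:Statement"),
    ("S1", "rdf:subject", "P12345"),
    ("S1", "rdf:predicate", "name"),
    ("S1", "rdf:object", "Foo"),
]
print(to_quads(parsed))  # {('S1', 'P12345', 'name', 'Foo')}
```

Statements that are reified but never asserted produce no quad in this sketch, and incomplete reifications are left as ordinary triples.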
> 4. yes, you're right. No negation, at least for
> the moment... (for a reason, as treatment of negation
> can be quite hairy...)
I remember hearing about this before. Perhaps someone with insight could
explain where the hairiness arises?
> 5. Could you expand more on this? An example would
> shed some light.
For many applications it is useful (N3, RDQL, storage in relational
database) or even required (Protege) that URIs (or some of them) can be
separated into a namespace part and a local name, so that they can be
abbreviated as QNames.
The following URIs work well:
http://uniprot.org/uniprot/P12345
urn:lsid:uniprot.org:uniprot:P12345
But this one doesn't (9606 is not a valid local name, because XML names
cannot begin with a digit):
urn:lsid:uniprot.org:taxonomy:9606
This is an irritation, though I'm not sure whom to blame :-)
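The effect can be reproduced with a simplified, ASCII-only version of the XML naming rule: a local name must start with a letter or underscore. The splitter below looks for the longest legal local name at the end of the URI; real tools may split differently, so treat it as a sketch.

```python
import re

# Simplified (ASCII-only) NCName rule: a letter or underscore first,
# then letters, digits, '-', '.' or '_', running to the end of the URI.
NCNAME_TAIL = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*$")


def split_uri(uri):
    """Split a URI into (namespace, local name) using the longest legal suffix.

    Returns None when no non-empty suffix is a legal local name.
    """
    match = NCNAME_TAIL.search(uri)  # leftmost match = longest suffix
    if match is None:
        return None
    return uri[:match.start()], match.group()


for uri in ("http://uniprot.org/uniprot/P12345",
            "urn:lsid:uniprot.org:uniprot:P12345",
            "urn:lsid:uniprot.org:taxonomy:9606"):
    print(uri, "->", split_uri(uri))
```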
> 6. So, I guess the critique isn't much to RDF per se
> but to Web Services, right? Or, are there specific
> features of RDF that you were thinking about?
Yes, this is more of a critique of web services in general and the
difficulty of creating web services that send and receive RDF in particular.
> 7. Is it really that bad to use plain URIs (ramping
> debate here, but I'm interested in your opinion.
> if it's for your point 11, then 11 could be solved
> and therefore using URIs... ;) ?
The problem is that currently most life science resources are only
accessible via URLs such as
http://www.bioinf.man.ac.uk/cgi-bin/dbbrowser/sprint/searchprintss.cgi?display_opts=Prints&category=None&queryform=false&prints_accn=PR01168.
So in any case you first need to set up some kind of resolution
mechanism that maps a URI containing both a database name and an
identifier to a particular location.
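A minimal sketch of such a resolution mechanism, assuming LSID-style URNs of the form urn:lsid:authority:namespace:identifier and a local lookup table standing in for a real resolution service. The UniProt and PRINTS templates are derived from the URLs quoted in this message; the table itself and the ("bioinf.man.ac.uk", "prints") key are hypothetical. The optional LSID revision component is not handled.

```python
# Hypothetical resolver table: (authority, namespace) -> URL template.
TEMPLATES = {
    ("uniprot.org", "uniprot"): "http://uniprot.org/uniprot/{id}",
    ("bioinf.man.ac.uk", "prints"): (
        "http://www.bioinf.man.ac.uk/cgi-bin/dbbrowser/sprint/searchprintss.cgi"
        "?display_opts=Prints&category=None&queryform=false&prints_accn={id}"
    ),
}


def resolve(urn):
    """Map urn:lsid:<authority>:<namespace>:<id> to a retrievable URL."""
    parts = urn.split(":")
    if len(parts) != 5 or parts[:2] != ["urn", "lsid"]:
        raise ValueError(f"not a simple LSID: {urn}")
    _, _, authority, namespace, identifier = parts
    template = TEMPLATES.get((authority, namespace))
    if template is None:
        raise LookupError(f"no resolver registered for {authority}:{namespace}")
    return template.format(id=identifier)


print(resolve("urn:lsid:uniprot.org:uniprot:P12345"))
print(resolve("urn:lsid:bioinf.man.ac.uk:prints:PR01168"))
```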
I'm still undecided whether these URIs should be URNs (LSIDs) or real
URLs. If most databases had reasonable URLs (or could be convinced to
provide them; good luck), I would be in favor of simply generating
artificial URLs for the rest. But since you have to create URIs anyway
for virtually every database...
> 8. Example? (in particular, it's unclear formally what
> "inline" means here)
By inline I mean references to resources from within a piece of text.
> 9. Again, example...?
What I refer to as "grouping of statements" has been called "context",
"graph" or "model" by others. Does this clarify what I mean?
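In code terms, grouping simply means that each triple belongs to a named set. Here is a sketch using a dictionary keyed by hypothetical graph names. Because the curators' "post-it notes" sit in a group of their own, a public view can just leave that group out.

```python
from collections import defaultdict

# graph name (hypothetical) -> set of (subject, predicate, object) triples
graphs = defaultdict(set)
graphs["ex:release-2004-08"].add(("P12345", "name", "Foo"))
graphs["ex:curator-notes"].add(("P12345", "comment", "check the taxon"))


def graphs_containing(triple):
    """Which named groups assert this triple?"""
    return {name for name, triples in graphs.items() if triple in triples}


def view(excluded=()):
    """Union of all groups except the excluded ones, as a plain triple set."""
    return set().union(*(t for name, t in graphs.items() if name not in excluded))


public = view(excluded={"ex:curator-notes"})
```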
> 10. Clarification on "rapid data entry": manual
> assembly? Support for some specific features..?
What I have in mind is a text editor that supports features available in
most code editors such as auto-completion, validation, syntax coloring,
user-defined templates etc. It would probably use an N3-like
syntax. This would (hopefully) allow data to be created and modified
much faster than with most of the tools I have seen so far.
Received on Friday, 20 August 2004 10:32:52 UTC