- From: Eric Jain <Eric.Jain@isb-sib.ch>
- Date: Fri, 20 Aug 2004 12:32:50 +0200
- To: Massimo Marchiori <massimo@w3.org>
- CC: public-semweb-lifesci@w3.org
Massimo Marchiori wrote:

> 1. Let me rephrase what I think you are saying:
> writing a parser for the XML serialization syntax
> is quite hard. Is this it, or there is more, like
> eg user readability?

My main concern is not user readability (not a bad thing, of course), but I feel that things would be easier if there weren't several different ways of representing everything. Consider:

  <rdf:RDF>
    <rdf:Description rdf:about="#F">
      <rdf:type rdf:resource="#Foo"/>
    </rdf:Description>
  </rdf:RDF>

Versus:

  <rdf:RDF>
    <Foo rdf:ID="F"/>
  </rdf:RDF>

The second version is arguably more readable, but can't handle as many cases as the first version, as it restricts the kind of identifiers that can be used. Therefore a parser must support both versions, and this is just one example. The more alternatives, the more complex the programming. And people must remember all the different rules, too.

I therefore wonder if it would make sense to have a "Common RDF", following the philosophy of [http://www.simonstl.com/articles/cxmlspec.txt].

> 2. Clarification: do you mean collections/lists
> should not be in RDF, or just that the way they
> are represented now is not nice/effective? Either way,
> could you provide some motivation/example?

Being able to explicitly express that the order of several items is relevant doesn't seem like a bad idea. On the other hand, considering the tricks required to map collections/lists to triples, I wonder if it wouldn't be better to leave it up to applications whether they treat the ordering of certain items as relevant.

As an example I mention query engines. Few query engines have direct support for collections/lists [see http://www.aifb.uni-karlsruhe.de/WBS/pha/rdf-query/], so instead of "selecting all publications that have an author x", for example, you may have to "select all publications that have a list that contains an author x".
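
To make the difference concrete, here is a minimal Python sketch (not tied to any RDF toolkit) of what happens when authors move from repeated properties into a list. The helper names, the "author"/"authors" properties and the blank node labels are made up for illustration; only the rdf:first/rdf:rest/rdf:nil vocabulary comes from RDF itself.

```python
import itertools

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_FIRST, RDF_REST, RDF_NIL = RDF + "first", RDF + "rest", RDF + "nil"

_node_ids = itertools.count()  # source of fresh blank node labels (illustrative)


def list_to_triples(subject, predicate, items):
    """Expand an ordered list into rdf:first/rdf:rest triples."""
    if not items:
        return [(subject, predicate, RDF_NIL)]
    nodes = [f"_:n{next(_node_ids)}" for _ in items]
    triples = [(subject, predicate, nodes[0])]
    for i, item in enumerate(items):
        triples.append((nodes[i], RDF_FIRST, item))
        rest = nodes[i + 1] if i + 1 < len(items) else RDF_NIL
        triples.append((nodes[i], RDF_REST, rest))
    return triples


def has_author_flat(triples, publication, author):
    """'Publications that have an author x': a single triple match."""
    return (publication, "author", author) in triples


def has_author_in_list(triples, publication, author):
    """'Publications that have a list that contains x': walk the list."""
    # Assumes at most one object per (subject, predicate), which holds
    # for the list nodes generated above.
    index = {(s, p): o for s, p, o in triples}
    node = index.get((publication, "authors"))
    while node is not None and node != RDF_NIL:
        if index.get((node, RDF_FIRST)) == author:
            return True
        node = index.get((node, RDF_REST))
    return False


flat = [("pub1", "author", "x"), ("pub1", "author", "y")]
listed = list_to_triples("pub1", "authors", ["x", "y"])
```

Two authors take two triples in the flat form and five in the list form, and the simple match in has_author_flat no longer finds anything once the data has been moved into the list.
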

Of course, sometimes you may want to "return the titles and the first author for all publications cited by y", for example. But should this functionality be limited to explicitly defined lists?

A further argument: Let's say we have a resource with a property "name". If we one day decide that there may in fact be several names, but don't want to lose the ordering, any applications that were previously looking for a "name" property will be broken, because the property has been moved into a list.

> 3. Interesting: do your use cases have absolute needs
> for reification, or it's just a convenience? Is
> your only use just use case 6 (provenance)?

Consider this example: A protein may occur in one or more organisms. We may need to indicate who observed this protein in a specific organism, and cite a relevant publication etc. This information obviously can't be attached to either the protein or the taxon resource. We could create intermediary resources for connecting proteins to taxa, but this seems unnatural and is impractical, because the same procedure would have to be repeated for many other properties. Also, no application should break because one day we decide to provide some provenance data for something that previously never had any.

In addition to provenance data (currently only available in our internal version, but likely to be made public at some point) we plan to allow our database curators to attach "post-it notes" in many of the same places (for internal use only).

> And,
> when you talk about quads, do you mean the RDF model
> should be changed to allow context, or did you mean
> just that efficiency of reification handling is
> still an open issue and so suitable pseudo-forms
> could be developed to provide better tool processing?

By quads I meant (perhaps misusing the terminology) that when parsing something like

  <rdf:Description rdf:about="P12345">
    <name rdf:ID="S1">Foo</name>
  </rdf:Description>

with a statement-by-statement callback mechanism, most parsers will return:

  P12345 name 'Foo'
  S1 rdf:type rdf:Statement
  S1 rdf:subject P12345
  S1 rdf:predicate name
  S1 rdf:object 'Foo'

Rather than:

  S1: P12345 name 'Foo'

which would be much simpler and more efficient to process, in my opinion.

> 4. yes, you're right. No negation, at least for
> the moment... (for a reason, as treatment of negation
> can be quite hairy...)

I remember hearing about this before. Perhaps someone with insight could explain where the hairiness arises?

> 5. Could you expand more on this? An example would
> shed some light.

For many applications it is useful (N3, RDQL, storage in relational database) or even required (Protege) that URIs (or some of them) can be separated into a namespace part and a local name, as in a QName. The following URIs work well:

  http://uniprot.org/uniprot/P12345
  urn:lsid:uniprot.org:uniprot:P12345

But this one doesn't (9606 cannot serve as the local part of a QName, because XML names may not start with a digit):

  urn:lsid:uniprot.org:taxonomy:9606

This is an irritation, though I'm not sure whom to blame :-)

> 6. So, I guess the critique isn't much to RDF per se
> but to Web Services, right? Or, are there specific
> features of RDF that you were thinking about?

Yes, this is more of a critique of web services in general and the difficulty of creating web services that send and receive RDF in particular.

> 7. Is is really that bad to use plain URI (ramping
> debate here, but I'm interested in your opinion.
> if it's for your point 11, then 11 could be solved
> and therefore using URIs... ;) ?

The problem is that currently most life science resources are only accessible via URLs such as

  http://www.bioinf.man.ac.uk/cgi-bin/dbbrowser/sprint/searchprintss.cgi?display_opts=Prints&category=None&queryform=false&prints_accn=PR01168
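
To show the shape of the problem, here is a small Python sketch of a lookup that turns a database name plus an identifier into a location of this kind. The registry layout, the "prints" key and the resolve() function are hypothetical; only the URL and its parameters are taken from the example above.

```python
from urllib.parse import urlencode

# Hypothetical registry: database name -> (base URL, fixed query
# parameters, name of the parameter that carries the identifier).
RESOLVERS = {
    "prints": (
        "http://www.bioinf.man.ac.uk/cgi-bin/dbbrowser/sprint/searchprintss.cgi",
        {"display_opts": "Prints", "category": "None", "queryform": "false"},
        "prints_accn",
    ),
}


def resolve(database, identifier):
    """Map (database, identifier) to a URL; raises KeyError if unknown."""
    base, fixed_params, id_param = RESOLVERS[database]
    query = dict(fixed_params)      # copy so the registry stays untouched
    query[id_param] = identifier    # identifier goes last, as in the example
    return base + "?" + urlencode(query)


url = resolve("prints", "PR01168")
```

Every database needs its own entry with its own quirks, which is why a table like this has to exist whichever URI scheme ends up being used on top of it.
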
So in any case you first need to set up some kind of resolution mechanism that maps a URI containing both a database name and an identifier to a particular location. I'm still undecided whether these URIs should be URNs (LSIDs) or real URLs. If most databases had reasonable URLs (or could be convinced to provide such - good luck), I would be in favor of simply generating artificial URLs for the rest. But since you have to create URIs for virtually every database anyway...

> 8. Example? (in particular, it's unclear formally what
> "inline" means here)

By inline I mean references to resources from within a piece of text.

> 9. Again, example...?

What I refer to as "grouping of statements" has been called "context", "graph" or "model" by others. Does this clarify what I mean?

> 10. Clarification on "rapid data entry": manual
> assembly? Support for some specific features..?

What I have in mind is a text editor that supports features available in most code editors, such as auto-completion, validation, syntax coloring, user-defined templates etc. It would probably make use of an N3-like syntax. This would (hopefully) allow data to be created and modified much faster than with most of the tools I have seen so far.
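
P.S. Returning to the quads point under question 3: the following Python sketch shows the kind of post-processing an application has to do today to get from the five callback triples back to a single identified statement. The function name and the tuple layout are my own invention for illustration, and the sketch assumes the whole stream is available at once, which a streaming parser callback would not give you.

```python
REIFICATION_PARTS = ("rdf:subject", "rdf:predicate", "rdf:object")


def collapse_reification(triples):
    """Fold reification triples into (statement_id, s, p, o) quads.

    Statements without a reification get None as their id. If several
    ids reify the same triple, the last one seen wins (a simplification).
    """
    parts = {}   # statement id -> {reification property: value}
    plain = []   # ordinary asserted triples, in input order
    for s, p, o in triples:
        if p in REIFICATION_PARTS:
            parts.setdefault(s, {})[p] = o
        elif (p, o) == ("rdf:type", "rdf:Statement"):
            parts.setdefault(s, {})
        else:
            plain.append((s, p, o))

    # Only complete reifications (subject, predicate and object all
    # present) are used to name a triple.
    named = {}
    for stmt_id, found in parts.items():
        if len(found) == len(REIFICATION_PARTS):
            named[tuple(found[k] for k in REIFICATION_PARTS)] = stmt_id

    return [(named.get(t), *t) for t in plain]


stream = [
    ("P12345", "name", "Foo"),
    ("S1", "rdf:type", "rdf:Statement"),
    ("S1", "rdf:subject", "P12345"),
    ("S1", "rdf:predicate", "name"),
    ("S1", "rdf:object", "Foo"),
]
quads = collapse_reification(stream)
```

Here quads holds the single entry ("S1", "P12345", "name", "Foo"), which is what I would prefer the parser to hand over directly.
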
Received on Friday, 20 August 2004 10:32:52 UTC