- From: William Waites <ww@styx.org>
- Date: Sun, 8 May 2011 12:33:11 +0200
- To: david.shotton@zoo.ox.ac.uk, public-xg-lld <public-xg-lld@w3.org>
- Cc: Peter Murray-Rust <pm286@cam.ac.uk>, Ben O'Steen <bosteen@gmail.com>, mark@odaesa.com
Hello David and fellow LLD XG members,

I'm copying you now because this is straying into your citation work. I've been improving the Medline RDF data that we now have, and have seen that there is some quite rich citation information in there. My first idea was to try to use CITO to represent it. CITO sounds like BIBO and I like BIBO, so I wanted to use them together. And immediately I ran into a problem that, unless I have thoroughly misunderstood something (which is quite possible), is quite deep. I've also copied in the LLD XG, not only because it's relevant but because I suspect that there is a lot of knowledge there that can help model this properly.

Take, for example, this XML fragment,

    <CommentsCorrections RefType="Cites">
      <RefSource>Psychosom Med. 2008 Jun;70(5):539-45</RefSource>
      <PMID>18519880</PMID>
    </CommentsCorrections>

I can easily turn this into,

    pubmed:foo cito:cites pubmed:18519880

So far so good, but already I notice that I have no place to hang the text of the RefSource. So take this next one,

    <CommentsCorrections RefType="ErratumIn">
      <RefSource>J Infect Dis 1998 Aug;178(2):601</RefSource>
      <Note>Whitely RJ [corrected to Whitley RJ]</Note>
    </CommentsCorrections>

Here we have a kind of citation, although maybe it is stretching it to call an erratum a citation, maybe not. Firstly, we have no predicate in CITO to express this. Secondly, there is no obvious place to hang the text of the source (one could, and probably would, use a blank node). Most importantly, the Note, which can appear in any citation/comment, has nowhere to go either.

I could give more examples, but what I'm getting at is that modelling citations as predicates is problematic because we actually have an infinite variety of citations with different shades of meaning, and we don't want that to mean an infinite variety of subtly different predicates (theoretically it is possible and coherent to do this, but practically it is not).
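To make the loss concrete, here is a minimal Python sketch (standard library only) of the predicate-style mapping applied to the two fragments above. The `<List>` wrapper element, the `PREDICATES` table and the `as_predicate` helper are illustrative assumptions, not part of any existing converter; the table contains only `cito:cites` because that is the only mapping identified in this message.

```python
import xml.etree.ElementTree as ET

# The two fragments quoted above, wrapped in a hypothetical <List> root
# so that they parse as a single document.
XML = """<List>
  <CommentsCorrections RefType="Cites">
    <RefSource>Psychosom Med. 2008 Jun;70(5):539-45</RefSource>
    <PMID>18519880</PMID>
  </CommentsCorrections>
  <CommentsCorrections RefType="ErratumIn">
    <RefSource>J Infect Dis 1998 Aug;178(2):601</RefSource>
    <Note>Whitely RJ [corrected to Whitley RJ]</Note>
  </CommentsCorrections>
</List>"""

# Assumption: only RefType values with a known predicate can be mapped.
PREDICATES = {"Cites": "cito:cites"}

def as_predicate(subject, element):
    """Return (triple or None, child texts that found no home)."""
    pmid = element.findtext("PMID")
    predicate = PREDICATES.get(element.get("RefType"))
    triple = None
    if predicate and pmid:
        triple = (subject, predicate, "pubmed:" + pmid)
    dropped = {}
    for child in element:
        if child.tag == "PMID" and triple is not None:
            continue  # the PMID made it into the triple
        dropped[child.tag] = child.text
    return triple, dropped

results = [as_predicate("pubmed:foo", e) for e in ET.fromstring(XML)]
for triple, dropped in results:
    print(triple, dropped)
```

The first fragment yields a triple but drops its RefSource; the second yields no triple at all and drops both the RefSource and the Note.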
When predicates start multiplying like this, it generally means that one wants to move the modelling to classes. So I might write something like,

    [ a cito:Citation;
      cito:citedBy pubmed:foo;
      cito:cites pubmed:18519880;
      dc:bibliographicCitation "Psychosom Med. 2008 Jun;70(5):539-45";
      dc:description "some notes about the citation" ].

Doing it this way means that you can refine the citation in a way that has formal semantics by refining the rdf:type, and you can refine it informally by adding other descriptive statements to the citation instance.

One problem is that there is an equivocation on what a citation is. SWAN, which CITO is based upon, thinks that a citation is a uniquely identifiable reference to a work/book/whatever. It actually says,

    "Information which fully identifies a publication. A complete citation usually includes author, title, name of journal (if the citation is to an article) or publisher (if to a book), and date. Often pages, volumes and other information will be included in a citation."

Well no, that's not really right. A citation is a *reference* to a publication, which obviously is best done with some kind of information to identify it, maybe even a URI, but it is not that information. The nature of the reference is part of the citation but has nothing to do with the identifying information. The identifying information is a URI or description, and we already have that. The citation is about the relationship between the things described. (Sorry for being repetitive here, it is difficult to express clearly.)

So that's a modelling problem in SWAN, I think. CITO inherits it because it derives terms from it. Now SWAN is not a lightweight vocabulary. It's a heavyweight OWL beast. Which means that I'm not even sure if I can mix CITO and BIBO without entailing contradictions...

What to do? In the near term we can release this large dataset sans citations, but that would actually be quite a shame...
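The class-based modelling above can be sketched in a few lines of Python (standard library only). This is an illustration under assumptions, not a finished converter: `citation_node` and `turtle_literal` are hypothetical helpers, the property names are simply the ones used in the Turtle example in this message, and prefix declarations are left out.

```python
import xml.etree.ElementTree as ET

def turtle_literal(text):
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

def citation_node(citing, element):
    """Render one <CommentsCorrections> element as a blank-node citation."""
    pairs = [("a", "cito:Citation"), ("cito:citedBy", citing)]
    pmid = element.findtext("PMID")
    if pmid:
        pairs.append(("cito:cites", "pubmed:" + pmid))
    source = element.findtext("RefSource")
    if source:
        pairs.append(("dc:bibliographicCitation", turtle_literal(source)))
    note = element.findtext("Note")
    if note:
        pairs.append(("dc:description", turtle_literal(note)))
    body = " ;\n".join("    %s %s" % pair for pair in pairs)
    return "[\n" + body + "\n] ."

# The erratum fragment that the predicate-only approach could not express.
erratum = ET.fromstring(
    '<CommentsCorrections RefType="ErratumIn">'
    "<RefSource>J Infect Dis 1998 Aug;178(2):601</RefSource>"
    "<Note>Whitely RJ [corrected to Whitley RJ]</Note>"
    "</CommentsCorrections>"
)
turtle = citation_node("pubmed:foo", erratum)
print(turtle)
```

With this shape the erratum keeps both its RefSource and its Note even though it has no PMID. The RefType is deliberately not mapped here: it would presumably select a more specific rdf:type, but no such class names have been settled, so none are invented.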
Also have to check this with the copyright people - is the fact of citing something that pubmed would claim to own? But that one's a distraction for most of the people on this list...

Cheers,
-w
--
William Waites <mailto:ww@styx.org>
http://river.styx.org/ww/ <sip:ww@styx.org>
F4B3 39BF E775 CF42 0BAB 3DF0 BE40 A6DF B06F FD45
Received on Sunday, 8 May 2011 10:33:35 UTC