- From: Ben O'Steen <bosteen@gmail.com>
- Date: Sun, 8 May 2011 11:39:38 +0100
- To: William Waites <ww@styx.org>
- Cc: public-xg-lld <public-xg-lld@w3.org>, Peter Murray-Rust <pm286@cam.ac.uk>, david.shotton@zoo.ox.ac.uk, mark@odaesa.com
- Message-ID: <BANLkTimf13iwr1pmv1-h7H5Hap3vaBXTLg@mail.gmail.com>
Will,

Just a quick note to say that the agreement we got for the data was in the context of basic bibliographic information (as specified by the open bibliographic principles). Getting an agreement that we can share the citation information as CC0 as well is, unfortunately, an ongoing conversation.

Ben

On 8 May 2011 11:33, "William Waites" <ww@styx.org> wrote:
> Hello David and fellow LLD XG members,
>
> I'm copying you now because this is straying into your citation work.
> So I've been improving the Medline RDF data that we now have, and have
> seen that there is some quite rich citation information in there. So
> my first idea was to try to use CITO to represent it. CITO sounds like
> BIBO and I like BIBO so I wanted to use them together. And immediately
> I ran into a problem that, unless I have thoroughly misunderstood
> something, which is quite possible, is quite deep.
>
> I've also copied in the LLD XG now because it's relevant and because I
> suspect that there is a lot of knowledge there that can help model
> this properly.
>
> Take, for example, this XML fragment,
>
> <CommentsCorrections RefType="Cites">
>   <RefSource>Psychosom Med. 2008 Jun;70(5):539-45</RefSource>
>   <PMID>18519880</PMID>
> </CommentsCorrections>
>
> I can easily turn this into,
>
> pubmed:foo cito:cites pubmed:18519880
>
> so far so good, but already I notice that I have no place to hang the
> text of the RefSource.
>
> So take this next one,
>
> <CommentsCorrections RefType="ErratumIn">
>   <RefSource>J Infect Dis 1998 Aug;178(2):601</RefSource>
>   <Note>Whitely RJ [corrected to Whitley RJ]</Note>
> </CommentsCorrections>
>
> Here we have a kind of citation, although maybe it is stretching it to
> call an erratum a citation, maybe not, but firstly we have no
> predicate in cito to express this, secondly there is no obvious place
> to hang the text of the source (could and probably would use a blank
> node) but most importantly, the Note, which can appear in any
> citation/comment, also we have no place to stuff that.
>
> I could give more examples, but what I'm getting at is that modelling
> citations as predicates is problematic because we actually have an
> infinite variety of citations with different shades of meaning, and we
> don't want that to mean an infinite variety of subtly different
> predicates (theoretically it is possible and coherent to do this but
> practically it is not). When this happens it generally means that one
> wants to move the modelling to classes.
>
> So I might write something like,
>
> [
>   a cito:Citation;
>   cito:citedBy pubmed:foo;
>   cito:cites pubmed:18519880;
>   dc:bibliographicCitation "Psychosom Med. 2008 Jun;70(5):539-45";
>   dc:description "some notes about the citation"
> ].
>
> Doing it this way means that you can refine the citation in a way that
> has formal semantics by refining the rdf type, and you can refine it
> informally by adding other descriptive statements to the citation
> instance.
>
> One problem is that there is an equivocation on what a citation is.
> SWAN, which CITO is based upon, thinks that a citation is a uniquely
> identifiable reference to a work/book/whatever, it actually says,
>
>   Information which fully identifies a publication. A complete
>   citation usually includes author, title, name of journal (if the
>   citation is to an article) or publisher (if to a book), and
>   date.
>   Often pages, volumes and other information will be included in a
>   citation.
>
> Well no, that's not really right. A citation is a *reference* to a
> publication, which obviously is best done with some kind of
> information to identify it, maybe even a URI but it is not that
> information. The nature of the reference is part of the citation but
> has nothing to do with the identifying information. The identifying
> information is a URI or description and we already have that. The
> citation is about the relationship between the things
> described. (Sorry for being repetitive here, difficult to express
> clearly).
>
> So that's a modelling problem in SWAN, I think. CITO inherits it
> because it derives terms from it. Now SWAN is not a lightweight
> vocabulary. It's a heavyweight OWL beast. Which means that I'm not
> even sure if I can mix CITO and BIBO without entailing
> contradictions...
>
> What to do? In the near term we can release this large dataset sans
> citations but that would actually be quite a shame... Also have to
> check this with the copyright people - is the fact of citing something
> that pubmed would claim to own? But that one's a distraction for most
> of the people on this list...
>
> Cheers,
> -w
> --
> William Waites <mailto:ww@styx.org>
> http://river.styx.org/ww/ <sip:ww@styx.org>
> F4B3 39BF E775 CF42 0BAB 3DF0 BE40 A6DF B06F FD45
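
For illustration only, the class-based modelling proposed in the quoted message could be sketched roughly as below. This is a minimal Python example, not code from the thread. It reads one CommentsCorrections fragment and prints a blank-node citation in Turtle. The terms cito:Citation and cito:citedBy are used as they appear in the quoted sketch, not as verified CITO terms. The ex: prefix for RefType-derived classes and the function names are hypothetical, and @prefix declarations are left out.

```python
import xml.etree.ElementTree as ET


def turtle_literal(text):
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def citation_to_turtle(citing, fragment):
    """Render one CommentsCorrections element as a blank-node citation.

    `citing` is the prefixed name of the citing record, e.g. "pubmed:foo".
    The ex: prefix is a hypothetical namespace, not part of CITO.
    """
    element = ET.fromstring(fragment)
    # Formal refinement: the RefType becomes an additional rdf:type.
    statements = [
        f"a cito:Citation, ex:{element.get('RefType')}",
        f"cito:citedBy {citing}",
    ]
    pmid = element.findtext("PMID")
    if pmid:
        statements.append(f"cito:cites pubmed:{pmid.strip()}")
    # Informal refinement: free text hangs off the citation node itself.
    source = element.findtext("RefSource")
    if source:
        statements.append(f"dc:bibliographicCitation {turtle_literal(source.strip())}")
    note = element.findtext("Note")
    if note:
        statements.append(f"dc:description {turtle_literal(note.strip())}")
    body = " ;\n".join("    " + statement for statement in statements)
    return "[\n" + body + "\n] ."


erratum = """<CommentsCorrections RefType="ErratumIn">
  <RefSource>J Infect Dis 1998 Aug;178(2):601</RefSource>
  <Note>Whitely RJ [corrected to Whitley RJ]</Note>
</CommentsCorrections>"""

print(citation_to_turtle("pubmed:foo", erratum))
```

In this sketch the erratum, which has no PMID, still gets a node carrying both its RefSource and its Note, which is the case the predicate-only form could not express.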
Received on Sunday, 8 May 2011 10:40:08 UTC