Re: Medline RDF -- Citations

From: Ben O'Steen <bosteen@gmail.com>
Date: Sun, 8 May 2011 11:39:38 +0100
Message-ID: <BANLkTimf13iwr1pmv1-h7H5Hap3vaBXTLg@mail.gmail.com>
To: William Waites <ww@styx.org>
Cc: public-xg-lld <public-xg-lld@w3.org>, Peter Murray-Rust <pm286@cam.ac.uk>, david.shotton@zoo.ox.ac.uk, mark@odaesa.com
Will,

Just a quick note to say that the agreement we got for the data was in
the context of basic bibliographic information (as specified by the open
bibliographic principles). Getting an agreement that we can also share
the citation information under cc0 is, unfortunately, still an ongoing
conversation.

Ben
On 8 May 2011 11:33, "William Waites" <ww@styx.org> wrote:
> Hello David and fellow LLD XG members,
>
> I'm copying you now because this is straying into your citation work.
> So I've been improving the Medline RDF data that we now have, and have
> seen that there is some quite rich citation information in there. So
> my first idea was to try to use CITO to represent it. CITO sounds like
> BIBO and I like BIBO so I wanted to use them together. And immediately
> I ran into a problem that, unless I have thoroughly misunderstood
> something, which is quite possible, is quite deep.
>
> I've also copied in the LLD XG, not only because it's relevant but
> because I suspect that there is a lot of knowledge there that can help
> model this properly.
>
> Take, for example, this XML fragment,
>
> <CommentsCorrections RefType="Cites">
> <RefSource>Psychosom Med. 2008 Jun;70(5):539-45</RefSource>
> <PMID>18519880</PMID>
> </CommentsCorrections>
>
> I can easily turn this into,
>
> pubmed:foo cito:cites pubmed:18519880
>
> so far so good, but already I notice that I have no place to hang the
> text of the RefSource.
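
A minimal sketch of the direct mapping described above, assuming Python with only the standard library. The function names and the `pubmed:foo` subject default are illustrative, not part of any real converter. It emits the single `cito:cites` triple and also reports which child elements the triple leaves behind.

```python
import xml.etree.ElementTree as ET

# The fragment quoted in the message above.
FRAGMENT = """<CommentsCorrections RefType="Cites">
<RefSource>Psychosom Med. 2008 Jun;70(5):539-45</RefSource>
<PMID>18519880</PMID>
</CommentsCorrections>"""


def cites_triple(xml_text, subject="pubmed:foo"):
    """Return a (subject, predicate, object) triple, or None if not mappable.

    Hypothetical helper: only RefType="Cites" with a PMID maps to cito:cites.
    """
    node = ET.fromstring(xml_text)
    pmid = node.findtext("PMID")
    if node.get("RefType") != "Cites" or pmid is None:
        return None
    return (subject, "cito:cites", "pubmed:" + pmid.strip())


def dropped_fields(xml_text):
    """Return the child elements that a bare triple has no place for."""
    node = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip()
            for child in node if child.tag != "PMID"}


if __name__ == "__main__":
    print(" ".join(cites_triple(FRAGMENT)))
    print(dropped_fields(FRAGMENT))
```

The second function makes the gap concrete: the RefSource text survives parsing but has nowhere to go in a predicate-only model.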
>
> So take this next one,
>
> <CommentsCorrections RefType="ErratumIn">
> <RefSource>J Infect Dis 1998 Aug;178(2):601</RefSource>
> <Note>Whitely RJ [corrected to Whitley RJ]</Note>
> </CommentsCorrections>
>
> Here we have a kind of citation, although maybe it is stretching it to
> call an erratum a citation, maybe not. Firstly, we have no predicate
> in CITO to express this. Secondly, there is no obvious place to hang
> the text of the source (could and probably would use a blank node).
> Most importantly, we also have no place to put the Note, which can
> appear in any citation/comment.
>
> I could give more examples, but what I'm getting at is that modelling
> citations as predicates is problematic because we actually have an
> infinite variety of citations with different shades of meaning, and we
> don't want that to mean an infinite variety of subtly different
> predicates (theoretically it is possible and coherent to do this but
> practically it is not). When this happens it generally means that one
> wants to move the modelling to classes.
>
> So I might write something like,
>
> [
> a cito:Citation;
> cito:citedBy pubmed:foo;
> cito:cites pubmed:18519880;
> dc:bibliographicCitation "Psychosom Med. 2008 Jun;70(5):539-45";
> dc:description "some notes about the citation"
> ].
>
> Doing it this way means that you can refine the citation in a way that
> has formal semantics by refining the rdf type, and you can refine it
> informally by adding other descriptive statements to the citation
> instance.
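
A rough sketch of the class-based approach described above, again assuming Python with only the standard library. It reuses the property names exactly as written in the message (`cito:Citation`, `cito:citedBy`, `cito:cites`) without checking them against the CITO ontology. The `ex:` prefix used for the refined type is a hypothetical namespace, and prefix declarations are omitted.

```python
import xml.etree.ElementTree as ET

# The erratum fragment quoted earlier in the message.
ERRATUM = """<CommentsCorrections RefType="ErratumIn">
<RefSource>J Infect Dis 1998 Aug;178(2):601</RefSource>
<Note>Whitely RJ [corrected to Whitley RJ]</Note>
</CommentsCorrections>"""


def turtle_literal(text):
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def citation_node(xml_text, citing="pubmed:foo"):
    """Render one CommentsCorrections element as a Turtle blank node."""
    node = ET.fromstring(xml_text)
    pairs = [("a", "cito:Citation")]
    ref_type = node.get("RefType")
    if ref_type:
        # Hypothetical: refine the citation by type rather than by predicate.
        pairs.append(("a", "ex:" + ref_type))
    pairs.append(("cito:citedBy", citing))
    pmid = node.findtext("PMID")
    if pmid:
        pairs.append(("cito:cites", "pubmed:" + pmid.strip()))
    source = node.findtext("RefSource")
    if source:
        pairs.append(("dc:bibliographicCitation", turtle_literal(source.strip())))
    note = node.findtext("Note")
    if note:
        pairs.append(("dc:description", turtle_literal(note.strip())))
    body = " ;\n".join("    %s %s" % pair for pair in pairs)
    return "[\n" + body + "\n] ."


if __name__ == "__main__":
    print(citation_node(ERRATUM))
```

With this shape the erratum fragment needs no new predicate: the RefSource and the Note both hang off the citation node, and the missing PMID simply means no `cito:cites` statement is emitted.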
>
> One problem is that there is an equivocation on what a citation is.
> SWAN, which CITO is based upon, thinks that a citation is a uniquely
> identifiable reference to a work/book/whatever, it actually says,
>
> "Information which fully identifies a publication. A complete
> citation usually includes author, title, name of journal (if the
> citation is to an article) or publisher (if to a book), and
> date. Often pages, volumes and other information will be included in a
> citation."
>
> Well no, that's not really right. A citation is a *reference* to a
> publication, which obviously is best done with some kind of
> information to identify it, maybe even a URI but it is not that
> information. The nature of the reference is part of the citation but
> has nothing to do with the identifying information. The identifying
> information is a URI or description and we already have that. The
> citation is about the relationship between the things
> described. (Sorry for being repetitive here, difficult to express
> clearly).
>
> So that's a modelling problem in SWAN, I think. CITO inherits it
> because it derives terms from it. Now SWAN is not a lightweight
> vocabulary. It's a heavyweight OWL beast. Which means that I'm not
> even sure if I can mix CITO and BIBO without entailing
> contradictions...
>
> What to do? In the near term we can release this large dataset sans
> citations but that would actually be quite a shame... Also have to
> check this with the copyright people - is the fact of citing something
> that PubMed would claim to own? But that one's a distraction for most
> of the people on this list...
>
> Cheers,
> -w
> --
> William Waites <mailto:ww@styx.org>
> http://river.styx.org/ww/ <sip:ww@styx.org>
> F4B3 39BF E775 CF42 0BAB 3DF0 BE40 A6DF B06F FD45
Received on Sunday, 8 May 2011 10:40:08 GMT