Aligning DoCO and the middle grained document structure

From: Jodi Schneider <jodi.schneider@deri.org>
Date: Wed, 24 Nov 2010
Cc: David Shotton <david.shotton@zoo.ox.ac.uk>, Paolo Ciccarese <paolo.ciccarese@gmail.com>, "M. Scott Marshall" <mscottmarshall@gmail.com>, Tim Clark <tim_clark@harvard.edu>, John Madden <john.madden@duke.edu>, Alberto Accomazzi <aaccomazzi@cfa.harvard.edu>, Sophia Ananiadou <Sophia.Ananiadou@manchester.ac.uk>, Gully Burns <gully@usc.edu>, "Ronald (ELS-SDG) Daniel" <R.Daniel@elsevier.com>, Rahul Dave <rahuldave@gmail.com>, Anita de Waard <A.dewaard@elsevier.com>, Alf Eaton <A.Eaton@nature.com>, Alyssa Goodman <agoodman@cfa.harvard.edu>, Paul Groth <pgroth@gmail.com>, Tudor Groza <tudor.groza@deri.org>, ellen hays <E.Hays@elsevier.com>, "Antony (ELS-CAM) Scerri" <A.scerri@elsevier.com>, Jack Park <jackpark@gmail.com>, Silvio Peroni <speroni@cs.unibo.it>, Philippe Rocca-Serra <proccaserra@googlemail.com>, Karin Verspoor <Karin.Verspoor@ucdenver.edu>, Lynette Hirschman <lynette@mitre.org>, Susanna-Assunta Sansone <sansone@ebi.ac.uk>, Jun Zhao <jun.zhao@zoo.ox.ac.uk>, "Joanne Luciano (gmail)" <jluciano@gmail.com>, Alexander Garcia Castro <alexgarciac@gmail.com>
Message-Id: <856DE758-55CA-43C1-B55C-0B8EBACD431C@deri.org>
To: HCLS IG

Here's what we discussed in our call yesterday. Overall, we're looking to discuss and align DoCO, ORB, DRO, and the medium-grained document structure in the context of life sciences research papers.

We plan to talk again on Tuesday 7th Dec at 10 AM EST / 3 PM GMT (phone number TBA). If you're interested, could you please let me know (off-list, jodi.schneider@deri.org)? We may be able to adjust the time in the future.


Yesterday Anita, Alex Garcia, and I discussed the possibilities for alignment between DoCO [1] and the medium-grained document structure [2]. DoCO is currently being developed as part of SPAR [3].

Our general conclusion was that David Shotton's proposal (PDF attached) was on target. However, we want to:
(1) use existing ontologies for references (BIBO? ...?)
(2) use existing ontologies for the header (PRISM? DC? ...?)
(3) check the use of fabio: (e.g. for Experimental Protocol)
(4) check the use of dro:
(5) check the use of sro:

A few questions regarding DoCO came up. The combination of document components, rhetorical components, rhetorical blocks, and structural patterns confused us; we expected these to be several smaller ontologies. Another question (David, perhaps you can answer this): why do you prefer imports into a larger ontology, as opposed to building an application profile? Rhetorical components, for instance, may already be handled adequately by SALT, SWAN, and ScholOnto.

We also discussed whether we want to go beyond ontologies and also address authoring and/or text mining (with a schema or DTDs drawing on the ontologies). Anita pointed out that one of our three use cases involves authoring. Further, for authoring we're limited to contiguous sections (as opposed to post-hoc rhetorical component detection).

[1] http://purl.org/spar/doco/
[2] http://esw.w3.org/HCLSIG/SWANSIOC/Actions/RhetoricalStructure/models/medium
[3] http://opencitations.wordpress.com/2010/10/14/introducing-the-semantic-publishing-and-referencing-spar-ontologies/
