- From: BASS,MICK (HP-USA,ex1) <mick_bass@hp.com>
- Date: Fri, 19 Jul 2002 01:22:59 -0400
- To: www-rdf-dspace@w3.org
> Attached are meeting notes from the SIMILE PI Meeting and
> Technical Exchange of Thursday 11 July 2002.
Ten seconds after posting, I realized that all the http://www.mit.edu URLs
are incorrect and need to instead be http://web.mit.edu
I've fixed this on the SIMILE website http://web.mit.edu/dspace-dev/simile/
(not www.mit.edu!!) and included the correction here as well.
Sorry 'bout that.
- Mick
_________________________________________________________________
SIMILE PI Meeting and Technical Exchange
2002-07-11
Summary, Actions, and Issues
Editor: Mick Bass, HP
mailto:bass@alum.mit.edu
This document:
http://web.mit.edu/dspace-dev/simile/minutes/minutes-2002-07-11.txt
Series:
http://web.mit.edu/dspace-dev/simile/minutes/index.html
_________________________________________________________________
Table of Contents
Agenda
Attendees
Resources
Summary
New Actions / Issues
Open Actions / Issues
Closed Actions / Issues
_________________________________________________________________
Agenda (as revised)
Welcome, Goals for the Day and for the Summer
W3C Advanced Development
Q&A, Discussion
Jena: brief overview, current & future directions
Haystack: brief overview
Q&A, discussion, brainstorm w.r.t. Jena, Haystack
(videoconf & NetMeeting from Bristol, UK)
Haystack Demo
Lunch
DSpace: architecture, target research platform
Discussion
review proposed methodology
review deliverables in startup phase
working process for remainder of startup phase
use case / demonstrator hone & brainstorm
_________________________________________________________________
Attendees
David Karger (PI) MIT, LCS / AI Lab
David Huynh MIT, LCS / AI Lab
Dennis Quan MIT, LCS / AI Lab
Vineet Sinha MIT, LCS / AI Lab
MacKenzie Smith (Co-I) MIT, Libraries
Eric Miller (Co-I) W3C
Marja-Riitta Koivunen W3C
Mick Bass HP Labs
Robert Tansley HP Labs
John Erickson HP Labs (via video)
Nick Wainwright HP Labs
Rivka Ladin HP Labs
Partial-day attendees
Eytan Adar HP Labs
David Stuve HP Labs
Brian McBride HP Labs (via video)
Dave Reynolds HP Labs (via video)
Andy Seaborne HP Labs (via video)
_________________________________________________________________
Resources
SIMILE Public Page
http://web.mit.edu/dspace-dev/www/simile/
SIMILE Research Proposal (public, version 2002-04)
http://web.mit.edu/dspace-dev/www/simile/resources/proposal-2002-04/index.htm
SIMILE Startup Phase Goals and Deliverables (July - October 2002)
http://web.mit.edu/dspace-dev/www/simile/resources/goals-summer-2002.txt
W3C Semantic Web Activity
http://www.w3.org/2001/sw/
W3C Semantic Web Advanced Development Presentation Materials
http://www.w3.org/2002/Talks/07-simile-swad/
Haystack Public Page
http://haystack.lcs.mit.edu/
Haystack Presentation Materials
http://web.mit.edu/dspace-dev/www/simile/resources/Haystack-Overview.pdf
DSpace Architecture Presentation Materials
http://web.mit.edu/dspace-dev/www/simile/resources/DSpace-Arch-Overview-2002-07-11.pdf
DSpace White Papers (Functionality, Architecture)
http://www.dspace.org/live/implementation/design_documents/functionality.pdf
http://www.dspace.org/live/implementation/design_documents/architecture.pdf
Reference Model for an Open Archival Information System (OAIS)
http://ssdoo.gsfc.nasa.gov/nost/isoas/
http://www.ccsds.org/documents/pdf/CCSDS-650.0-R-2.pdf
HP Labs Semantic Web Activity
http://hpl.hp.com/semweb/
Jena Semantic Web Toolkit
http://www.hpl.hp.com/semweb/jena-top.html
Jena Presentation Materials
URL not available
Jena Development Public List
http://groups.yahoo.com/group/jena-dev/
www-rdf-dspace
http://lists.w3.org/Archives/Public/www-rdf-dspace/
mailto:www-rdf-dspace@w3.org
_________________________________________________________________
Summary
(summarizes both the meeting and some subsequent pairwise interactions)
W3C Advanced Development
Eric Miller presented an overview of W3C Semantic Web advanced
development activities. Marja-Riitta Koivunen described current
work on Annotea.
Jena, Haystack, Persistent RDF Stores
Brian McBride, Dave Reynolds, and Andy Seaborne joined by video as
Brian McBride presented a brief overview of the Jena semantic web
toolkit. David Karger followed with a brief overview of Haystack.
An open discussion of potential Haystack/Jena interactions followed,
focusing especially on database back-ends. (->A1)
The discussion included:
- a discussion of Cholesterol (the current RDF store used in
Haystack) and Adenine (a scripting language that produces
RDF). (->I2, I3)
- Haystack's need to issue many (hundreds per second) small,
simple queries to be able to render its UI
- reification, the Haystack "belief server", and the overhead
associated with storing reified RDF statements
- event triggers
Dave Reynolds subsequently constructed a simple tree-walking
test case that demonstrates query throughput in the "hundreds per
second" range on sub-$1K PCs. See Dave's note on www-rdf-dspace.
http://lists.w3.org/Archives/Public/www-rdf-dspace/2002Jul/0002.html
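Dave's actual test case is described in the note linked above. The
following is only a hypothetical Python sketch of the access pattern
under discussion, not his code and not Jena's API: a tree stored as
triples, walked depth-first, issuing two small (subject, predicate)
lookups per node, roughly what rendering one UI element might need.
The store class, the "child"/"label" predicates, and the tree shape
are all assumptions made for illustration.

```python
import time
from collections import defaultdict


class TripleIndex:
    """Hypothetical in-memory triple store indexed by (subject, predicate)."""

    def __init__(self):
        self._by_sp = defaultdict(list)
        self.queries = 0  # number of small queries issued so far

    def add(self, s, p, o):
        self._by_sp[(s, p)].append(o)

    def objects(self, s, p):
        # One small, simple query: all objects for a (subject, predicate) pair.
        self.queries += 1
        return self._by_sp.get((s, p), [])


def build_tree(store, depth, fanout, root="n0"):
    """Populate a complete tree linked by a hypothetical 'child' predicate."""
    counter = 0

    def grow(node, level):
        nonlocal counter
        if level == depth:
            return
        for _ in range(fanout):
            counter += 1
            child = f"n{counter}"
            store.add(node, "child", child)
            store.add(child, "label", f"Node {counter}")
            grow(child, level + 1)

    store.add(root, "label", "Root")
    grow(root, 0)
    return root


def walk(store, node):
    """Depth-first walk issuing two queries per node (label, children).
    Returns the number of nodes visited."""
    visited = 1
    store.objects(node, "label")
    for child in store.objects(node, "child"):
        visited += walk(store, child)
    return visited


store = TripleIndex()
root = build_tree(store, depth=4, fanout=5)
start = time.perf_counter()
nodes = walk(store, root)
elapsed = time.perf_counter() - start
print(f"{nodes} nodes, {store.queries} queries, "
      f"{store.queries / max(elapsed, 1e-9):.0f} queries/s")
```

A depth-4, fanout-5 tree has 781 nodes, so the walk issues 1562
queries; the measured rate depends entirely on the machine and store.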
Dennis Quan has since undertaken to create a version of Haystack
that uses Jena instead of Cholesterol, to shed light on any hidden
requirements. This has uncovered some additional issues, which
Dennis describes in a note on www-rdf-dspace.
http://lists.w3.org/Archives/Public/www-rdf-dspace/2002Jul/0004.html
(->A4)
DSpace Architecture
Rob Tansley presented an overview of the DSpace Architecture.
Eric Miller pointed out overlap between the DSpace bitstream
format registry and open issues regarding MIME types in the W3C
Technical Architecture Group. (->I5)
Will Handles ever be URIs? They are URI-compliant, but what are
CNRI's plans with respect to registration? (->I6)
Can there be / should there be an RDF Schema for METS? (->I7)
DSpace currently uses Jena to create RDF serializations of
instances in the DSpace data model and descriptions of
relationships among these serializations and events that change
the state of the archive. DSpace does not yet use Jena as a
persistent RDF store. Many opportunities exist to make the
existing RDF information more readily query-able, as well as to
store community- and format-specific metadata in a persistent RDF
store in addition to storing it in serialized bitstreams, as is
currently done.
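To make the "serialized bitstream vs. query-able store" distinction
concrete, here is a minimal Python sketch under stated assumptions: it
does not use Jena or the real DSpace data model. The item, bitstream,
and "hasBitstream" URIs are hypothetical (only the Dublin Core title
URI is real). The same triples are written out as N-Triples text, the
kind of content a stored bitstream would hold, and also loaded into an
index that can be queried directly by predicate.

```python
from collections import defaultdict


class Lit(str):
    """Marks a string as an RDF literal rather than a URI reference."""


def term(t):
    """Render one term in N-Triples syntax."""
    if isinstance(t, Lit):
        escaped = (t.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
        return f'"{escaped}"'
    return f"<{t}>"


def to_ntriples(triples):
    """Serialize triples as N-Triples text, e.g. for storage as a bitstream."""
    return "".join(f"{term(s)} {term(p)} {term(o)} .\n" for s, p, o in triples)


def index_by_predicate(triples):
    """Load the same triples into a structure that can be queried directly."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[p].append((s, o))
    return index


# Hypothetical identifiers; only the Dublin Core element URI is real.
ITEM = "urn:example:dspace/item/1"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
HAS_BITSTREAM = "urn:example:dspace-model#hasBitstream"

triples = [
    (ITEM, DC_TITLE, Lit('A "sample" item')),
    (ITEM, HAS_BITSTREAM, "urn:example:dspace/bitstream/7"),
]

bitstream_text = to_ntriples(triples)   # serialized form
index = index_by_predicate(triples)     # query-able form
print(bitstream_text, end="")
print(index[HAS_BITSTREAM])
```

Answering "which bitstreams belong to this item?" from the serialized
form means re-parsing the text; the index answers it with one lookup.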
We revisited the need for SIMILE's output and progress to respond
to strong demand from MIT Libraries, the library community, and the
effort to federate DSpace at a number of leading universities. Dave
Karger requested a follow-up session where MacKenzie could share her
perspective on unmet needs in the library community. (->A8)
We clarified the difference between DSpace RDF History data
(which describes instances of the DSpace data model, and the
temporal relationships between them and key events that change
the state of the archive), and RDF transformations of MARC records
from Barton, the MIT Libraries' catalog.
Dave Karger and his team expressed interest in immediately
beginning to develop some simple Haystack UI components for
exploring and navigating both of these corpuses of RDF metadata.
Eric Miller agreed to assist with transformations
(if any are required) on the DSpace RDF instance serializations
and history metadata. (->A9, A10)
Dave Karger envisions three possible variants of Haystack clients:
- generic. This is Haystack as it exists today.
- library client. This is a personal Haystack extended to
provide richer functionality for interoperating with library
resources. Or a library-user Haystack able to interoperate
with personal resources, depending on your perspective :)
- library administrator. This is a Haystack optimized for
those who maintain an information corpus and make decisions
about how it can and should be organized and annotated to
provide optimal value to the designated community. (->A10)
Methodology Walkthrough
We reviewed the three phases that we've initially scoped for the
project:
1 + Support submit/query/discover/retrieve of instances of
simple flat (e.g. name-value pairs) community-specific schemas.
+ Support simple (free ascii text, human-readable) annotations
of DSpace items.
+ Integrate Jena, as a persistent RDF store, with DSpace.
+ Add consequent UI support.
2 + Add support for machine-readable annotations to any instance
in the DSpace data model (including annotations to individual
metadata elements). This will enable authority-control
annotation-services layered atop the originally-submitted
metadata.
+ Add support for discovery and incorporation of distributed
metadata stores (some metadata hosted by the library, some
metadata hosted by community or department, some personal
metadata).
+ Add required consequent dissemination/viewing architecture.
3 + Add support for rich schemas (hierarchy, whole/part, etc.)
+ Support schema mapping & intelligent query response through
aggregation of distributed schema annotations.
A key issue is that the relationship between these phases (and the
additional platform capabilities proposed to be demonstrated in
each) and the unmet scenarios for use in the proposed pilot domain
(libraries) is not yet sufficiently well articulated or vetted.
(->I11)
These phases are not yet frozen. The PIs retain the flexibility
to change their definitions, and the relationship of each to
articulated demonstrators and scenarios for use.
We agreed to be explicit about the difference between:
- service-deployment policy decisions taken in a particular
environment, and
- required platform capabilities, recognizing that platform
capabilities may need to be richer than the specific
(policy-driven) requirements in any one environment in order to
support the various service-deployment decisions that will be
taken elsewhere.
Example: Would libraries host metadata for resources that they do
not host? (In the current DSpace service offered by MIT Libraries,
the answer is "no"). Should the SIMILE platform support
distributed metadata such that an institution can decide to host
metadata for resources that it does not host? (The answer is
arguably "yes". For instance, much annotation metadata will
relate multiple resources, rather than describe any single one.)
We discovered that the term "annotation" is a victim of semantic
overload. W3C SWAD, and specifically the Annotea team, uses it in
its most general form (a machine-readable statement that describes
a relationship between two resources). In phase two of the
methodology we use it to mean additional, or perhaps corrected,
machine-readable metadata for DSpace items (examples: suggested
subject keywords for a previously-submitted item; a corrected or
otherwise improved author, perhaps authority-controlled,
annotating the author as originally submitted). In phase one we
mean it in an even simpler and less granular sense: human-readable
ASCII prose annotating items submitted in DSpace. This is another
example of platform capabilities vs. service-deployment choices.
We'll need to architect from the beginning with the most general
case in mind, but we'll want to turn on progressively more complex
deployed services, starting with the simplest ones.
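One way to read "architect for the most general case, deploy the
simplest first" is sketched below in Python. This is a hypothetical
illustration, not an agreed SIMILE design and not Annotea's actual
schema: every annotation is stored in the general form (a target, a
relation, and a body that is either another resource or a literal),
and a deployment policy decides which of the three senses a service
turns on. The NOTE predicate, the "cites" predicate, the item URIs,
and the policy sets are assumptions; only the Dublin Core subject URI
is real.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Literal:
    """A literal value, as opposed to a resource URI."""
    text: str


@dataclass(frozen=True)
class Annotation:
    """General (Annotea-style) form: a statement relating a target to a body."""
    target: str                 # URI of the annotated item or metadata element
    relation: str               # URI naming the relationship
    body: Union[str, Literal]   # another resource's URI, or a literal


NOTE = "urn:example:vocab#note"  # hypothetical predicate for free-text notes


def kind(a: Annotation) -> str:
    """Classify an annotation by which of the three senses it falls under."""
    if not isinstance(a.body, Literal):
        return "relationship"   # most general: resource-to-resource
    if a.relation == NOTE:
        return "note"           # phase one: human-readable prose
    return "metadata"           # phase two: machine-readable metadata


# Hypothetical deployment policies: which kinds a service turns on.
PHASE_ONE = {"note"}
PHASE_TWO = {"note", "metadata"}


def accepted(a: Annotation, policy: set) -> bool:
    return kind(a) in policy


item = "urn:example:dspace/item/1"
note = Annotation(item, NOTE, Literal("The caption of figure 3 looks wrong."))
subject = Annotation(item, "http://purl.org/dc/elements/1.1/subject",
                     Literal("semantic web"))
link = Annotation(item, "urn:example:vocab#cites", "urn:example:dspace/item/2")

for a in (note, subject, link):
    print(kind(a), accepted(a, PHASE_ONE), accepted(a, PHASE_TWO))
```

The platform can store all three kinds from day one; switching
services on is then a policy change rather than a schema change.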
Deliverables in the Startup Phase
We reviewed the goals and deliverables for the startup phase of the
project.
Nick Wainwright suggested that we search for a way to exit the
startup phase with a simple but solidly grounded demo/example in
addition to the deliverables that we reviewed (project plan, etc.)
(->A12)
We reviewed the current status of contracts with the Office of
Sponsored Programs (OSP), finance contacts in LCS & Libraries, and
the need to turn around contracts as quickly as possible. (->A13, A14)
Preferred working processes
We discussed our preferences regarding how to work together to
efficiently get the work done. Outcomes included:
- Need an email list for group communication. We decided to use
www-rdf-dspace, an existing list at W3C that we set up during
the DSpace project for discussion about application of RDF
within DSpace. (->A15)
- We agreed that we will need 3-4 face-to-face working meetings
including all PIs, each lasting between a half day and a full day.
That's about once every three to four weeks. Eric Miller stated
a preference to do at least one of these in Ohio. David Karger
voiced constraints on ability to travel, and prefers Ohio later
rather than earlier during the startup phase. We tentatively
agreed to August 14th for the next PI face to face, in
Cambridge. Need to confirm date & location. (->A16)
- In addition to the PI f2f meetings, we acknowledged that we'd
likely need ongoing offline pairwise interactions. Results from
these will ideally be written up and submitted to the group via
www-rdf-dspace.
- We agreed to steer on an ongoing basis through a weekly PI phone
call. The purpose of the weekly call is to tee up required
offline interactions, take group decisions, and report on open
issues/actions. We tentatively agreed to Thursdays, 3pm for
this call. That time has since fallen through, so we will need to
find a different time. (->A17, A18)
Several participants requested that we consolidate materials and
resources from the meeting and make them available to the
group. (->A19)
Hone & Brainstorm Demonstrators and Scenarios for Use
We didn't make significant progress here; it's at the top of the
list for future work.
_________________________________________________________________
NEW Actions (A) / Issues (I)
ref  owner     summary
----|---------|---------
A1   karger    summarize test case / requirements on Jena for use in
               haystack
I2   karger    should cholesterol become an alternative RDB implementation
               for Jena?
I3   karger    could/should Adenine be bundled with Jena?
A4   bass,     followup Quan's "Jena in Haystack" note with Dennis
     karger    and Jena team.
I5   kenzie,   relationship DSpace Format Registry :: Mime Type open issues
     miller    in W3C Technical Architecture Group
               - http://www.w3.org/2001/tag/ilist#w3cMediaType-1
               - http://www.w3.org/2001/tag/ilist#customMediaType-2
               - http://www.w3.org/2001/tag/ilist#nsMediaType-3
               - http://www.w3.org/2001/tag/ilist#uriMediaType-9
I6   erickson  what is CNRI position regarding registration of handle
               system as a URI scheme with IANA?
I7   kenzie,   relationship METS :: RDF
     miller
A8   bass      PI forum for Kenzie to present and all to discuss unmet
               metadata and information interoperability needs from MIT
               Libraries perspective
A9   miller,   gather & distribute RDF corpuses for (1) DSpace History and
     kenzie,   (2) Barton Library Catalog
     bass
A10  karger,   haystack team pairwise w/ kenzie & library staff to scope UI
     kenzie    reqs for library catalogers / "semantic annotators"
I11  all,      Strengthen, capture, review: Driving/focusing scenarios for
     bass      use, and relationship to methodology phases.
A12  all,      Brainstorm, scope, (within available resources), and plan to
               deliver a simple yet compelling demonstrator by end of
               startup phase.
A13  all       follow through with requested paperwork to OSP ASAP after
               receiving it. We need contracts to close by end of July
               if at all possible.
A14  bass      followup with Lissa Natkin, LCS
A15  bass      send www-rdf-dspace subscribe instructions to all PIs
A16  bass      confirm August 14 Cambridge with all PIs
A17  all       provide non-available times to bass for weekly PI phone call
A18  bass      identify time for weekly PI phone call
A19  bass      consolidate and make available meeting presentation
               materials and other useful resources
_________________________________________________________________
OPEN Actions (A) / Issues (I)
ref  owner     summary
               --------
               progress / status
----|---------|---------
A1   karger    summarize test case / requirements on Jena for use in
               haystack
I2   karger    should cholesterol become an alternative RDB implementation
               for Jena?
I3   karger    could/should Adenine be bundled with Jena?
A4   bass,     followup Quan's "Jena in Haystack" note with Dennis
     karger    and Jena team
               --------
               sent to Jena team
I5   kenzie,   relationship DSpace Format Registry :: Mime Type open issues
     miller    in W3C Technical Architecture Group
               - http://www.w3.org/2001/tag/ilist#w3cMediaType-1
               - http://www.w3.org/2001/tag/ilist#customMediaType-2
               - http://www.w3.org/2001/tag/ilist#nsMediaType-3
               - http://www.w3.org/2001/tag/ilist#uriMediaType-9
I6   erickson  what is CNRI position regarding registration of handle
               system as a URI scheme with IANA?
I7   kenzie,   relationship METS :: RDF
     miller
A8   bass      PI forum for Kenzie to present and all to discuss unmet
               metadata and information interoperability needs from MIT
               Libraries perspective
A9   miller,   gather & distribute to PIs RDF corpuses for
     kenzie,   (1) DSpace History and (2) Barton Library Catalog
     bass
               --------
               tar of RDF history bitstreams in hand.
               need bass/kenzie policy conversation.
A10  karger,   haystack team pairwise w/ kenzie & library staff to scope UI
     kenzie    reqs for library catalogers / "semantic annotators"
I11  all,      Strengthen, capture, review: driving/focusing scenarios for
     bass      use, and relationship to methodology phases.
A12  all,      Brainstorm, scope, (within available resources), and plan to
               deliver a simple yet compelling demonstrator by end of
               startup phase.
A13  all       follow through with requested paperwork to OSP ASAP after
               receiving it. We need contracts to close by end of July
               if at all possible.
A16  bass      confirm August 14 Cambridge with all PIs
A17  all       provide non-available times to bass for weekly PI phone call
A18  bass      identify time for weekly PI phone call
_________________________________________________________________
CLOSED Actions (A) / Issues (I)
ref  owner     summary
               --------
               resolution
----|---------|---------
A14  bass      followup with Lissa Natkin, LCS
               --------
               met 2002-07-17, communication established and remains open.
A15  bass      send www-rdf-dspace subscribe instructions to all PIs
               --------
               sent.
A19  bass      consolidate and make available meeting presentation
               materials and other useful resources
               --------
               here they are! also set up SIMILE project web page.
_________________________________________________________________
Received on Friday, 19 July 2002 01:23:01 UTC