- From: BASS,MICK (HP-USA,ex1) <mick_bass@hp.com>
- Date: Thu, 22 Aug 2002 15:35:24 -0400
- To: www-rdf-dspace@w3.org
Attached are minutes from the SIMILE PI Meeting held on 2002-08-14. These
are also available at http://web.mit.edu/dspace-dev/www/simile/.
Please let me know if I've misrepresented anything, and review open
actions!!
Enjoy,
- Mick
_________________________________________________________________
SIMILE PI Face to Face Meeting
2002-08-14
Summary, Actions, and Issues
Editor: Mick Bass, HP
mailto:bass@alum.mit.edu
This document:
http://web.mit.edu/dspace-dev/www/simile/minutes/minutes-2002-08-14.txt
Series:
http://web.mit.edu/dspace-dev/www/simile/minutes/index.html
_________________________________________________________________
Table of Contents
Agenda
Attendees
New Resources
Summary
New Actions / Issues
Open Actions / Issues
Closed Actions / Issues
Resources
_________________________________________________________________
Agenda
0. Review Agenda
1. brief updates: funding / IPR frameworks
2. use case / demonstrator brainstorm & hone
3. capture tech platform ramifications
4. tune methodology
_________________________________________________________________
Attendees (all in person)
David Karger (PI) MIT, LCS / AI Lab
Eric Miller (Co-I) W3C
Mick Bass (HP PI) HP Labs
MacKenzie Smith (Co-I) MIT, Libraries
_________________________________________________________________
New Resources
The following resources were brought to the attention of the group
during the course of the meeting. The complete resource list
(including those listed here) can be found at the end of this
document.
SCORM - Sharable Content Object Reference Model
http://www.adlnet.org
Open Archives Initiative - protocol for metadata harvesting and sharing
http://www.openarchives.org/
METS - Metadata Encoding and Transmission Standard
http://www.loc.gov/standards/mets/
RUDOLF - ideas and demos for wrapping various services
(e.g. mailing lists, search engine APIs) to expose results or
contents as RDF. Note in particular the Handle System wrapper. em also
mentioned (plans for?) a Google wrapper.
http://www.ilrt.bris.ac.uk/discovery/rdf-dev/rudolf/
_________________________________________________________________
Summary
--
The presentation materials used during the meeting are at:
<http://web.mit.edu/dspace-dev/www/simile/resources/PI-meeting-2002-08-14.pdf>
--
Brief updates: funding and IPR framework
- HP has committed to funding SIMILE (both contracts, startup phase
and remainder). bass to proceed with initiation of remaining
SIMILE contract with OSP, karger.
action -> A22
- MIT meetings are still underway to understand existing encumbrances
on haystack IP. Conclusions expected within days.
--
We reviewed and discussed the relationship between Simile and DSpace
(See A20). We distinguished between Simile (this research project)
and DSpace (historic development, ongoing incremental improvement and
support, and deployment at academic research institutions). The
diagram at:
<http://web.mit.edu/dspace-dev/www/simile/resources/DSpace-Simile-Ecosystem.pdf>
represents these distinct, yet related activities. It is the explicit
and shared intent of both Simile and DSpace to (1) leverage the
current DSpace technology platform as a starting point for the Simile
research methodology, and to (2) share appropriate and useful Simile
research results through the technology deployment channel offered by
the DSpace federation. This can happen through ongoing participation of
Simile PIs on the governing coalition of the DSpace federation, and an
ongoing technical dialogue between Simile researchers and DSpace
developers who are prioritizing and implementing incremental
enhancements to the production DSpace system. MIT Libraries will take
a lead role in verifying with selected collaborators from among the
federating DSpace institutions that potential services culled from
Simile are indeed useful, widely deployable, and production-ready.
We noted the required action to bundle this diagram with a single page of
explanatory text (developed from above statements) that can be
distributed as a standalone document explaining the benefits of the
DSpace/Simile relationship to potential DSpace adopters.
update action -> A20
--
We reviewed the proposed research methodology with respect to
technology platform objectives. PIs voiced consensus on
the following points:
- The desired methodology has the following attributes:
1. It clearly distinguishes between requirements on the
technology platform architecture and requirements
driven by application-specific or institution-specific
policy decisions.
That is, we wish to develop a substrate for flexible
deployment of services on heterogeneous information
objects. We wish the substrate to be supportive of
multiple potential policy decisions about what types
of information objects and/or services will be
deployed in any particular environment. And we wish
the substrate to be a layer on top of the internet
architecture, and demonstrative of semantic web
techniques.
For example, MIT may decide that (for the time being) they
will disallow submissions of items that contain only
metadata. But this does not mean that the technology
platform need not support items containing only metadata,
because an alternative policy decision (by either MIT or
another institution) that would require this support from
the technology platform is eminently foreseeable.
2. It defines requirements with the end in mind, and implements
incrementally.
That is, the methodology should take a long-term
view of platform architecture requirements and
consequent capabilities, yet choose use cases that
allow intermediate results and demonstrations to
be achieved with bounded engineering effort.
Put another way, we shouldn't have to build everything in
order to be able to do anything. But what we do needs to
be consistent with where we're heading.
This is the motivation of the phases in the research
methodology. Each phase defines key incremental
platform capabilities that are consistent with -
and steps along the way towards - implementation of
a desired technology platform architecture.
- an imperative early research objective is to define and publish
an RDF-based data model and schema for DSpace instances (to parallel
the existing RDBMS table-based data model), and to make the
resultant schema and instances available for research use. This
store of RDF should also include the existing RDF data currently
produced by the DSpace history subsystem.
It is also possible that some appropriate (as defined by MIT
Libraries) subset of this data could be made publicly available.
Such availability would enable additional services to be created
independently from the DSpace application.
- establishing an "RDF server" would be a good way of making such a
corpus of RDF available. Andy Seaborne's Joseki server is a good
candidate: it makes RDF available via HTTP (or HTTPS) GET requests.
DSpace could be augmented to optionally deposit RDF History data
to a bundled Joseki server in addition to simply writing it to
the file system.
- These two key platform capabilities would pave the way for:
- introduction of community-specific schemas as DSpace items
(a "schemas" collection)
- association of DSpace collections with schemas, thus
defining instance metadata to be gathered at item
ingestion
- addition of community-specific metadata subsequent to item
ingestion
- ability to disseminate associated instance metadata along
with discovered DSpace items
- indexing services based upon available community-specific
instance metadata
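To make the Phase 1 objective of an RDF data model that parallels the
RDBMS tables a little more concrete, here is a minimal sketch of how a
relational row could be republished as RDF-style triples and matched by
pattern. The table name, column names and example.org namespaces are
hypothetical (they are not the actual DSpace schema), and plain Python
tuples stand in for a real RDF toolkit or for the Joseki server
discussed above.

```python
# Sketch: republish relational rows as RDF-style (subject, predicate, object) triples.
# ASSUMPTION: namespaces, table name and column names are hypothetical, not DSpace's schema.
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
NS = "http://example.org/dspace-schema#"  # hypothetical schema namespace
BASE = "http://example.org/dspace/"       # hypothetical base URI for instances


def row_to_triples(table: str, key: str, row: Dict[str, object]) -> List[Triple]:
    """One rdf:type triple per row, plus one triple per non-null, non-key column."""
    subject = f"{BASE}{table}/{row[key]}"
    triples = [(subject, RDF_TYPE, NS + table.capitalize())]
    for column, value in row.items():
        if column == key or value is None:
            continue  # key is already encoded in the subject URI; nulls yield no triple
        triples.append((subject, NS + column, str(value)))
    return triples


class TripleStore:
    """Toy in-memory store standing in for a real RDF store."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add_all(self, triples: List[Triple]) -> None:
        self.triples.update(triples)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(t for t in self.triples
                      if (s is None or t[0] == s)
                      and (p is None or t[1] == p)
                      and (o is None or t[2] == o))


store = TripleStore()
store.add_all(row_to_triples("item", "item_id",
                             {"item_id": 42, "title": "Example thesis", "handle": None}))
print(store.match(p=NS + "title"))
```

In the arrangement discussed above, triples like these (together with
the RDF already produced by the History subsystem) would live in the
RDF server rather than in a toy in-memory set.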
Dave Karger discussed techniques currently used in haystack which
define ontologies for describing how a particular set of content
should be presented/viewed. In particular, community-specific schemas
might be annotated using a UI-hint-providing schema. Then a UI which
could parse the schema and understand the hints could guide data
ingestion and/or display appropriately.
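The UI-hint idea can be illustrated with a small sketch. The hint
vocabulary (label, order, widget) and both namespaces are hypothetical
and are not Haystack's actual ontology; the point is only that a
generic UI can derive a form layout from annotations on a
community-specific schema, falling back to defaults where no hint is
given.

```python
# Sketch: derive a form layout from UI hints attached to a community schema.
# ASSUMPTION: the hint vocabulary and both namespaces are hypothetical.
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
HINT = "http://example.org/ui-hints#"   # hypothetical UI-hint schema
VRA = "http://example.org/vra-core#"    # hypothetical community schema namespace

schema_hints: List[Triple] = [
    (VRA + "media", HINT + "label", "Media"),
    (VRA + "media", HINT + "order", "2"),
    (VRA + "media", HINT + "widget", "choice"),
    (VRA + "title", HINT + "label", "Title"),
    (VRA + "title", HINT + "order", "1"),
    (VRA + "notes", HINT + "label", "Notes"),  # no order or widget hint given
]


def form_fields(triples: List[Triple]) -> List[Dict[str, str]]:
    """Group hint triples by schema property and return fields in hinted order.

    Fields without an order hint sort last (then by label);
    a missing widget hint defaults to 'text'.
    """
    hints: Dict[str, Dict[str, str]] = {}
    for prop, predicate, value in triples:
        if predicate.startswith(HINT):
            hints.setdefault(prop, {})[predicate[len(HINT):]] = value
    fields = [{"property": prop,
               "label": h.get("label", prop),
               "widget": h.get("widget", "text"),
               "order": h.get("order", "")}
              for prop, h in hints.items()]
    fields.sort(key=lambda f: (f["order"] == "", int(f["order"] or 0), f["label"]))
    return fields


for field in form_fields(schema_hints):
    print(field["label"], field["widget"])
```

The same hint lookup could drive a display view as well as an ingestion
form, which is the dual use suggested above.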
A slightly updated methodology overview based upon this discussion is
available at:
<http://web.mit.edu/dspace-dev/www/simile/resources/methodology-overview-2002-08-14.txt>
We noted the need to expand this methodology to incorporate more
detail from our phase-by-phase discussion of required tasks
and resulting capabilities.
action -> A23
--
Externally Visible Use Cases / Services Brainstorm:
We brainstormed externally visible services and use cases that could
be constructed from the capabilities introduced during each phase of
the methodology.
The list runs from simpler to more complex services, by phase. We
mostly focused on the earlier phases...
Phase 1:
--------
- published schema for DSpace data model
- RDF store exposing schema and instances
- add DSpace history info to RDF store
Service: Browse temporal and event history of a DSpace item
Service: Retrieve a machine-interpretable graph including reference to
all past serializations of a DSpace item, and their temporal and
event relationships
Service: Retrieve a DSpace item as it existed on specified date
Service: query history information. For example "Retrieve DSpace
items whose keywords have been edited during the past year", or
"Retrieve DSpace items to which PDF files were added after initial
submission"
Service: notification service via persistent periodic RDF query, email push
Service: DSpace system deposit of useful tracking info to RDF store, e.g.
query streams, items accessed, etc.
- submit simple (flat name-value pairs, no hierarchy)
community-specific schemas to DSpace
- implement simple schema-driven submission UI
- implement simple schema-driven dissemination UI
Service: "Schema Registration" for community-specific schemas, so they can
be discovered by other communities.
Service: "Schema Browsing" within the DSpace "Community Schemas" collection
Service: community-specific metadata submission/retrieval
For example:
OCW, needs SCORM, alternative contexts
Art & Architecture Images, needs VRA-core
Sorger / Biomedical Images, needs externally hosted content
and community-specific metadata
- implement schema-driven query UI, bind to RDF Query Capability
Service: community-specific query, for example "Find all DSpace Items that
have VRA-Core instance metadata and where Media is Pastel"
Service: extract and add citation metadata to DSpace items (à la CiteSeer)
Service: "What's related?" Library administrators add multiple
"What's related" URIs and caption text to DSpace items, using an Annotea-like schema. This could be used to contextualize results
annotea-like schema. This could be used to contextualize results
in terms of related materials, disciplines, collections,
interests, etc.
"See also" annotation service...
Service: "Submit MIT Faculty Comments on this item",
"Read MIT Faculty Comments on this item"
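The community-specific query above amounts to a conjunction of triple
patterns joined on shared variables. A minimal sketch of that
evaluation (a nested-loop join over an in-memory list) follows. The
dspace:, vra: and dc: property names are hypothetical shorthand, and a
real deployment would hand such patterns to the RDF query capability
instead.

```python
# Sketch: evaluate a conjunction of triple patterns by nested-loop join.
# ASSUMPTION: the dspace:/vra:/dc: property names below are hypothetical shorthand.
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`; '?x' terms are variables."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.get(term, value) != value:
                return None  # variable already bound to a different value
            extended[term] = value
        elif term != value:
            return None      # constant term does not match
    return extended


def query(triples: List[Triple], patterns: List[Triple]) -> List[Binding]:
    """Return every variable binding that satisfies all patterns."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


data = [
    ("item/7", "dspace:hasMetadata", "md/7"), ("md/7", "rdf:type", "vra:Record"),
    ("md/7", "vra:media", "Pastel"),
    ("item/8", "dspace:hasMetadata", "md/8"), ("md/8", "rdf:type", "vra:Record"),
    ("md/8", "vra:media", "Oil"),
    ("item/9", "dspace:hasMetadata", "md/9"), ("md/9", "rdf:type", "dc:Record"),
]
# "Find all DSpace Items that have VRA-Core instance metadata and where Media is Pastel"
pastel = query(data, [("?item", "dspace:hasMetadata", "?md"),
                      ("?md", "rdf:type", "vra:Record"),
                      ("?md", "vra:media", "Pastel")])
print([b["?item"] for b in pastel])  # ['item/7']
```

A schema-driven query UI would only need to turn the fields a user
fills in into patterns like the last one.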
- implement cross-corpus OLAP-like extraction (e.g. histogram generation)
Service: monitor high-usage metadata instances for cleanup prioritization.
Service: "hot items" reported via prioritization heuristics:
- multiply cataloged (multiple schemas)
- frequency of use
- co-citation
Service: characterization of saved search trails
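A sketch of the OLAP-like histogram extraction and the "hot items"
ranking above, assuming per-item counts are already available. The
minutes do not say how the three heuristics should be combined; the
weighted sum and its weights are arbitrary placeholders.

```python
# Sketch: histogram extraction and a "hot items" ranking.
# ASSUMPTION: per-item counts are precomputed; the weights are arbitrary placeholders.
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def histogram(records: Iterable[Dict[str, str]], field: str) -> Counter:
    """Count occurrences of each value of one metadata field across the corpus."""
    return Counter(r[field] for r in records if field in r)


def hot_items(schema_count: Dict[str, int], uses: Dict[str, int],
              cocitations: Dict[str, int],
              weights: Tuple[float, float, float] = (1.0, 0.1, 0.5),
              top: int = 3) -> List[Tuple[str, float]]:
    """Rank items by a weighted sum of: number of schemas cataloging the item,
    frequency of use, and co-citation count. Ties break alphabetically."""
    w_schema, w_use, w_cite = weights
    items = set(schema_count) | set(uses) | set(cocitations)
    scored = [(item,
               w_schema * schema_count.get(item, 0)
               + w_use * uses.get(item, 0)
               + w_cite * cocitations.get(item, 0))
              for item in items]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top]


records = [{"media": "Pastel"}, {"media": "Oil"}, {"media": "Pastel"}, {"title": "Untitled"}]
print(histogram(records, "media").most_common())  # [('Pastel', 2), ('Oil', 1)]
print([item for item, _ in hot_items({"a": 2, "b": 1}, {"a": 10, "b": 40, "c": 5}, {"c": 4})])
```

Sorting the histogram by count is one simple way to pick high-usage
metadata values for cleanup first.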
Phase 2:
--------
Authority control / type validation on ingest.
OCLC metadata augmentation (authority control, classification)... at
submission or at a later time
Schemas for describing how items should be viewed, annotations of items in
said schema, and a UI that is respectful of such annotations...
Copy Cataloging - workflow and data integration from several sources?
Tools to support lower-cost cataloging and metadata creation...
Customized forms for community-specific metadata collection.
Known community filtered annotations
Phase ???: Not sure where these ideas best live...
----------
em: SCOAP library processes in machine-readable manner...
viewing external collections through library lenses... (e.g. citeseer?)
merging metadata from multiple sources (libraries, citeseer, etc.)
karger: Faculty/Departmental Profile...(list of publications)
em: RSS feed - notification services based on community of interest...
_________________________________________________________________
NEW Actions (A) / Issues (I)
ref owner summary
----|--------|---------
A22 bass initiate remaining contract with OSP.
A23 bass update methodology to more fully reflect PI discussion
of 2002-08-14. Specifically capture detail on required
tasks, and resulting platform capabilities.
_________________________________________________________________
OPEN Actions (A) / Issues (I) [in rough priority order]
ref owner summary
--------
progress / status
----|--------|---------
A22 bass initiate remaining contract with OSP.
A23 bass update methodology to more fully reflect PI discussion
of 2002-08-14. Specifically capture detail on required
tasks, and resulting platform capabilities.
I11 all, Strengthen, capture, review: driving/focusing scenarios for
bass use, and relationship to methodology phases.
A8 bass PI forum for Kenzie to present and all to discuss unmet
metadata and information interoperability needs from
MIT Libraries perspective
A13 all follow through with requested paperwork to OSP ASAP after
receiving it. We need contracts to close by end of July
if at all possible.
--------
A July close appears unlikely, but the need to keep the process
moving remains.
A21 em + Forward essential terms of SWAD Europe IPR framework,
for consideration / applicability in SIMILE environment
+ Forward helpful one-liners wrt IPR positioning
in support of / leading to a royalty-free IPR position.
A16 bass confirm August 14 Cambridge with all PIs
--------
works for em and karger. Mick to confirm with kenzie.
A20 bass, clarify DSpace / SIMILE relationship.
kenzie Need a statement / context diagram that we can post
publicly.
--------
Reviewed diagram with PIs.
<http://web.mit.edu/dspace-dev/www/simile/resources/DSpace-Simile-Ecosystem.pdf>
Need one page of explanatory text to accompany it.
A12 all, Brainstorm, scope, (within available resources), and plan to
deliver a simple yet compelling demonstrator by end of
startup phase.
A9 miller, gather & distribute to PIs RDF corpuses for
kenzie, (1) DSpace History and (2) Barton Library Catalog
bass
--------
tar of RDF history bitstreams in hand.
need bass/kenzie policy conversation.
2002-08-14 update: have MIT Libs policy, ready for handoff
I7 kenzie, relationship METS :: RDF
miller
--------
em willing to spend energy here.
mets -> closed world?
motivation: functionality/capabilities of METS coupled with
openness, flexibility of RDF. RDFS for METS?
A1 karger summarize test case / requirements on Jena for use in
haystack
... re-use cholesterol?
but not hardened
memory resident only - won't scale
... hope is that Jena gets karger team out of database work
... root cause between sleepycat/Jena and cholesterol
... simple mods to existing Jena DB backends
action Karger & Bass (serialize) -> profiling
I2 karger should cholesterol become an alternative RDB implementation
for Jena?
I3 karger could/should Adenine be bundled with Jena?
I6 erickson what is CNRI's position regarding registration of the
Handle System as a URI scheme with IANA?
_________________________________________________________________
CLOSED Actions (A) / Issues (I)
ref owner summary
--------
resolution
----|--------|---------
A14 bass followup with Lissa Natkin, LCS
--------
met 2002-07-17, communication established and remains open.
A15 bass send www-rdf-dspace subscribe instructions to all PIs
--------
sent.
A19 bass consolidate and make available meeting presentation
materials and other useful resources
--------
here they are! also set up SIMILE project web page.
A17 all provide non-available times to bass for weekly PI phone call
A18 bass identify time for weekly PI phone call
--------
Weekly call identified: Fridays at 2pm Eastern.
May need to revisit after classes resume.
A4 bass, followup Quan's "Jena in Haystack" note with Dennis
karger and Jena team
--------
sent to Jena team
I5 kenzie, relationship DSpace Format Registry :: Mime Type open issues
miller in W3C Technical Architecture Group
- http://www.w3.org/2001/tag/ilist#w3cMediaType-1
- http://www.w3.org/2001/tag/ilist#customMediaType-2
- http://www.w3.org/2001/tag/ilist#nsMediaType-3
- http://www.w3.org/2001/tag/ilist#uriMediaType-9
--------
close per em, having shared info
A10 karger, haystack team pairwise w/ kenzie & library staff to scope
kenzie UI reqs for library catalogers / "semantic annotators"
--------
Karger update: watched simple "cataloging a book" steps.
lots of flipping back and forth...
idea: haystack UI to present in one place all info needed
to catalog a book, and allow input of reqd data.
OCLC data comes back in terminal interface
No RDF version of MARC
xml-schema for MARC - but not flattened
action -> meet reference librarians? (much harder...)
_________________________________________________________________
Resources
SIMILE Public Page
http://web.mit.edu/dspace-dev/www/simile/
SIMILE Research Proposal (public, version 2002-04)
http://web.mit.edu/dspace-dev/www/simile/resources/proposal-2002-04/index.htm
SIMILE Startup Phase Goals and Deliverables (July - October 2002)
http://web.mit.edu/dspace-dev/www/simile/resources/goals-summer-2002.txt
W3C Semantic Web Activity
http://www.w3.org/2001/sw/
W3C Semantic Web Advanced Development Presentation Materials
http://www.w3.org/2002/Talks/07-simile-swad/
Haystack Public Page
http://haystack.lcs.mit.edu/
Haystack Presentation Materials
http://web.mit.edu/dspace-dev/www/simile/resources/Haystack-Overview.pdf
DSpace Architecture Presentation Materials
http://web.mit.edu/dspace-dev/www/simile/resources/DSpace-Arch-Overview-2002-07-11.pdf
DSpace White Papers (Functionality, Architecture)
http://www.dspace.org/live/implementation/design_documents/functionality.pdf
http://www.dspace.org/live/implementation/design_documents/architecture.pdf
Reference Model for an Open Archival Information System (OAIS)
http://ssdoo.gsfc.nasa.gov/nost/isoas/
http://www.ccsds.org/documents/pdf/CCSDS-650.0-R-2.pdf
HP Labs Semantic Web Activity
http://hpl.hp.com/semweb/
Jena Semantic Web Toolkit
http://www.hpl.hp.com/semweb/jena-top.html
Jena Presentation Materials
URL not available
Jena Development Public List
http://groups.yahoo.com/group/jena-dev/
www-rdf-dspace
http://lists.w3.org/Archives/Public/www-rdf-dspace/
mailto:www-rdf-dspace@w3.org
SCORM - Sharable Content Object Reference Model
http://www.adlnet.org
Open Archives Initiative - protocol for metadata harvesting and sharing
http://www.openarchives.org/
METS - Metadata Encoding and Transmission Standard
http://www.loc.gov/standards/mets/
RUDOLF
http://www.ilrt.bris.ac.uk/discovery/rdf-dev/rudolf/
_________________________________________________________________
=============================================
Mick Bass
External Engagement Manager
HP Labs / MIT DSpace Program
Hewlett-Packard Company
Building 10-500 MIT, 77 Massachusetts Avenue
Cambridge, MA 02139-4307
617.253.6617 office 617.452.3000 fax
617.899.3938 mobile 617.627.9694 residence
bass@alum.mit.edu mick_bass@hp.com
=============================================
Received on Thursday, 22 August 2002 15:35:32 UTC