2002-08-14 SIMILE PI Meeting Minutes

Attached are minutes from the SIMILE PI Meeting held on 2002-08-14.  These
are also available at http://web.mit.edu/dspace-dev/www/simile/.

Please let me know if I've misrepresented anything, and review open
actions!!

Enjoy,

- Mick

      _________________________________________________________________

                        SIMILE PI Face to Face Meeting

                        2002-08-14

                        Summary, Actions, and Issues

                        Editor: Mick Bass, HP
                                mailto:bass@alum.mit.edu

         This document:
         http://web.mit.edu/dspace-dev/www/simile/minutes/minutes-2002-08-14.txt

         Series:
         http://web.mit.edu/dspace-dev/www/simile/minutes/index.html

      _________________________________________________________________


Table of Contents

    Agenda

    Attendees

    New Resources

    Summary

    New Actions / Issues

    Open Actions / Issues

    Closed Actions / Issues

    Resource List

      _________________________________________________________________


Agenda

0. Review Agenda

1. brief updates: funding / IPR frameworks

2. use case / demonstrator brainstorm & hone

3. capture tech platform ramifications

4. tune methodology

      _________________________________________________________________


Attendees (all in person)

    David Karger (PI)        MIT, LCS / AI Lab
    Eric Miller (Co-I)       W3C
    Mick Bass (HP PI)        HP Labs
    MacKenzie Smith (Co-I)   MIT, Libraries

      _________________________________________________________________


New Resources

    The following resources were brought to the attention of the group
    during the course of the meeting.  The complete resource list
    (including those listed here) can be found at the end of this
    document.

    SCORM - Sharable Content Object Reference Model
http://www.adlnet.org
    
    Open Archives Initiative - protocol for metadata harvesting and sharing
http://www.openarchives.org/

    METS - Metadata Encoding and Transmission Standard
http://www.loc.gov/standards/mets/

    RUDOLF - ideas and demos for wrapping various services
    (e.g. mailing lists, search engine APIs) to expose results or
    contents as RDF.  Note in particular the Handle System wrapper.
    em also mentioned (plans for?) a Google wrapper.
http://www.ilrt.bris.ac.uk/discovery/rdf-dev/rudolf/

      _________________________________________________________________


Summary

--

The presentation materials used during the meeting are at:

<http://web.mit.edu/dspace-dev/www/simile/resources/PI-meeting-2002-08-14.pdf>

--

Brief updates: funding and IPR framework

  - HP has committed to funding SIMILE (both contracts: startup phase
    and remainder).  bass to proceed with initiating the remaining
    SIMILE contract with OSP and karger.
      action -> A22

  - MIT meetings still underway to understand existing encumbrances
    on haystack IP.  Conclusions expected within days.

--

We reviewed and discussed the relationship between Simile and DSpace
(See A20).  We distinguished between Simile (this research project)
and DSpace (historic development, ongoing incremental improvement and
support, and deployment at academic research institutions).  The
diagram at:

<http://web.mit.edu/dspace-dev/www/simile/resources/DSpace-Simile-Ecosystem.pdf>

represents these distinct, yet related activities.  It is the explicit
and shared intent of both Simile and DSpace to (1) leverage the
current DSpace technology platform as a starting point for the Simile
research methodology, and to (2) share appropriate and useful Simile
research results through the technology deployment channel offered by
the DSpace federation.  This can happen by ongoing participation of
Simile PIs on the governing coalition of the DSpace federation, and an
ongoing technical dialogue between Simile researchers and DSpace
developers who are prioritizing and implementing incremental
enhancements to the production DSpace system.  MIT Libraries will take
a lead role in verifying with selected collaborators from among the
federating DSpace institutions that potential services culled from
Simile are indeed useful, widely deployable, and production-ready.

We noted the required action to bundle this diagram with a single page of
explanatory text (developed from above statements) that can be
distributed as a standalone document explaining the benefits of the
DSpace/Simile relationship to potential DSpace adopters.
    update action -> A20

--

We reviewed the proposed research methodology with respect to
technology platform objectives.  PIs voiced consensus on
the following points:

   - The desired methodology has the following attributes:

         1. It clearly distinguishes between requirements on the
            technology platform architecture and requirements
            driven by application-specific or institution-specific
            policy decisions.

            That is, we wish to develop a substrate for flexible
            deployment of services on heterogeneous information
            objects.  We wish the substrate to be supportive of
            multiple potential policy decisions about what types
            of information objects and/or services will be
            deployed in any particular environment.  And we wish
            the substrate to be a layer on top of the internet
            architecture, and demonstrative of semantic web
            techniques.

            For example, MIT may decide that (for the time being) it
            will disallow submissions of items that contain only
            metadata.  This does not relieve the technology platform
            of supporting metadata-only items: an alternative policy
            decision (by MIT or another institution) that would
            require such support is eminently foreseeable.

         2. It defines with the end in mind, and implements
            incrementally.

            That is, the methodology should take a long-term
            view of platform architecture requirements and
            consequent capabilities, yet choose use cases that
            allow intermediate results and demonstrations to
            be achieved with bounded engineering effort.

            Put another way, we shouldn't have to build everything in
            order to be able to do anything.  But what we do needs to
            be consistent with where we're heading.

            This is the motivation of the phases in the research
            methodology.  Each phase defines key incremental
            platform capabilities that are consistent with -
            and steps along the way towards - implementation of
            a desired technology platform architecture.

   - An imperative early research objective is to define and publish
     an RDF-based data model and schema for DSpace instances (to parallel
     the existing RDBMS table-based data model), and to make the
     resultant schema and instances available for research use.  This
     store of RDF should also include the existing RDF data currently
     produced by the DSpace history subsystem.

     It is also possible that some appropriate (as defined by MIT
     Libraries) subset of this data could be made publicly available.

     Such availability would enable additional services to be created
     independently from the DSpace application.

   - Establishing an "RDF server" would be a good way of making such a
     corpus of RDF available.  Andy Seaborne's Joseki server is a good
     candidate: it makes RDF available via HTTP (or HTTPS) GET requests.
     DSpace could be augmented to optionally deposit RDF history data
     to a bundled Joseki server in addition to simply writing it to
     the file system.

   - These two key platform capabilities would pave the way for:
          - introduction of community-specific schemas as DSpace items
            (a "schemas" collection)
          - association of DSpace collections with schemas, thus
            defining instance metadata to be gathered at item
            ingestion
          - addition of community-specific metadata subsequent to item
            ingestion
          - ability to disseminate associated instance metadata along
            with discovered DSpace items
          - indexing services based upon available community-specific
            instance metadata
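To make the first two capabilities concrete, here is a minimal Python
sketch of one DSpace item expressed as RDF triples and serialized as
N-Triples, the kind of document an RDF server could return for an HTTP
GET.  The DSPACE namespace, its property names, and all example URIs
are hypothetical placeholders (the DSpace RDF schema was still to be
defined); only the Dublin Core namespace is real.

```python
# Hypothetical namespace: the published DSpace RDF schema did not yet exist.
DSPACE = "http://example.org/dspace-schema#"
DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set


def item_triples(item_uri, title, collection_uri, bitstreams):
    """Describe one DSpace item as (subject, predicate, object) triples."""
    triples = [
        (item_uri, DC + "title", title),  # literal value
        (item_uri, DSPACE + "inCollection", collection_uri),
    ]
    for bitstream_uri in bitstreams:
        triples.append((item_uri, DSPACE + "hasBitstream", bitstream_uri))
    return triples


def to_ntriples(triples):
    """Serialize triples as N-Triples, one statement per line.

    Simplification: any object starting with http:// or https:// is
    treated as a URI; everything else becomes a plain literal.
    """
    lines = []
    for s, p, o in triples:
        if o.startswith(("http://", "https://")):
            obj = f"<{o}>"
        else:
            escaped = o.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
            obj = f'"{escaped}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)
```

Keeping triples as plain tuples avoids committing to any particular RDF
toolkit; in practice Jena would hold and serialize the model.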

Dave Karger discussed techniques currently used in haystack which
define ontologies for describing how a particular set of content
should be presented/viewed.  In particular, community-specific schemas
might be annotated using a UI-hint-providing schema.  A UI that
could parse the schema and understand the hints could then guide data
ingestion and/or display accordingly.
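A minimal sketch of the idea, assuming a flat community schema whose
fields carry UI hints.  The hint vocabulary (label, widget, choices,
required) and the ART_IMAGE_SCHEMA fields are hypothetical, not
Haystack's actual ontology; the point is that one annotated schema
drives both the submission form and validation.

```python
# Hypothetical community schema (flat name-value fields) annotated with
# UI hints; the hint vocabulary is invented for illustration.
ART_IMAGE_SCHEMA = {
    "title": {"label": "Title", "widget": "text", "required": True},
    "media": {"label": "Media", "widget": "choice",
              "choices": ["Oil", "Pastel", "Watercolor"], "required": True},
    "date": {"label": "Date", "widget": "text", "required": False},
}


def render_form(schema):
    """Produce one prompt line per field, driven entirely by the hints."""
    lines = []
    for hint in schema.values():
        marker = "*" if hint.get("required") else " "
        extra = ""
        if hint["widget"] == "choice":
            extra = " [" + "/".join(hint["choices"]) + "]"
        lines.append(f"{marker} {hint['label']}{extra}:")
    return lines


def validate(schema, values):
    """Return a list of problems with a submission, using the same hints."""
    problems = []
    for name, hint in schema.items():
        value = values.get(name, "")
        if hint.get("required") and not value:
            problems.append(f"missing {name}")
        elif value and hint["widget"] == "choice" and value not in hint["choices"]:
            problems.append(f"bad {name}: {value}")
    return problems
```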

A slightly updated methodology overview based upon this discussion is
available at:

<http://web.mit.edu/dspace-dev/www/simile/resources/methodology-overview-2002-08-14.txt>

We noted the need to expand this methodology to incorporate more
detail from our phase-by-phase discussion of required tasks
and resulting capabilities.
    action -> A23

--

Externally Visible Use Cases / Services Brainstorm:

We brainstormed externally visible services and use cases that could
be constructed from the capabilities introduced during each phase of
the methodology.

Ideas are listed by phase, from simpler to more complex.  We mostly
focused on the earlier phases.

Phase 1:
--------

- published schema for DSpace data model
- RDF store exposing schema and instances
- add DSpace history info to RDF store

Service: Browse temporal and event history of a DSpace item

Service: Retrieve a machine-interpretable graph including reference to
    all past serializations of a DSpace item, and their temporal and
    event relationships

Service: Retrieve a DSpace item as it existed on specified date

Service: query history information.  For example "Retrieve DSpace
    items whose keywords have been edited during the past year", or
    "Retrieve DSpace items to which PDF files were added after initial
    submission"
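The history services above can be sketched in Python by replaying
history events.  The record layout (item_id, date, action, field,
value) and the sample HISTORY are hypothetical: the real DSpace history
data is RDF, flattened here for illustration.

```python
from datetime import date

# Hypothetical, flattened history records: (item_id, date, action, field, value).
HISTORY = [
    ("item1", date(2001, 5, 1), "set", "title", "Draft title"),
    ("item1", date(2001, 5, 1), "set", "keywords", "rdf"),
    ("item1", date(2002, 3, 9), "set", "keywords", "rdf; metadata"),
    ("item2", date(2001, 7, 4), "set", "title", "Report"),
    ("item2", date(2002, 1, 2), "add_bitstream", "format", "application/pdf"),
]


def item_as_of(history, item_id, when):
    """Rebuild an item's metadata by replaying 'set' events up to `when`."""
    state = {}
    for item, day, action, field, value in sorted(history, key=lambda r: r[1]):
        if item == item_id and day <= when and action == "set":
            state[field] = value
    return state


def edited_since(history, field, since):
    """Items whose `field` was set on or after `since`."""
    return sorted({item for item, day, action, f, _ in history
                   if action == "set" and f == field and day >= since})


def pdf_added_later(history):
    """Items that gained a PDF bitstream after their first recorded event."""
    first_seen = {}
    for item, day, *_ in history:
        first_seen[item] = min(day, first_seen.get(item, day))
    return sorted({item for item, day, action, _, value in history
                   if action == "add_bitstream" and value == "application/pdf"
                   and day > first_seen[item]})
```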

Service: notification via a persistent, periodically re-run RDF query,
    with results pushed by email
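One way to read "persistent periodic query": store the query with a
subscriber, re-run it on a schedule, and notify only about matches not
reported before.  A minimal Python sketch; the Subscription class, the
subject_is helper, and the dc:subject predicate string are hypothetical,
and actually sending the email is left out.

```python
class Subscription:
    """A stored query that is re-run periodically and reports only new matches.

    `query` is any function from a list of (subject, predicate, object)
    triples to a set of matching subjects.
    """

    def __init__(self, address, query):
        self.address = address  # where the notification would be emailed
        self.query = query
        self.seen = set()       # matches already reported

    def poll(self, triples):
        """Run the query once; return one notification line per new match."""
        current = set(self.query(triples))
        new = sorted(current - self.seen)
        self.seen |= current
        # A real service would email these lines; here we only build them.
        return [f"To {self.address}: new match {s}" for s in new]


def subject_is(value):
    """Build a query: items whose dc:subject equals `value` (hypothetical predicate)."""
    return lambda triples: {s for s, p, o in triples
                            if p == "dc:subject" and o == value}
```

A scheduler (e.g. a nightly job) would call poll() on each stored
subscription.  Because `seen` only grows, an item that drops out of the
results and later reappears is not reported twice; that is one possible
policy, not the only one.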

Service: DSpace system deposit of useful tracking info to RDF store, e.g.
         query streams, items accessed, etc.

- submit simple (flat name-value pairs, no hierarchy)
  community-specific schemas to DSpace
- implement simple schema-driven submission UI
- implement simple schema-driven dissemination UI

Service: "Schema Registration" for community-specific schemas, so they can
be discovered by other communities.

Service: "Schema Browsing" within the DSpace "Community Schemas" collection

Service: community-specific metadata submission/retrieval
         For example:
             OCW, needs SCORM, alternative contexts
             Art & Architecture Images, needs VRA-core
             Sorger / Biomedical Images, needs externally hosted content
                 and community-specific metadata
                      
- implement schema-driven query UI, bind to RDF query capability

Service: community-specific query, for example "Find all DSpace Items that
have VRA-Core instance metadata and where Media is Pastel"
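The VRA-Core example is a conjunctive query over triples.  A minimal
Python sketch of such triple-pattern matching, where terms starting with
"?" are variables; the predicate names (dspace:hasMetadata, vra:Record,
vra:media) and the sample data are hypothetical, not the actual VRA-Core
or DSpace vocabularies.

```python
def match(patterns, triples, binding=None):
    """Yield variable bindings that satisfy every triple pattern."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        b = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind an unbound variable, or check a bound one agrees.
                if b.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, b)


TRIPLES = [
    ("item1", "dspace:hasMetadata", "rec1"),
    ("rec1", "rdf:type", "vra:Record"),
    ("rec1", "vra:media", "Pastel"),
    ("item2", "dspace:hasMetadata", "rec2"),
    ("rec2", "rdf:type", "vra:Record"),
    ("rec2", "vra:media", "Oil"),
    ("item3", "dc:title", "No VRA record"),
]

# "Items that have VRA-Core instance metadata and where Media is Pastel"
QUERY = [("?item", "dspace:hasMetadata", "?rec"),
         ("?rec", "rdf:type", "vra:Record"),
         ("?rec", "vra:media", "Pastel")]
```

Usage: sorted({b["?item"] for b in match(QUERY, TRIPLES)}) yields the
matching items.  This naive matcher scans every triple per pattern; an
RDF store would use indexes instead.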

Service: extract and add citation metadata to DSpace items (a la CiteSeer)

Service: "What's related?"  Library administrators add multiple
    "What's related" URIs and caption text to DSpace items, using an
    annotea-like schema.  This could be used to contextualize results
    in terms of related materials, disciplines, collections,
    interests, etc.
    "See also" annotation service...

Service: "Submit MIT Faculty Comments on this item",
         "Read MIT Faculty Comments on this item"

- implement cross-corpus OLAP-like extraction (e.g. histogram generation)

Service: monitor high-usage metadata instances for cleanup prioritization.

Service: "hot items" reported via prioritization heuristics:
              - multiply cataloged (multiple schemas)
              - frequency of use
              - co-citation
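The cross-corpus extraction and "hot items" ideas can be sketched in
Python as a value histogram plus a weighted score over the three
heuristics listed.  The weights, the per-item statistics layout, and the
function names are hypothetical placeholders, not values or designs the
PIs agreed on.

```python
from collections import Counter


def value_histogram(triples, predicate):
    """Cross-corpus histogram: how often each value of `predicate` occurs."""
    return Counter(o for s, p, o in triples if p == predicate)


def hot_items(items, weights=(1.0, 0.1, 0.5), top=3):
    """Rank items by a weighted sum of the heuristics above.

    `items` maps item id -> (schema_count, use_count, cocitation_count).
    The default weights are arbitrary placeholders.
    """
    def score(stats):
        return sum(w * x for w, x in zip(weights, stats))
    return sorted(items, key=lambda i: score(items[i]), reverse=True)[:top]
```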

Service: characterization of saved search trails


Phase 2:
--------

Authority control / type validation on ingest.
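A minimal Python sketch of authority control and type validation at
ingest, assuming a hypothetical local authority file that maps variant
name forms to an authorized heading and an ISO-format "issued" date.
Real authority data (e.g. from OCLC) would replace the AUTHORITY table.

```python
from datetime import date

# Hypothetical authority file: variant name forms -> authorized heading.
AUTHORITY = {
    "Doe, J.": "Doe, Jane",
    "Doe, Jane A.": "Doe, Jane",
}


def check_on_ingest(record):
    """Normalize the creator via the authority file and type-check the date.

    Returns (normalized_record, list_of_errors).
    """
    errors = []
    clean = dict(record)
    creator = record.get("creator")
    if creator is not None:
        if creator in AUTHORITY:
            clean["creator"] = AUTHORITY[creator]
        elif creator not in AUTHORITY.values():
            errors.append(f"unknown creator: {creator}")
    issued = record.get("issued")
    if issued is not None:
        try:
            date.fromisoformat(issued)
        except ValueError:
            errors.append(f"issued is not an ISO date: {issued}")
    return clean, errors
```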

OCLC metadata augmentation (authority control, classification)... at
submission or at a later time

Schemas for describing how items should be viewed, annotations of items in
said schema, and a UI that is respectful of such annotations...

Copy Cataloging - workflow and data integration from several sources?
     Tools to support lower-cost cataloging and metadata creation...

Customized forms for community-specific metadata collection.

Known community filtered annotations


Phase ???: Not sure where these ideas best live...
----------

em: SCOAP library processes in machine-readable manner...
viewing external collections through library lenses... (e.g. citeseer?)
merging metadata from multiple sources (libraries, citeseer, etc.)

karger: Faculty/Departmental Profile...(list of publications)

em: RSS feed - notification services based on community of interest...

      _________________________________________________________________


NEW Actions (A) / Issues (I)

  ref  owner    summary
  ----|--------|---------
  A22  bass     initiate remaining contract with OSP.

  A23  bass     update methodology to more fully reflect PI discussion
                of 2002-08-14.  Specifically capture detail on required
                tasks, and resulting platform capabilities.
      _________________________________________________________________


OPEN Actions (A) / Issues (I)   [in rough priority order]

  ref  owner    summary
                --------
                progress / status
  ----|--------|---------

  A22  bass     initiate remaining contract with OSP.

  A23  bass     update methodology to more fully reflect PI discussion
                of 2002-08-14.  Specifically capture detail on required
                tasks, and resulting platform capabilities.

  I11  all,     Strengthen, capture, review: driving/focusing scenarios for
       bass     use, and relationship to methodology phases.

  A8   bass     PI forum for Kenzie to present and all to discuss unmet
                metadata and information interoperability needs from
                MIT Libraries perspective

  A13  all      follow through with requested paperwork to OSP ASAP after
                receiving it.  We need contracts to close by end of July
                if at all possible.
                --------
                A July close appears unlikely, but we still need to keep
                the process moving.

  A21   em      + Forward essential terms of SWAD Europe IPR framework,
                  for consideration / applicability in SIMILE environment
                + Forward helpful one-liners wrt IPR positioning
                  in support of / leading to a royalty-free IPR position.

  A16  bass     confirm August 14 Cambridge with all PIs
                --------
                works for em and karger.  Mick to confirm with kenzie.

  A20   bass,   clarify DSpace / SIMILE relationship.
        kenzie  Need a statement / context diagram that we can post
                publicly.
                --------
                Reviewed diagram with PIs.
                <http://web.mit.edu/dspace-dev/www/simile/resources/DSpace-Simile-Ecosystem.pdf>
                Need 1 page explanatory text to accompany.

  A12  all,     Brainstorm, scope (within available resources), and plan to
                deliver a simple yet compelling demonstrator by end of
                startup phase.

  A9   miller,  gather & distribute to PIs RDF corpora for
       kenzie, (1) DSpace History and (2) Barton Library Catalog
       bass   
                --------
                tar of RDF history bitstreams in hand.
                need bass/kenzie policy conversation.
                2002-08-14 update: have MIT Libs policy, ready for handoff

  I7   kenzie,  relationship METS :: RDF
       miller
                --------
                em willing to spend energy here.
                mets -> closed world?
                motivation: functionality/capabilities of METS coupled with
                openness, flexibility of RDF.  RDFS for METS?

  A1   karger   summarize test case / requirements on Jena for use in
                haystack

                ...  re-use cholesterol?
                     but not hardened
                     memory resident only - won't scale

                ...  hope: Jena will get karger team out of database work

                ...  root cause between sleepycat/Jena and cholesterol

                ...  simple mods to existing Jena DB backends

                action Karger & Bass (serialize) -> profiling

  I2   karger   should cholesterol become an alternative RDB implementation
                for Jena?

  I3   karger   could/should Adenine be bundled with Jena?

  I6   erickson what is CNRI position regarding registration of handle
                system as a URI scheme with IANA?

      _________________________________________________________________


CLOSED Actions (A) / Issues (I)

  ref  owner    summary
                --------
                resolution
  ----|--------|---------

  A14  bass     followup with Lissa Natkin, LCS
                --------
                met 2002-07-17, communication established and remains open.

  A15  bass     send www-rdf-dspace subscribe instructions to all PIs
                --------
                sent.

  A19  bass     consolidate and make available meeting presentation
                materials and other useful resources
                --------
                here they are!  also set up SIMILE project web page.

  A17  all      provide non-available times to bass for weekly PI phone call
  A18  bass     identify time for weekly PI phone call
                --------
                Weekly call identified: Fridays at 2pm Eastern.
                May need to revisit after classes resume.

  A4   bass,    followup Quan's "Jena in Haystack" note with Dennis
       karger   and Jena team
                --------
                sent to Jena team

  I5   kenzie,  relationship DSpace Format Registry :: Mime Type open issues
       miller   in W3C Technical Architecture Group
                  - http://www.w3.org/2001/tag/ilist#w3cMediaType-1
                  - http://www.w3.org/2001/tag/ilist#customMediaType-2
                  - http://www.w3.org/2001/tag/ilist#nsMediaType-3
                  - http://www.w3.org/2001/tag/ilist#uriMediaType-9
                --------
                close per em, having shared info

  A10  karger,  haystack team pairwise w/ kenzie & library staff to scope
       kenzie   UI reqs for library catalogers / "semantic annotators"
                --------
                Karger update: watched simple "cataloging a book" steps.
                lots of flipping back and forth...
              
                idea: haystack UI to present in one place all info needed
                to catalog a book, and allow input of reqd data.

                OCLC data comes back in terminal interface

                No RDF version of MARC
                xml-schema for MARC - but not flattened

                action -> meet reference librarians?  (much harder...)

      _________________________________________________________________


Resources

    SIMILE Public Page
http://web.mit.edu/dspace-dev/www/simile/

    SIMILE Research Proposal (public, version 2002-04)
http://web.mit.edu/dspace-dev/www/simile/resources/proposal-2002-04/index.html

    SIMILE Startup Phase Goals and Deliverables (July - October 2002)
http://web.mit.edu/dspace-dev/www/simile/resources/goals-summer-2002.txt

    W3C Semantic Web Activity
http://www.w3.org/2001/sw/

    W3C Semantic Web Advanced Development Presentation Materials
http://www.w3.org/2002/Talks/07-simile-swad/

    Haystack Public Page
http://haystack.lcs.mit.edu/

    Haystack Presentation Materials
http://web.mit.edu/dspace-dev/www/simile/resources/Haystack-Overview.pdf

    DSpace Architecture Presentation Materials
http://web.mit.edu/dspace-dev/www/simile/resources/DSpace-Arch-Overview-2002-07-11.pdf

    DSpace White Papers (Functionality, Architecture)
http://www.dspace.org/live/implementation/design_documents/functionality.pdf
http://www.dspace.org/live/implementation/design_documents/architecture.pdf

    Reference Model for an Open Archival Information System (OAIS)
http://ssdoo.gsfc.nasa.gov/nost/isoas/
http://www.ccsds.org/documents/pdf/CCSDS-650.0-R-2.pdf

    HP Labs Semantic Web Activity
http://hpl.hp.com/semweb/

    Jena Semantic Web Toolkit
http://www.hpl.hp.com/semweb/jena-top.html

    Jena Presentation Materials
URL not available

    Jena Development Public List
http://groups.yahoo.com/group/jena-dev/

    www-rdf-dspace
http://lists.w3.org/Archives/Public/www-rdf-dspace/
mailto:www-rdf-dspace@w3.org

    SCORM - Sharable Content Object Reference Model
http://www.adlnet.org
    
    Open Archives Initiative - protocol for metadata harvesting and sharing
http://www.openarchives.org/

    METS - Metadata Encoding and Transmission Standard
http://www.loc.gov/standards/mets/

    RUDOLF
http://www.ilrt.bris.ac.uk/discovery/rdf-dev/rudolf/

      _________________________________________________________________

=============================================
Mick Bass


External Engagement Manager
HP Labs / MIT DSpace Program
Hewlett-Packard Company
Building 10-500 MIT, 77 Massachusetts Avenue
Cambridge, MA 02139-4307


617.253.6617 office    617.452.3000 fax
617.899.3938 mobile    617.627.9694 residence
bass@alum.mit.edu      mick_bass@hp.com
=============================================
 

Received on Thursday, 22 August 2002 15:35:32 UTC