meeting notes and outcomes, 11 July 2002

Attached are meeting notes from the SIMILE PI Meeting and Technical Exchange
of Thursday 11 July 2002.

I have also set up a SIMILE Project web page at
http://www.mit.edu/dspace-dev/simile/
and consolidated project resources to date (including these meeting notes).

Please let me know if I've misrepresented anything, and be sure to check out
the actions and issues with your name next to them.

Best Regards,

- Mick

         _________________________________________________________________

                        SIMILE PI Meeting and Technical Exchange

                        2002-07-11

                        Summary, Actions, and Issues

                        Editor: Mick Bass, HP
                                mailto:bass@alum.mit.edu

         This document:
         http://www.mit.edu/dspace-dev/simile/minutes/minutes-2002-07-11.txt

         Series:
         http://www.mit.edu/dspace-dev/simile/minutes/index.html

         _________________________________________________________________


Table of Contents

    Agenda

    Attendees

    Resources

    Summary

    New Actions / Issues

    Open Actions / Issues

    Closed Actions / Issues

      _________________________________________________________________


Agenda (as revised)

    Welcome, Goals for the Day and for the Summer

    W3C Advanced Development
        Q&A, Discussion

    Jena: brief overview, current & future directions
    Haystack: brief overview
        Q&A, discussion, brainstorm wrt jena, haystack
        (videoconf & netmeeting from Bristol, UK)

    Haystack Demo

    Lunch

    DSpace: architecture, target research platform

    Discussion
        review proposed methodology
        review deliverables in startup phase
        working process for remainder of startup phase
        use case / demonstrator hone & brainstorm

      _________________________________________________________________


Attendees

    David Karger (PI)        MIT, LCS / AI Lab
    David Huynh              MIT, LCS / AI Lab
    Dennis Quan              MIT, LCS / AI Lab
    Vineet Sinha             MIT, LCS / AI Lab

    MacKenzie Smith (Co-I)   MIT, Libraries

    Eric Miller (Co-I)       W3C
    Marja-Riitta Koivunen    W3C

    Mick Bass                HP Labs
    Robert Tansley           HP Labs
    John Erickson            HP Labs (via video)
    Nick Wainwright          HP Labs
    Rivka Ladin              HP Labs

Partial-day attendees

    Eytan Adar               HP Labs
    David Stuve              HP Labs
    Brian McBride            HP Labs (via video)
    Dave Reynolds            HP Labs (via video)
    Andy Seaborne            HP Labs (via video)

      _________________________________________________________________


Resources

    SIMILE Public Page
http://www.mit.edu/dspace-dev/www/simile/

    SIMILE Research Proposal (public, version 2002-04)
http://www.mit.edu/dspace-dev/www/simile/resources/proposal-2002-04/index.htm

    SIMILE Startup Phase Goals and Deliverables (July - October 2002)
http://www.mit.edu/dspace-dev/www/simile/resources/goals-summer-2002.txt

    W3C Semantic Web Activity
http://www.w3.org/2001/sw/

    W3C Semantic Web Advanced Development Presentation Materials
http://www.w3.org/2002/Talks/07-simile-swad/

    Haystack Public Page
http://haystack.lcs.mit.edu/

    Haystack Presentation Materials
http://www.mit.edu/dspace-dev/www/simile/resources/Haystack-Overview.pdf

    DSpace Architecture Presentation Materials
http://www.mit.edu/dspace-dev/www/simile/resources/DSpace-Arch-Overview-2002-07-11.pdf

    DSpace White Papers (Functionality, Architecture)
http://www.dspace.org/live/implementation/design_documents/functionality.pdf
http://www.dspace.org/live/implementation/design_documents/architecture.pdf

    Reference Model for an Open Archival Information System (OAIS)
http://ssdoo.gsfc.nasa.gov/nost/isoas/
http://www.ccsds.org/documents/pdf/CCSDS-650.0-R-2.pdf

    HP Labs Semantic Web Activity
http://hpl.hp.com/semweb/

    Jena Semantic Web Toolkit
http://www.hpl.hp.com/semweb/jena-top.html

    Jena Presentation Materials
URL not available

    Jena Development Public List
http://groups.yahoo.com/group/jena-dev/

    www-rdf-dspace
http://lists.w3.org/Archives/Public/www-rdf-dspace/
mailto:www-rdf-dspace@w3.org

      _________________________________________________________________


Summary

(summarizes both the meeting and some subsequent pairwise interactions)

W3C Advanced Development

    Eric Miller presented an overview of W3C Semantic Web advanced
    development activities.  Marja-Riitta Koivunen described current
    work on Annotea.

Jena, Haystack, Persistent RDF Stores

    Brian McBride, Dave Reynolds, and Andy Seaborne joined as Brian
    McBride presented a brief overview of the Jena semantic web
    toolkit.  David Karger followed with a brief overview of Haystack.
    
    Open discussion followed on potential Haystack/Jena interactions,
    focused especially on database back-ends. (->A1)
    The discussion included:

         - a discussion of Cholesterol (the RDF store currently used
           in Haystack) and Adenine (a scripting language that
           produces RDF).  (->I2, I3)

         - Haystack's need to issue many (hundreds per second) small,
           simple queries to be able to render its UI

         - reification, the Haystack "belief server", and overheads
           associated with storing reified RDF statements
    
         - event triggers
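
    To make the reification overhead concrete: in the RDF model,
    reifying one statement takes four further triples (rdf:type
    rdf:Statement, rdf:subject, rdf:predicate, rdf:object) before any
    belief or provenance triple is attached.  The Python sketch below
    is illustrative only.  It does not use Jena or Cholesterol, and
    the names reify, store_with_belief, ex:believedBy, and the example
    URIs are assumptions made for the example.

```python
# Illustrative sketch only: plain tuples stand in for an RDF store.
# This is not the Jena or Cholesterol API; all ex: names are made up.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify(statement, statement_id):
    """Return the four triples of the standard RDF reification quad."""
    subject, predicate, obj = statement
    return [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", subject),
        (statement_id, RDF + "predicate", predicate),
        (statement_id, RDF + "object", obj),
    ]


def store_with_belief(statements, believer):
    """Store each statement, its reification, and one belief triple."""
    store = []
    for number, statement in enumerate(statements):
        statement_id = "_:stmt%d" % number  # blank-node label
        store.append(statement)
        store.extend(reify(statement, statement_id))
        # Hypothetical property recording who asserts the statement.
        store.append((statement_id, "ex:believedBy", believer))
    return store


statements = [
    ("ex:doc1", "ex:author", "ex:alice"),
    ("ex:doc1", "ex:title", "A Title"),
]
store = store_with_belief(statements, "ex:agent1")
print(len(statements), "statements became", len(store), "triples")
```

    Under these assumptions each asserted statement costs six stored
    triples instead of one, which is the kind of overhead the
    discussion raised.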
    
    Dave Reynolds subsequently constructed a simple tree-walking
    testcase that demonstrates query throughput in the "hundreds per
    second" range on sub-$1K PCs.  See Dave's note on www-rdf-dspace.
    http://lists.w3.org/Archives/Public/www-rdf-dspace/2002Jul/0002.html
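
    The general shape of a tree-walking query test can be sketched as
    follows.  This is not Dave's test case (see his note for that); it
    is a hypothetical Python illustration in which a dictionary index
    stands in for the RDF store, and add, children_of, build_tree, and
    walk are invented names.  It issues one small lookup per node
    visited, which is the query pattern a UI renderer would generate.

```python
import time
from collections import defaultdict

# Hypothetical stand-in for an RDF store: (subject, predicate) -> objects.
index = defaultdict(list)
query_count = 0


def add(subject, predicate, obj):
    index[(subject, predicate)].append(obj)


def children_of(node):
    """One small, simple query: the objects of (node, ex:child)."""
    global query_count
    query_count += 1
    return index.get((node, "ex:child"), [])


def build_tree(root, depth, fanout):
    """Populate the store with a complete tree; return the node count."""
    if depth == 0:
        return 1
    total = 1
    for i in range(fanout):
        child = "%s/%d" % (root, i)
        add(root, "ex:child", child)
        total += build_tree(child, depth - 1, fanout)
    return total


def walk(root):
    """Visit every node, issuing one query per node (iteratively)."""
    visited = 0
    pending = [root]
    while pending:
        node = pending.pop()
        visited += 1
        pending.extend(children_of(node))
    return visited


node_count = build_tree("ex:root", depth=5, fanout=4)
start = time.perf_counter()
visited = walk("ex:root")
elapsed = time.perf_counter() - start
print("%d queries in %.4f s" % (query_count, elapsed))
```

    An in-memory dictionary is far faster than a database-backed
    store, so the timing printed here says nothing about Jena; only
    the one-query-per-node pattern carries over.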
    
    Dennis Quan has also since undertaken the effort to create a
    version of Haystack that uses Jena instead of Cholesterol, to shed
    light on any hidden requirements.  This has uncovered some
    additional issues, which Dennis describes in a note on
    www-rdf-dspace.
    http://lists.w3.org/Archives/Public/www-rdf-dspace/2002Jul/0004.html    
    (->A4)
    
DSpace Architecture

    Rob Tansley presented an overview of the DSpace Architecture.

    Eric Miller pointed out overlap between the DSpace bitstream
    format registry and open issues regarding mime-types in the W3C
    Technical Architecture Group. (->I5)

    Will Handles ever be URIs?  They are URI-compliant, but what are
    CNRI's plans with respect to registration? (->I6)

    Can there be / should there be an RDF Schema for METS? (->I7)

    DSpace currently uses Jena to create RDF serializations of
    instances in the DSpace data model and descriptions of
    relationships among these serializations and events that change
    the state of the archive.  DSpace does not yet use Jena as a
    persistent RDF store.  Many opportunities exist to make the
    existing RDF information more readily query-able, as well as to
    store community- and format-specific metadata in a persistent RDF
    store in addition to storing it in serialized bitstreams, as is
    currently done.

    We revisited the desire and need for the output and progress from
    SIMILE to respond to strong demand from MIT Libraries, the library
    community, and the effort to federate DSpace at a number of
    leading universities.  Dave Karger requested a followup session
    where MacKenzie could share her perspective on unmet needs in
    the library community. (->A8)

    We clarified the difference between DSpace RDF History data
    (which describes instances of the DSpace data model, and the
    temporal relationships between them and key events that change
    the state of the archive), and RDF transformations of MARC from
    Barton, the Libraries' catalog.

    Dave Karger and his team expressed interest in immediately
    beginning to develop some simple Haystack UI components for
    exploring and navigating both of these corpuses of RDF metadata.
    Eric Miller agreed to assist with transformations (if any are
    required) on the DSpace RDF instance serializations and history
    metadata. (->A9, A10)

    Dave Karger envisions 3 possible variants of Haystack clients:

        - generic.  This is Haystack as it exists today.

        - library client.  This is a personal Haystack extended to
          provide richer functionality for interoperating with library
          resources.  Or a library-user Haystack able to interoperate
          with personal resources, depending on your perspective :)

        - library administrator.  This is a Haystack optimized for
          those who maintain an information corpus and take decisions
          about how it can and should be organized and annotated to
          provide optimal value to the designated community. (->A10)

Methodology Walkthrough

    We reviewed the three phases that we've initially scoped for the
    project:

    1 + Support submit/query/discover/retrieve of instances of
        simple flat (e.g. name-value pairs) community-specific schemas.
      + Support simple (free ascii text, human-readable) annotations
        of DSpace items.
      + Integrate Jena, persistent RDF store with DSpace.
      + Add consequent UI support.

    2 + Add support for machine-readable annotations to any instance
        in the DSpace data model (including annotations to individual
        metadata elements).  This will enable authority-control
        annotation-services layered atop the originally-submitted
        metadata.
      + Add support for discovery and incorporation of distributed
        metadata stores (some metadata hosted by library, some
        metadata hosted by community or department, some personal
        metadata).
      + Add required consequent dissemination/viewing architecture.

    3 + Add support for rich schemas (hierarchy, whole/part, etc.)
      + schema mapping & intelligent query response through
        aggregation of distributed schema annotations.

    A key issue is that the relationship among these phases, the
    proposed additional platform capabilities to be demonstrated in
    each, and unmet scenarios for use in the proposed pilot domain
    (libraries) is not yet sufficiently well articulated and
    vetted. (->I11)

    These phases are not yet frozen.  The PIs retain the flexibility
    to change their definitions, and the relationship of each to
    articulated demonstrators and scenarios for use.

    We agreed to be explicit about the difference between:
        - service-deployment policy decisions taken in a particular
          environment, and
        - required platform capabilities,
    recognizing that platform capabilities may need to be richer than
    the specific (policy-driven) requirements in any one environment,
    in order to support the different and various service-deployment
    decisions that will be taken elsewhere.
     
    Example: Would libraries host metadata for resources that they do
    not host? (In the current DSpace service offered by MIT Libraries,
    the answer is "no").  Should the SIMILE platform support
    distributed metadata such that an institution can decide to host
    metadata for resources that it does not host? (The answer is very
    arguably "yes".  For instance, much annotation metadata will
    relate multiple resources, rather than describe one or the other.)

    We discovered that the term "annotation" is a victim of semantic
    overload.  W3C SWAD, and specifically the Annotea team, uses it in
    its most general form (a machine-readable statement that describes
    a relationship between two resources).  In phase two of the
    methodology we use it to mean additional, or perhaps corrected,
    machine-readable metadata for DSpace items (examples: suggested
    subject keywords for a previously-submitted item; a corrected or
    otherwise improved author (authority-controlled, perhaps)
    annotating the author as originally submitted).  In phase one we
    mean it in an even simpler and less granular sense: human-readable
    ascii prose annotating items submitted in DSpace.  This is another
    example of platform capabilities vs. service-deployment choices.
    We'll need to architect from the beginning with the most general
    case in mind, but we'll want to turn on progressively more complex
    deployed services, starting with the simplest ones.
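
    One way to picture "architect for the general case, deploy the
    simple case first" is a single annotation record whose body may be
    either prose or a machine-readable reference, with a
    deployment-level switch governing which kinds a given service
    accepts.  The Python sketch below is an illustration under stated
    assumptions, not a proposed SIMILE design; Annotation,
    AnnotationService, the allowed_kinds switch, and the example
    identifiers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    """General case: a statement relating a target to a body."""
    target: str      # resource being annotated (item or metadata element)
    relation: str    # how the body relates to the target
    body: str        # prose text, or the URI of another resource
    kind: str        # "text" (phase one) or "machine" (phase two)


@dataclass
class AnnotationService:
    """A deployment enables only the annotation kinds its policy allows."""
    allowed_kinds: frozenset
    stored: list = field(default_factory=list)

    def submit(self, annotation):
        if annotation.kind not in self.allowed_kinds:
            return False  # platform supports it; this deployment does not
        self.stored.append(annotation)
        return True


# Phase-one style deployment: human-readable prose only.
phase_one = AnnotationService(allowed_kinds=frozenset({"text"}))
note = Annotation("ex:item42", "ex:comment", "Useful survey.", "text")
fix = Annotation("ex:item42#author", "ex:correctedAuthor",
                 "ex:authority/smith-j", "machine")
print(phase_one.submit(note), phase_one.submit(fix))

# Later deployment: same platform, more kinds switched on.
phase_two = AnnotationService(allowed_kinds=frozenset({"text", "machine"}))
print(phase_two.submit(note), phase_two.submit(fix))
```

    The record type is the same in both deployments; only the policy
    switch differs, which mirrors the platform-capability vs.
    service-deployment distinction drawn above.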

Deliverables in the Startup Phase

    We reviewed the goals and deliverables for the startup phase of the
    project.

    Nick Wainwright suggested that we search for a way to exit the
    startup phase with a simple but solidly grounded demo/example in
    addition to the deliverables that we reviewed (project plan, etc.)
    (->A12)

    We reviewed the current status of contracts with Office of
    Sponsored Programs, finance contacts in LCS & Libraries, and the
    need to turn around contracts as quickly as possible. (->A13, A14)

Preferred working processes

    We discussed our preferences regarding how to work together to
    efficiently get the work done.  Outcomes included:

    - Need an email list for group communication.  We decided to use
      www-rdf-dspace, an existing list at W3C that we set up during
      the DSpace project for discussion about application of RDF
      within DSpace. (->A15)

    - We agreed that we will need 3-4 face to face working meetings
      including all PIs, each between 1/2 day and a full day long.
      That's about once every three to four weeks.  Eric Miller stated
      a preference to do at least one of these in Ohio.  David Karger
      voiced constraints on ability to travel, and prefers Ohio later
      rather than earlier during the startup phase.  We tentatively
      agreed to August 14th for the next PI face to face, in
      Cambridge.  Need to confirm date & location. (->A16)

    - In addition to the PI f2f meetings, we acknowledged that we'd
      likely need ongoing offline pairwise interactions.  Results from
      these will ideally be written up and submitted to the group via
      www-rdf-dspace.

    - We agreed to steer on an ongoing basis through a weekly PI phone
      call.  The purpose of the weekly call is to tee up required
      offline interactions, take group decisions, and report on open
      issues/actions.  We tentatively agreed to Thursdays, 3pm for
      this call.  That time has since fallen through, so we will need
      to find a different time. (->A17, A18)

    Several participants requested that we consolidate materials and
    resources from the meeting and make them available to the
    group. (->A19)

Hone & Brainstorm Demonstrators and Scenarios for Use

    We didn't make significant progress here; it's top of the list
    for future work.

      _________________________________________________________________


NEW Actions (A) / Issues (I)

  ref  owner    summary
  ----|--------|---------
  A1   karger   summarize test case / requirements on Jena for use in
                haystack

  I2   karger   should Cholesterol become an alternative RDB implementation
                for Jena?

  I3   karger   could/should Adenine be bundled with Jena?

  A4   bass,    followup Quan's "Jena in Haystack" note with Dennis
       karger   and Jena team.

  I5   kenzie,  relationship DSpace Format Registry :: Mime Type open issues
       miller   in W3C Technical Architecture Group
                  - http://www.w3.org/2001/tag/ilist#w3cMediaType-1
                  - http://www.w3.org/2001/tag/ilist#customMediaType-2
                  - http://www.w3.org/2001/tag/ilist#nsMediaType-3
                  - http://www.w3.org/2001/tag/ilist#uriMediaType-9

  I6   erickson what is CNRI position regarding registration of handle
                system as a URI scheme with IANA?

  I7   kenzie,  relationship METS :: RDF
       miller

  A8   bass     PI forum for Kenzie to present and all to discuss unmet
                metadata and information interoperability needs from MIT
                Libraries perspective

  A9   miller,  gather & distribute RDF corpuses for (1) DSpace History and
       kenzie,  (2) Barton Library Catalog
       bass   

  A10  karger,  haystack team pairwise w/ kenzie & library staff to scope
       kenzie   UI reqs for library catalogers / "semantic annotators"

  I11  all,     Strengthen, capture, review: Driving/focusing scenarios for
       bass     use, and relationship to methodology phases.

  A12  all      Brainstorm, scope (within available resources), and plan to
                deliver a simple yet compelling demonstrator by end of
                startup phase.

  A13  all      follow through with requested paperwork to OSP ASAP after
                receiving it.  We need contracts to close by end of July
                if at all possible.

  A14  bass     followup with Lissa Natkin, LCS

  A15  bass     send www-rdf-dspace subscribe instructions to all PIs

  A16  bass     confirm August 14 Cambridge with all PIs

  A17  all      provide non-available times to bass for weekly PI phone call
  A18  bass     identify time for weekly PI phone call

  A19  bass     consolidate and make available meeting presentation
                materials and other useful resources

      _________________________________________________________________


OPEN Actions (A) / Issues (I)

  ref  owner    summary
                --------
                progress / status
  ----|--------|---------

  A1   karger   summarize test case / requirements on Jena for use in
                haystack

  I2   karger   should Cholesterol become an alternative RDB implementation
                for Jena?

  I3   karger   could/should Adenine be bundled with Jena?

  A4   bass,    followup Quan's "Jena in Haystack" note with Dennis
       karger   and Jena team
                --------
                sent to Jena team

  I5   kenzie,  relationship DSpace Format Registry :: Mime Type open issues
       miller   in W3C Technical Architecture Group
                  - http://www.w3.org/2001/tag/ilist#w3cMediaType-1
                  - http://www.w3.org/2001/tag/ilist#customMediaType-2
                  - http://www.w3.org/2001/tag/ilist#nsMediaType-3
                  - http://www.w3.org/2001/tag/ilist#uriMediaType-9

  I6   erickson what is CNRI position regarding registration of handle
                system as a URI scheme with IANA?

  I7   kenzie,  relationship METS :: RDF
       miller

  A8   bass     PI forum for Kenzie to present and all to discuss unmet
                metadata and information interoperability needs from MIT
                Libraries perspective

  A9   miller,  gather & distribute to PIs RDF corpuses for
       kenzie,  (1) DSpace History and (2) Barton Library Catalog
       bass
                --------
                tar of RDF history bitstreams in hand.
                need bass/kenzie policy conversation.

  A10  karger,  haystack team pairwise w/ kenzie & library staff to scope
       kenzie   UI reqs for library catalogers / "semantic annotators"

  I11  all,     Strengthen, capture, review: driving/focusing scenarios for
       bass     use, and relationship to methodology phases.

  A12  all      Brainstorm, scope (within available resources), and plan to
                deliver a simple yet compelling demonstrator by end of
                startup phase.

  A13  all      follow through with requested paperwork to OSP ASAP after
                receiving it.  We need contracts to close by end of July
                if at all possible.

  A16  bass     confirm August 14 Cambridge with all PIs

  A17  all      provide non-available times to bass for weekly PI phone call
  A18  bass     identify time for weekly PI phone call

      _________________________________________________________________


CLOSED Actions (A) / Issues (I)

  ref  owner    summary
                --------
                resolution
  ----|--------|---------

  A14  bass     followup with Lissa Natkin, LCS
                --------
                met 2002-07-17, communication established and remains open.

  A15  bass     send www-rdf-dspace subscribe instructions to all PIs
                --------
                sent.

  A19  bass     consolidate and make available meeting presentation
                materials and other useful resources
                --------
                here they are!  also set up SIMILE project web page.

      _________________________________________________________________


=============================================
Mick Bass


External Engagement Manager
HP Labs / MIT DSpace Program
Hewlett-Packard Company
Building 10-500 MIT, 77 Massachusetts Avenue
Cambridge, MA 02139-4307


617.253.6617 office    617.452.3000 fax
617.899.3938 mobile    617.627.9694 residence
bass@alum.mit.edu      mick_bass@hp.com
=============================================
 

Received on Friday, 19 July 2002 13:11:09 UTC