- From: BASS,MICK (HP-USA,ex1) <mick_bass@hp.com>
- Date: Thu, 22 Aug 2002 15:35:24 -0400
- To: www-rdf-dspace@w3.org
Attached are minutes from the SIMILE PI Meeting held on 2002-08-14. These are also available at http://web.mit.edu/dspace-dev/www/simile/.

Please let me know if I've misrepresented anything, and review open actions!!

Enjoy,
- Mick

_________________________________________________________________

                 SIMILE PI Face to Face Meeting 2002-08-14
                       Summary, Actions, and Issues

Editor:        Mick Bass, HP  mailto:bass@alum.mit.edu
This document: http://web.mit.edu/dspace-dev/www/simile/minutes/minutes-2002-08-14.txt
Series:        http://web.mit.edu/dspace-dev/www/simile/minutes/index.html
_________________________________________________________________

Table of Contents

  Agenda
  Attendees
  New Resources
  Summary
  New Actions / Issues
  Open Actions / Issues
  Closed Actions / Issues
  Resource List
_________________________________________________________________

Agenda

  0. Review Agenda
  1. brief updates: funding / IPR frameworks
  2. use case / demonstrator brainstorm & hone
  3. capture tech platform ramifications
  4. tune methodology
_________________________________________________________________

Attendees (all in person)

  David Karger      (PI)      MIT, LCS / AI Lab
  Eric Miller       (Co-I)    W3C
  Mick Bass         (HP PI)   HP Labs
  MacKenzie Smith   (Co-I)    MIT, Libraries
_________________________________________________________________

New Resources

The following resources were brought to the attention of the group during the course of the meeting. The complete resource list (including those listed here) can be found at the end of this document.

  SCORM - Sharable Content Object Reference Model
    http://www.adlnet.org

  Open Archives Initiative - protocol for metadata harvesting and sharing
    http://www.openarchives.org/

  METS - Metadata Encoding and Transmission Standard
    http://www.loc.gov/standards/mets/

  RUDOLF - ideas and demos for wrapping various services (e.g. mailing lists, search engine APIs) to expose results or contents as RDF. n.b. the handle system wrapper. em also mentioned (plans for?) a google wrapper.
    http://www.ilrt.bris.ac.uk/discovery/rdf-dev/rudolf/
_________________________________________________________________

Summary

-- The presentation materials used during the meeting are at:
   <http://web.mit.edu/dspace-dev/www/simile/resources/PI-meeting-2002-08-14.pdf>

-- Brief updates: funding and IPR framework

   - HP has committed to funding SIMILE (both contracts: startup phase and remainder). bass to proceed with initiation of remaining SIMILE contract with OSP, karger.
       action -> A22

   - MIT meetings are still underway to understand existing encumbrances on Haystack IP. Conclusions expected within days.

-- We reviewed and discussed the relationship between Simile and DSpace (see A20). We distinguished between Simile (this research project) and DSpace (historic development, ongoing incremental improvement and support, and deployment at academic research institutions). The diagram at:
   <http://web.mit.edu/dspace-dev/www/simile/resources/DSpace-Simile-Ecosystem.pdf>
   represents these distinct, yet related activities.

   It is the explicit and shared intent of both Simile and DSpace to (1) leverage the current DSpace technology platform as a starting point for the Simile research methodology, and (2) share appropriate and useful Simile research results through the technology deployment channel offered by the DSpace federation. This can happen through ongoing participation of Simile PIs on the governing coalition of the DSpace federation, and through an ongoing technical dialogue between Simile researchers and the DSpace developers who are prioritizing and implementing incremental enhancements to the production DSpace system. MIT Libraries will take a lead role in verifying, with selected collaborators from among the federating DSpace institutions, that potential services culled from Simile are indeed useful, widely deployable, and production-ready.
   We noted the required action to bundle this diagram with a single page of explanatory text (developed from the above statements) that can be distributed as a standalone document explaining the benefits of the DSpace/Simile relationship to potential DSpace adopters.
       update action -> A20

-- We reviewed the proposed research methodology with respect to technology platform objectives. PIs voiced consensus on the following points:

   - The desired methodology has the following attributes:

     1. It clearly distinguishes between requirements on the technology platform architecture and requirements driven by application-specific or institution-specific policy decisions. That is, we wish to develop a substrate for flexible deployment of services on heterogeneous information objects. We wish the substrate to support multiple potential policy decisions about what types of information objects and/or services will be deployed in any particular environment. And we wish the substrate to be a layer on top of the internet architecture, and demonstrative of semantic web techniques.

        For example, MIT may decide that (for the time being) it will disallow submissions of items that contain only metadata. But this does not mean that the technology platform can omit support for metadata-only items, because an alternative policy decision (by either MIT or another institution) that would require this support from the technology platform is eminently foreseeable.

     2. It defines with the end in mind, and implements incrementally. That is, the methodology should take a long-term view of platform architecture requirements and consequent capabilities, yet choose use cases that allow intermediate results and demonstrations to be achieved with bounded engineering effort. Put another way, we shouldn't have to build everything in order to be able to do anything. But what we do needs to be consistent with where we're heading. This is the motivation for the phases in the research methodology.
        Each phase defines key incremental platform capabilities that are consistent with, and steps along the way towards, implementation of a desired technology platform architecture.

   - An imperative early research objective is to define and publish an RDF-based data model and schema for DSpace instances (to parallel the existing RDBMS table-based data model), and to make the resulting schema and instances available for research use. This store of RDF should also include the existing RDF data currently produced by the DSpace history subsystem. It is also possible that some appropriate (as defined by MIT Libraries) subset of this data could be made publicly available. Such availability would enable additional services to be created independently of the DSpace application.

   - Establishing an "RDF server" would be a good way of making such a corpus of RDF available. Andy Seaborne's Joseki server is a good candidate: it makes RDF available via HTTP (or HTTPS) GETs. DSpace could be augmented to optionally deposit RDF history data to a bundled Joseki server in addition to simply writing it to the file system.

   - These two key platform capabilities would pave the way for:
       - introduction of community-specific schemas as DSpace items (a "schemas" collection)
       - association of DSpace collections with schemas, thus defining the instance metadata to be gathered at item ingestion
       - addition of community-specific metadata after item ingestion
       - the ability to disseminate associated instance metadata along with discovered DSpace items
       - indexing services based upon available community-specific instance metadata

   Dave Karger discussed techniques currently used in Haystack that define ontologies describing how a particular set of content should be presented/viewed. In particular, community-specific schemas might be annotated using a UI-hint-providing schema. A UI that could parse the schema and understand the hints could then guide data ingestion and/or display appropriately.
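Karger's suggestion above (a flat, name-value community schema annotated with UI hints, which a UI parses to guide ingestion) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Haystack's or DSpace's actual mechanism: the schema is held as plain (subject, predicate, object) tuples rather than in an RDF store, and the ex: properties, their values, and the ui:label, ui:widget and ui:order hint predicates are hypothetical names invented for this example.

```python
# Minimal sketch of a schema-driven submission form driven by UI hints.
# ASSUMPTIONS: triples are plain Python tuples (no RDF library); the "ex:" properties
# and the "ui:" hint predicates are hypothetical, not a Haystack or DSpace vocabulary.

RDF_TYPE = "rdf:type"
RDF_PROPERTY = "rdf:Property"
UI_LABEL = "ui:label"    # hypothetical: human-readable field label
UI_WIDGET = "ui:widget"  # hypothetical: "text" or "choice:a,b,c"
UI_ORDER = "ui:order"    # hypothetical: position of the field in the form

# A flat (no hierarchy) community schema plus UI hints, as (subject, predicate, object).
SCHEMA = [
    ("ex:title", RDF_TYPE, RDF_PROPERTY),
    ("ex:title", UI_LABEL, "Title"),
    ("ex:title", UI_ORDER, 1),
    ("ex:media", RDF_TYPE, RDF_PROPERTY),
    ("ex:media", UI_LABEL, "Media"),
    ("ex:media", UI_WIDGET, "choice:Oil,Pastel,Watercolor"),
    ("ex:media", UI_ORDER, 2),
    ("ex:date", RDF_TYPE, RDF_PROPERTY),  # no hints: defaults apply
]


def build_form(triples):
    """Return one field spec per declared property, ordered by its ui:order hint."""
    props = {s for s, p, o in triples if p == RDF_TYPE and o == RDF_PROPERTY}
    fields = []
    for prop in props:
        hints = {p: o for s, p, o in triples if s == prop}
        fields.append({
            "property": prop,
            # Fall back to the local name when no label hint is given.
            "label": hints.get(UI_LABEL, prop.split(":", 1)[-1]),
            "widget": hints.get(UI_WIDGET, "text"),
            # Unordered fields go last; ties are broken by property name.
            "order": hints.get(UI_ORDER, float("inf")),
        })
    fields.sort(key=lambda f: (f["order"], f["property"]))
    return fields


def validate(fields, values):
    """Check a submitted {property: value} dict against the form; return problems found."""
    problems = []
    known = {f["property"] for f in fields}
    for prop in values:
        if prop not in known:
            problems.append(f"unknown property {prop}")
    for f in fields:
        value = values.get(f["property"])
        if value is not None and f["widget"].startswith("choice:"):
            allowed = f["widget"][len("choice:"):].split(",")
            if value not in allowed:
                problems.append(f"{f['label']}: {value!r} not in {allowed}")
    return problems


if __name__ == "__main__":
    form = build_form(SCHEMA)
    for field in form:
        print(field["label"], field["widget"])
    print(validate(form, {"ex:media": "Pastel"}))  # []
    print(validate(form, {"ex:media": "Crayon"}))  # one problem reported for Media
```

Keeping the hints in a separate vocabulary means the same community schema could, in principle, drive both submission and dissemination UIs, while a schema carrying no hints still yields a usable (if plain) form.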
   A slightly updated methodology overview based upon this discussion is available at:
   <http://web.mit.edu/dspace-dev/www/simile/resources/methodology-overview-2002-08-14.txt>
   We noted the need to expand this methodology to incorporate more detail from our phase-by-phase discussion of required tasks and resulting capabilities.
       action -> A23

-- Externally Visible Use Cases / Services Brainstorm:

   We brainstormed externally visible services and use cases that could be constructed from the capabilities introduced during each phase of the methodology, ordered from simpler to more complex, by phase. We mostly focused on the earlier phases...

   Phase 1:
   --------
   - published schema for DSpace data model
   - RDF store exposing schema and instances
   - add DSpace history info to RDF store

       Service: Browse temporal and event history of a DSpace item
       Service: Retrieve a machine-interpretable graph including references to all past serializations of a DSpace item, and their temporal and event relationships
       Service: Retrieve a DSpace item as it existed on a specified date
       Service: Query history information. For example, "Retrieve DSpace items whose keywords have been edited during the past year", or "Retrieve DSpace items to which PDF files were added after initial submission"
       Service: Notification service via persistent periodic RDF query, email push
       Service: DSpace system deposit of useful tracking info to RDF store, e.g. query streams, items accessed, etc.

   - submit simple (flat name-value pairs, no hierarchy) community-specific schemas to DSpace
   - implement simple schema-driven submission UI
   - implement simple schema-driven dissemination UI

       Service: "Schema Registration" for community-specific schemas, so they can be discovered by other communities.
       Service: "Schema Browsing" within the DSpace "Community Schemas" collection
       Service: Community-specific metadata submission/retrieval. For example:
                  OCW, needs SCORM, alternative contexts
                  Art & Architecture Images, needs VRA-core
                  Sorger / Biomedical Images, needs externally hosted content and community-specific metadata

   - implement schema-driven query UI, bind to RDF query capability

       Service: Community-specific query, for example "Find all DSpace items that have VRA-Core instance metadata and where Media is Pastel"
       Service: Extract and add citation metadata to DSpace items (a la CiteSeer)
       Service: "What's related?" Library administrators add multiple "What's related" URIs and caption text to DSpace items, using an Annotea-like schema. This could be used to contextualize results in terms of related materials, disciplines, collections, interests, etc. "See also" annotation service...
       Service: "Submit MIT Faculty Comments on this item", "Read MIT Faculty Comments on this item"

   - implement cross-corpus OLAP-like extraction (e.g. histogram generation)

       Service: Monitor high-usage metadata instances for cleanup prioritization.
       Service: "Hot items" reported via prioritization heuristics:
                  - multiply cataloged (multiple schemas)
                  - frequency of use
                  - co-citation
       Service: Characterization of saved search trails

   Phase 2:
   --------
   - Authority control / type validation on ingest.
   - OCLC metadata augmentation (authority control, classification)... at submission or at a later time
   - Schemas for describing how items should be viewed, annotations of items in said schema, and a UI that is respectful of such annotations...
   - Copy Cataloging - workflow and data integration from several sources?
   - Tools to support lower-cost cataloging and metadata creation...
   - Customized forms for community-specific metadata collection.
   - Known community filtered annotations

   Phase ???: Not sure where these ideas best live...
   ----------
   - em: SCOAP library processes in machine-readable manner...
   - viewing external collections through library lenses... (e.g. citeseer?)
   - merging metadata from multiple sources (libraries, citeseer, etc.)
   - karger: Faculty/Departmental Profile... (list of publications)
   - em: RSS feed - notification services based on community of interest...
_________________________________________________________________

NEW Actions (A) / Issues (I)

ref   owner          summary
----|--------------|---------------------------------------------------
A22   bass           initiate remaining contract with OSP.
A23   bass           update methodology to more fully reflect PI discussion of
                     2002-08-14. Specifically, capture detail on required tasks
                     and resulting platform capabilities.
_________________________________________________________________

OPEN Actions (A) / Issues (I)   [in rough priority order]

ref   owner          summary
                     -------- progress / status
----|--------------|---------------------------------------------------
A22   bass           initiate remaining contract with OSP.

A23   bass           update methodology to more fully reflect PI discussion of
                     2002-08-14. Specifically, capture detail on required tasks
                     and resulting platform capabilities.

I11   all, bass      Strengthen, capture, review: driving/focusing scenarios for
                     use, and relationship to methodology phases.

A8    bass           PI forum for Kenzie to present and all to discuss unmet
                     metadata and information interoperability needs from MIT
                     Libraries perspective

A13   all            follow through with requested paperwork to OSP ASAP after
                     receiving it. We need contracts to close by end of July if
                     at all possible.
                     -------- July close appears unlikely, yet the need to keep
                     the process moving remains.

A21   em             + Forward essential terms of SWAD Europe IPR framework, for
                       consideration / applicability in SIMILE environment
                     + Forward helpful one-liners wrt IPR positioning in support
                       of / leading to a royalty-free IPR position.

A16   bass           confirm August 14 Cambridge with all PIs
                     -------- works for em and karger. Mick to confirm with kenzie.

A20   bass, kenzie   clarify DSpace / SIMILE relationship. Need a statement /
                     context diagram that we can post publicly.
                     -------- Reviewed diagram with PIs.
                     <http://web.mit.edu/dspace-dev/www/simile/resources/DSpace-Simile-Ecosystem.pdf>
                     Need 1 page explanatory text to accompany.

A12   all            Brainstorm, scope (within available resources), and plan to
                     deliver a simple yet compelling demonstrator by end of
                     startup phase.

A9    miller,        gather & distribute to PIs RDF corpuses for (1) DSpace
      kenzie, bass   History and (2) Barton Library Catalog
                     -------- tar of RDF history bitstreams in hand. need
                     bass/kenzie policy conversation.
                     2002-08-14 update: have MIT Libs policy, ready for handoff

I7    kenzie, miller relationship METS :: RDF
                     -------- em willing to spend energy here.
                     mets -> closed world?
                     motivation: functionality/capabilities of METS coupled with
                     openness, flexibility of RDF.
                     RDFS for METS?

A1    karger         summarize test case / requirements on Jena for use in haystack
                     ... re-use cholesterol? but not hardened
                         memory resident only - won't scale
                     ... hope Jena gets karger team out of database work
                     ... root cause between sleepycat/Jena and cholesterol
                     ... simple mods to existing Jena DB backends
                     action Karger & Bass (serialize) -> profiling

I2    karger         should cholesterol become an alternative RDB implementation
                     for Jena?

I3    karger         could/should Adenine be bundled with Jena?

I6    erickson       what is CNRI position regarding registration of handle
                     system as a URI scheme with IANA?
_________________________________________________________________

CLOSED Actions (A) / Issues (I)

ref   owner          summary
                     -------- resolution
----|--------------|---------------------------------------------------
A14   bass           followup with Lissa Natkin, LCS
                     -------- met 2002-07-17, communication established and
                     remains open.

A15   bass           send www-rdf-dspace subscribe instructions to all PIs
                     -------- sent.

A19   bass           consolidate and make available meeting presentation
                     materials and other useful resources
                     -------- here they are! also set up SIMILE project web page.

A17   all            provide non-available times to bass for weekly PI phone call

A18   bass           identify time for weekly PI phone call
                     -------- Weekly call identified: Fridays at 2pm Eastern.
                     May need to revisit after classes resume.

A4    bass, karger   followup Quan's "Jena in Haystack" note with Dennis and
                     Jena team
                     -------- sent to Jena team

I5    kenzie, miller relationship DSpace Format Registry :: Mime Type
                     open issues in W3C Technical Architecture Group
                       - http://www.w3.org/2001/tag/ilist#w3cMediaType-1
                       - http://www.w3.org/2001/tag/ilist#customMediaType-2
                       - http://www.w3.org/2001/tag/ilist#nsMediaType-3
                       - http://www.w3.org/2001/tag/ilist#uriMediaType-9
                     -------- close per em, having shared info

A10   karger, kenzie haystack team pairwise w/ kenzie & library staff to scope
                     UI reqs for library catalogers / "semantic annotators"
                     -------- Karger update: watched simple "cataloging a book"
                     steps. lots of flipping back and forth...
                     idea: haystack UI to present in one place all info needed
                     to catalog a book, and allow input of reqd data.
                     OCLC data comes back in terminal interface
                     No RDF version of MARC
                     xml-schema for MARC - but not flattened
                     action -> meet reference librarians? (much harder...)
_________________________________________________________________

Resources

  SIMILE Public Page
    http://web.mit.edu/dspace-dev/www/simile/

  SIMILE Research Proposal (public, version 2002-04)
    http://web.mit.edu/dspace-dev/www/simile/resources/proposal-2002-04/index.htm

  SIMILE Startup Phase Goals and Deliverables (July - October 2002)
    http://web.mit.edu/dspace-dev/www/simile/resources/goals-summer-2002.txt

  W3C Semantic Web Activity
    http://www.w3.org/2001/sw/

  W3C Semantic Web Advanced Development Presentation Materials
    http://www.w3.org/2002/Talks/07-simile-swad/

  Haystack Public Page
    http://haystack.lcs.mit.edu/

  Haystack Presentation Materials
    http://web.mit.edu/dspace-dev/www/simile/resources/Haystack-Overview.pdf

  DSpace Architecture Presentation Materials
    http://web.mit.edu/dspace-dev/www/simile/resources/DSpace-Arch-Overview-2002-07-11.pdf

  DSpace White Papers (Functionality, Architecture)
    http://www.dspace.org/live/implementation/design_documents/functionality.pdf
    http://www.dspace.org/live/implementation/design_documents/architecture.pdf

  Reference Model for an Open Archival Information System (OAIS)
    http://ssdoo.gsfc.nasa.gov/nost/isoas/
    http://www.ccsds.org/documents/pdf/CCSDS-650.0-R-2.pdf

  HP Labs Semantic Web Activity
    http://hpl.hp.com/semweb/

  Jena Semantic Web Toolkit
    http://www.hpl.hp.com/semweb/jena-top.html

  Jena Presentation Materials
    URL not available

  Jena Development Public List
    http://groups.yahoo.com/group/jena-dev/

  www-rdf-dspace
    http://lists.w3.org/Archives/Public/www-rdf-dspace/
    mailto:www-rdf-dspace@w3.org

  SCORM - Sharable Content Object Reference Model
    http://www.adlnet.org

  Open Archives Initiative - protocol for metadata harvesting and sharing
    http://www.openarchives.org/

  METS - Metadata Encoding and Transmission Standard
    http://www.loc.gov/standards/mets/

  RUDOLF
    http://www.ilrt.bris.ac.uk/discovery/rdf-dev/rudolf/
_________________________________________________________________

=============================================
Mick Bass
External Engagement Manager
HP Labs / MIT DSpace Program
Hewlett-Packard Company
Building 10-500 MIT, 77 Massachusetts Avenue
Cambridge, MA 02139-4307

617.253.6617 office
617.452.3000 fax
617.899.3938 mobile
617.627.9694 residence
bass@alum.mit.edu
mick_bass@hp.com
=============================================
Received on Thursday, 22 August 2002 15:35:32 UTC