- From: BASS,MICK (HP-USA,ex1) <mick_bass@hp.com>
- Date: Fri, 19 Jul 2002 00:36:08 -0400 (EDT)
- To: www-rdf-dspace@w3.org
Attached are meeting notes from the SIMILE PI Meeting and Technical Exchange of Thursday 11 July 2002. I have also set up a SIMILE Project web page at http://www.mit.edu/dspace-dev/simile/ and consolidated project resources to date (including these meeting notes).

Please let me know if I've misrepresented anything, and be sure to check out the actions and issues with your name next to them.

Best Regards,
- Mick

_________________________________________________________________

SIMILE PI Meeting and Technical Exchange
2002-07-11
Summary, Actions, and Issues

Editor: Mick Bass, HP mailto:bass@alum.mit.edu
This document: http://www.mit.edu/dspace-dev/simile/minutes/minutes-2002-07-11.txt
Series: http://www.mit.edu/dspace-dev/simile/minutes/index.html
_________________________________________________________________

Table of Contents

   Agenda
   Attendees
   Resources
   Summary
   New Actions / Issues
   Open Actions / Issues
   Closed Actions / Issues
_________________________________________________________________

Agenda (as revised)

   Welcome, Goals for the Day and for the Summer
   W3C Advanced Development
   Q&A, Discussion
   Jena: brief overview, current & future directions
   Haystack: brief overview
   Q&A, discussion, brainstorm wrt Jena, Haystack
      (videoconf & NetMeeting from Bristol, UK)
   Haystack Demo
   Lunch
   DSpace: architecture, target research platform
   Discussion
   review proposed methodology
   review deliverables in startup phase
   working process for remainder of startup phase
   use case / demonstrator hone & brainstorm
_________________________________________________________________

Attendees

   David Karger (PI)           MIT, LCS / AI Lab
   David Huynh                 MIT, LCS / AI Lab
   Dennis Quan                 MIT, LCS / AI Lab
   Vineet Sinha                MIT, LCS / AI Lab
   MacKenzie Smith (Co-I)      MIT, Libraries
   Eric Miller (Co-I)          W3C
   Marja-Riitta Koivunen       W3C
   Mick Bass                   HP Labs
   Robert Tansley              HP Labs
   John Erickson               HP Labs (via video)
   Nick Wainwright             HP Labs
   Rivka Ladin                 HP Labs

Partial-day attendees

   Eytan Adar                  HP Labs
   David Stuve                 HP Labs
   Brian McBride               HP Labs (via video)
   Dave Reynolds               HP Labs (via video)
   Andy Seaborne               HP Labs (via video)
_________________________________________________________________

Resources

SIMILE Public Page
   http://www.mit.edu/dspace-dev/www/simile/
SIMILE Research Proposal (public, version 2002-04)
   http://www.mit.edu/dspace-dev/www/simile/resources/proposal-2002-04/index.htm
SIMILE Startup Phase Goals and Deliverables (July - October 2002)
   http://www.mit.edu/dspace-dev/www/simile/resources/goals-summer-2002.txt
W3C Semantic Web Activity
   http://www.w3.org/2001/sw/
W3C Semantic Web Advanced Development Presentation Materials
   http://www.w3.org/2002/Talks/07-simile-swad/
Haystack Public Page
   http://haystack.lcs.mit.edu/
Haystack Presentation Materials
   http://www.mit.edu/dspace-dev/www/simile/resources/Haystack-Overview.pdf
DSpace Architecture Presentation Materials
   http://www.mit.edu/dspace-dev/www/simile/resources/DSpace-Arch-Overview-2002-07-11.pdf
DSpace White Papers (Functionality, Architecture)
   http://www.dspace.org/live/implementation/design_documents/functionality.pdf
   http://www.dspace.org/live/implementation/design_documents/architecture.pdf
Reference Model for an Open Archival Information System (OAIS)
   http://ssdoo.gsfc.nasa.gov/nost/isoas/
   http://www.ccsds.org/documents/pdf/CCSDS-650.0-R-2.pdf
HP Labs Semantic Web Activity
   http://hpl.hp.com/semweb/
Jena Semantic Web Toolkit
   http://www.hpl.hp.com/semweb/jena-top.html
Jena Presentation Materials
   URL not available
Jena Development Public List
   http://groups.yahoo.com/group/jena-dev/
www-rdf-dspace
   http://lists.w3.org/Archives/Public/www-rdf-dspace/
   mailto:www-rdf-dspace@w3.org
_________________________________________________________________

Summary
(summarizes both the meeting and some subsequent pairwise interactions)

W3C Advanced Development

Eric Miller presented an overview of W3C Semantic Web advanced development activities. Marja-Riitta Koivunen described current work on Annotea.

Jena, Haystack, Persistent RDF Stores

Brian McBride, Dave Reynolds, and Andy Seaborne joined as Brian McBride presented a brief overview of the Jena semantic web toolkit. David Karger followed with a brief overview of Haystack.

An open discussion followed on potential Haystack/Jena interactions, focusing especially on database back-ends. (->A1) The discussion included:

- Cholesterol (the RDF store currently used in Haystack) and Adenine (a scripting language that produces RDF) (->I2, I3)
- Haystack's need to issue many (hundreds per second) small, simple queries in order to render its UI
- reification, the Haystack "belief server", and the overheads associated with storing reified RDF statements
- event triggers

Dave Reynolds subsequently constructed a simple tree-walking test case that demonstrates query throughput in the "hundreds per second" range on sub-$1K PCs. See Dave's note on www-rdf-dspace:
   http://lists.w3.org/Archives/Public/www-rdf-dspace/2002Jul/0002.html

Dennis Quan has since undertaken to create a version of Haystack that uses Jena instead of Cholesterol, to shed light on any hidden requirements. This has uncovered some additional issues, which Dennis describes in a note on www-rdf-dspace: (->A4)
   http://lists.w3.org/Archives/Public/www-rdf-dspace/2002Jul/0004.html

DSpace Architecture

Rob Tansley presented an overview of the DSpace Architecture.

Eric Miller pointed out overlap between the DSpace bitstream format registry and open issues regarding MIME types in the W3C Technical Architecture Group. (->I5)

Will Handles ever be URIs? They are URI-compliant, but what are CNRI's plans with respect to registration? (->I6)

Can there be / should there be an RDF Schema for METS? (->I7)

DSpace currently uses Jena to create RDF serializations of instances in the DSpace data model, along with descriptions of the relationships among these serializations and the events that change the state of the archive. DSpace does not yet use Jena as a persistent RDF store.
Many opportunities exist to make the existing RDF information more readily queryable, and to store community- and format-specific metadata in a persistent RDF store in addition to storing it in serialized bitstreams, as is currently done.

We revisited the desire and need for SIMILE's output and progress to respond to strong demand from MIT Libraries, the library community, and the effort to federate DSpace at a number of leading universities. Dave Karger requested a follow-up session where MacKenzie could share her perspective on unmet needs in the library community. (->A8)

We clarified the difference between DSpace RDF History data (which describes instances of the DSpace data model, and the temporal relationships between those instances and key events that change the state of the archive) and RDF transformations of MARC records from Barton, the Libraries' catalog. Dave Karger and his team expressed interest in immediately beginning to develop some simple Haystack UI components for exploring and navigating both of these corpora of RDF metadata. Eric Miller agreed to assist with transformations (if any are required) of the DSpace RDF instance serializations and history metadata. (->A9, A10)

Dave Karger envisions three possible variants of Haystack clients:

- generic. This is Haystack as it exists today.
- library client. This is a personal Haystack extended to provide richer functionality for interoperating with library resources. Or a library-user Haystack able to interoperate with personal resources, depending on your perspective :)
- library administrator. This is a Haystack optimized for those who maintain the information corpus and make decisions about how it can and should be organized and annotated to provide optimal value to the designated community. (->A10)

Methodology Walkthrough

We reviewed the three phases that we've initially scoped for the project:

1  + Support submit/query/discover/retrieve of instances of simple, flat (e.g. name-value pairs) community-specific schemas.
   + Support simple (free ASCII text, human-readable) annotations of DSpace items.
   + Integrate Jena, as a persistent RDF store, with DSpace.
   + Add consequent UI support.

2  + Add support for machine-readable annotations to any instance in the DSpace data model (including annotations to individual metadata elements). This will enable authority-control annotation services layered atop the originally submitted metadata.
   + Add support for discovery and incorporation of distributed metadata stores (some metadata hosted by the library, some by a community or department, some personal).
   + Add the consequent dissemination/viewing architecture that this requires.

3  + Add support for rich schemas (hierarchy, whole/part, etc.)
   + Schema mapping & intelligent query response through aggregation of distributed schema annotations.

A key issue is that the relationship between these phases, the additional platform capabilities proposed for demonstration in each, and unmet scenarios for use in the proposed pilot domain (libraries) is not yet sufficiently well articulated and vetted. (->I11)

These phases are not yet frozen. The PIs retain the flexibility to change their definitions, and the relationship of each to articulated demonstrators and scenarios for use.

We agreed to be explicit about the difference between:

- service-deployment policy decisions taken in a particular environment, and
- required platform capabilities,

recognizing that platform capabilities may need to be richer than the specific (policy-driven) requirements of any one environment, in order to support the various service-deployment decisions that will be taken elsewhere.

Example: Would libraries host metadata for resources that they do not host? (In the current DSpace service offered by MIT Libraries, the answer is "no".)
Should the SIMILE platform support distributed metadata, such that an institution can decide to host metadata for resources that it does not host? (The answer is very arguably "yes". For instance, much annotation metadata will relate multiple resources rather than describe just one.)

We discovered that the term "annotation" is a victim of semantic overload. W3C SWAD, and specifically the Annotea team, uses it in its most general form (a machine-readable statement that describes a relationship between two resources). In phase two of the methodology we use it to mean additional, or perhaps corrected, machine-readable metadata for DSpace items (examples: suggested subject keywords for a previously submitted item; a corrected or otherwise improved, perhaps authority-controlled, author annotating the author as originally submitted). In phase one we mean it in an even simpler and less granular sense: human-readable ASCII prose annotating items submitted to DSpace.

This is another example of platform capabilities vs. service-deployment choices. We'll need to architect from the beginning with the most general case in mind, but turn on progressively more complex deployed services, starting with the simplest ones.

Deliverables in the Startup Phase

We reviewed the goals and deliverables for the startup phase of the project. Nick Wainwright suggested that we search for a way to exit the startup phase with a simple but solidly grounded demo/example in addition to the deliverables that we reviewed (project plan, etc.). (->A12)

We reviewed the current status of contracts with the Office of Sponsored Programs, finance contacts in LCS & Libraries, and the need to turn around contracts as quickly as possible. (->A13, A14)

Preferred Working Processes

We discussed our preferences regarding how to work together to get the work done efficiently. Outcomes included:

- Need an email list for group communication.
  We decided to use www-rdf-dspace, an existing list at W3C that we set up during the DSpace project for discussion about application of RDF within DSpace. (->A15)
- We agreed that we will need 3-4 face-to-face working meetings including all PIs, each lasting between a half day and a full day. That's about once every three to four weeks. Eric Miller stated a preference to hold at least one of these in Ohio. David Karger voiced constraints on his ability to travel, and prefers Ohio later rather than earlier in the startup phase. We tentatively agreed to August 14th for the next PI face-to-face, in Cambridge. Need to confirm date & location. (->A16)
- In addition to the PI f2f meetings, we acknowledged that we'd likely need ongoing offline pairwise interactions. Results from these will ideally be written up and submitted to the group via www-rdf-dspace.
- We agreed to steer on an ongoing basis through a weekly PI phone call. The purpose of the weekly call is to tee up required offline interactions, take group decisions, and report on open issues/actions. We tentatively agreed to Thursdays, 3pm for this call. That time has since fallen through; we will need to find a different time. (->A17, A18)

Several participants requested that we consolidate materials and resources from the meeting and make them available to the group. (->A19)

Hone & Brainstorm Demonstrators and Scenarios for Use

We didn't make significant progress here; it's at the top of the list for future work.
_________________________________________________________________

NEW Actions (A) / Issues (I)

ref | owner                | summary
----|----------------------|---------
A1  | karger               | summarize test case / requirements on Jena for use in Haystack
I2  | karger               | should Cholesterol become an alternative RDB implementation for Jena?
I3  | karger               | could/should Adenine be bundled with Jena?
A4  | bass, karger         | follow up Quan's "Jena in Haystack" note with Dennis and Jena team
I5  | kenzie, miller       | relationship DSpace Format Registry :: Mime Type open issues in W3C Technical Architecture Group
    |                      |   - http://www.w3.org/2001/tag/ilist#w3cMediaType-1
    |                      |   - http://www.w3.org/2001/tag/ilist#customMediaType-2
    |                      |   - http://www.w3.org/2001/tag/ilist#nsMediaType-3
    |                      |   - http://www.w3.org/2001/tag/ilist#uriMediaType-9
I6  | erickson             | what is CNRI's position regarding registration of the handle system as a URI scheme with IANA?
I7  | kenzie, miller       | relationship METS :: RDF
A8  | bass                 | PI forum for Kenzie to present and all to discuss unmet metadata and information interoperability needs from MIT Libraries perspective
A9  | miller, kenzie, bass | gather & distribute RDF corpuses for (1) DSpace History and (2) Barton Library Catalog
A10 | karger, kenzie       | Haystack team pairwise w/ kenzie & library staff to scope UI reqs for library catalogers / "semantic annotators"
I11 | all, bass            | strengthen, capture, review: driving/focusing scenarios for use, and relationship to methodology phases
A12 | all                  | brainstorm, scope (within available resources), and plan to deliver a simple yet compelling demonstrator by end of startup phase
A13 | all                  | follow through with requested paperwork to OSP ASAP after receiving it. We need contracts to close by end of July if at all possible.
A14 | bass                 | follow up with Lissa Natkin, LCS
A15 | bass                 | send www-rdf-dspace subscribe instructions to all PIs
A16 | bass                 | confirm August 14 Cambridge with all PIs
A17 | all                  | provide non-available times to bass for weekly PI phone call
A18 | bass                 | identify time for weekly PI phone call
A19 | bass                 | consolidate and make available meeting presentation materials and other useful resources
_________________________________________________________________

OPEN Actions (A) / Issues (I)

ref | owner                | summary
    |                      |   -------- progress / status
----|----------------------|---------
A1  | karger               | summarize test case / requirements on Jena for use in Haystack
I2  | karger               | should Cholesterol become an alternative RDB implementation for Jena?
I3  | karger               | could/should Adenine be bundled with Jena?
A4  | bass, karger         | follow up Quan's "Jena in Haystack" note with Dennis and Jena team
    |                      |   -------- sent to Jena team
I5  | kenzie, miller       | relationship DSpace Format Registry :: Mime Type open issues in W3C Technical Architecture Group
    |                      |   - http://www.w3.org/2001/tag/ilist#w3cMediaType-1
    |                      |   - http://www.w3.org/2001/tag/ilist#customMediaType-2
    |                      |   - http://www.w3.org/2001/tag/ilist#nsMediaType-3
    |                      |   - http://www.w3.org/2001/tag/ilist#uriMediaType-9
I6  | erickson             | what is CNRI's position regarding registration of the handle system as a URI scheme with IANA?
I7  | kenzie, miller       | relationship METS :: RDF
A8  | bass                 | PI forum for Kenzie to present and all to discuss unmet metadata and information interoperability needs from MIT Libraries perspective
A9  | miller, kenzie, bass | gather & distribute to PIs RDF corpuses for (1) DSpace History and (2) Barton Library Catalog
    |                      |   -------- tar of RDF history bitstreams in hand. need bass/kenzie policy conversation.
A10 | karger, kenzie       | Haystack team pairwise w/ kenzie & library staff to scope UI reqs for library catalogers / "semantic annotators"
I11 | all, bass            | strengthen, capture, review: driving/focusing scenarios for use, and relationship to methodology phases
A12 | all                  | brainstorm, scope (within available resources), and plan to deliver a simple yet compelling demonstrator by end of startup phase
A13 | all                  | follow through with requested paperwork to OSP ASAP after receiving it. We need contracts to close by end of July if at all possible.
A16 | bass                 | confirm August 14 Cambridge with all PIs
A17 | all                  | provide non-available times to bass for weekly PI phone call
A18 | bass                 | identify time for weekly PI phone call
_________________________________________________________________

CLOSED Actions (A) / Issues (I)

ref | owner                | summary
    |                      |   -------- resolution
----|----------------------|---------
A14 | bass                 | follow up with Lissa Natkin, LCS
    |                      |   -------- met 2002-07-17, communication established and remains open.
A15 | bass                 | send www-rdf-dspace subscribe instructions to all PIs
    |                      |   -------- sent.
A19 | bass                 | consolidate and make available meeting presentation materials and other useful resources
    |                      |   -------- here they are! also set up SIMILE project web page.
_________________________________________________________________

=============================================
Mick Bass
External Engagement Manager
HP Labs / MIT DSpace Program
Hewlett-Packard Company
Building 10-500 MIT, 77 Massachusetts Avenue
Cambridge, MA 02139-4307

617.253.6617 office
617.452.3000 fax
617.899.3938 mobile
617.627.9694 residence

bass@alum.mit.edu
mick_bass@hp.com
=============================================
Received on Friday, 19 July 2002 13:11:09 UTC