Re: Pre-reading for 05/24 RDFI phone conference from Art Barstow on 2001-05-24 (www-rdf-dspace@w3.org from May 2001)

From: Art Barstow <barstow@w3.org>
Date: Thu, 24 May 2001 08:41:58 -0400
To: pbreton@mit.edu
Cc: www-rdf-dspace@w3.org, nick_wainwright@hp.com, libby.miller@bristol.ac.uk
Message-ID: <20010524084158.C16595@w3.org>

Peter,

Since it appears that you want to initially focus on input
and storage, you may also want to consider how DSpace
intends to support querying the data stored in the triple
store, since that may affect your implementation decision on
storage.

Along those lines, I wanted to mention some work that
Libby Miller at ILRT has done with RDF query:

 http://swordfish.rdfweb.org/rdfquery/

Art
---

On Wed, May 23, 2001 at 09:16:04PM -0400, Mick Bass wrote:
> I asked Peter Breton to prepare some background for our phone conference 
> tomorrow so that we are not starting with a blank page, and so you can 
> synchronize with our current thinking and understand what we've already 
> investigated.  I've included Peter's set of "grounding statements" 
> below.  Please review them in advance of the phone conference.
> 
> This gives us the following agenda:
> 
> I.   Brief review of Goal and motivation for achieving The Goal.
> II.  Review approaches 1a, 1b, 2 below.  Consensus on 1a?
> III. Approach strategy discussion (as required)
> IV.  Implementation discussion (for selected approach)
> V.   Automation strategies for RDF generation
> 
> We have only an hour, so I will take the liberty of moderating ruthlessly.
> 
> I'd like to thank Peter for this prep material and each of you for 
> contributing to the discussion.
> 
> Once again, logistics:
> 
> 7:30 - 8:30 am US/Pacific, 10:30 - 11:30 am US/Eastern, 3:30 - 4:30 pm GMT
> Meet-Me: 404-774-4109  or TN 774-4109
> Meeting ID: 6116
> 
> IRC Server / Channel: irc.openprojects.net / #dspace
> 
> - Mick
> 
> ==============================================================
> 
> Date: Wed, 23 May 2001 19:15:41 -0400
> From: Peter Breton <pbreton@MIT.EDU>
> Organization: MIT
> User-Agent: Mozilla/5.0 (X11; U; Linux 2.2.14-12 i686; en-US; rv:0.9) 
> Gecko/20010505
> X-Accept-Language: en, de, en-gb, fr-ch, sl
> To: Mick Bass <bass@mit.edu>
> Subject: Grounding statements for phone conference tomorrow
> 
> 
> The Platform
> ==========
> DSpace is currently implemented with Java, JSP/Servlets and a Postgres 
> backend. Approaches and toolkits which are outside this domain will 
> probably be mainly useful for architecture and concept mining.
> 
> 
> The Goal
> =======
> Implement a (probably persistent RDBMS-backed) triple-store for DSpace 
> data. The triple store must be outside the critical path; it should be 
> possible to use DSpace without the triple store without any significant 
> loss of functionality.
> 
> 
> Approaches (this is not an exhaustive list):
> ===============================
> 
> 
> 1) Add-on triple store
> 
> 
> a. Via Java code. In this scenario, when DSpace obtains new data (e.g., a new 
> Publication is submitted), it makes this information available to an RDF 
> storage module.  The storage module somehow (?) creates RDF triples and 
> persists them (in addition to any other storage for these objects).
> I expect that most significant actions in the system will cause 
> corresponding (Java Bean) Events to be fired, with in-memory 
> representations of the objects; so actually connecting the RDF storage 
> module with the data to be persisted should be straightforward.
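
A minimal sketch of how the wiring in approach 1a might look, assuming a present-day Java compiler. Every name in it (AddOnStoreSketch, SubmissionService, RdfStorageModule, the "publication:" and "user:" prefixes) is hypothetical and not taken from the DSpace code, and an in-memory list stands in for the RDBMS-backed triple tables.

```java
import java.util.ArrayList;
import java.util.EventListener;
import java.util.EventObject;
import java.util.List;

/** Sketch of approach 1a; every name here is hypothetical, not DSpace API. */
class AddOnStoreSketch {

    /** One RDF statement, kept as plain strings to stay toolkit-neutral. */
    static final class Triple {
        final String subject, predicate, object;
        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
        @Override public String toString() {
            return subject + " " + predicate + " " + object;
        }
    }

    /** Event carrying an in-memory view of the newly submitted publication. */
    static class PublicationSubmittedEvent extends EventObject {
        final int publicationId;
        final String title;
        final int userId;
        PublicationSubmittedEvent(Object source, int publicationId, String title, int userId) {
            super(source);
            this.publicationId = publicationId;
            this.title = title;
            this.userId = userId;
        }
    }

    interface PublicationListener extends EventListener {
        void publicationSubmitted(PublicationSubmittedEvent event);
    }

    /** The add-on store; a List stands in for the persistent triple tables. */
    static class RdfStorageModule implements PublicationListener {
        private final List<Triple> triples = new ArrayList<>();
        @Override public void publicationSubmitted(PublicationSubmittedEvent event) {
            String subject = "publication:" + event.publicationId;
            triples.add(new Triple(subject, "title", "\"" + event.title + "\""));
            triples.add(new Triple(subject, "submittedBy", "user:" + event.userId));
        }
        List<Triple> triples() { return triples; }
    }

    /** Stand-in for the core system; it works with zero listeners attached. */
    static class SubmissionService {
        private final List<PublicationListener> listeners = new ArrayList<>();
        void addListener(PublicationListener listener) { listeners.add(listener); }
        void submit(int id, String title, int userId) {
            // Ordinary (non-RDF) persistence would happen here first.
            PublicationSubmittedEvent event = new PublicationSubmittedEvent(this, id, title, userId);
            for (PublicationListener listener : listeners) {
                listener.publicationSubmitted(event);
            }
        }
    }

    public static void main(String[] args) {
        SubmissionService service = new SubmissionService();
        RdfStorageModule store = new RdfStorageModule();
        service.addListener(store);
        service.submit(141, "Harry Potter and the Sorcerer's Stone", 7);
        for (Triple t : store.triples()) {
            System.out.println(t);
        }
    }
}
```

Because the store is only a listener, the service runs unchanged when no store is registered, which is one way to keep the triple store outside the critical path.
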
> 
> 
> b. Via database triggers. In this RDBMS-centric approach, whenever a row is 
> inserted or modified in the RDBMS, a trigger fires which creates 
> corresponding triples in some other tables. The triple tables and the 
> ordinary mainstream tables are otherwise completely separate. 
> Implementation here would focus on creating the triggers.
> 
> 
> While approach 1b) has some nice features (transparent operation and 
> automatic synchronization, to name two), our current feeling is that it is 
> too limiting: it forces all data that can be RDFed into the Procrustean 
> bed of relational tables, and it requires the synchronization logic to be 
> written in an RDBMS language.
> 
> 
> 2) Virtual triple store
> 
> 
> Similar to 1), except that the redundant triple store is not actually 
> physically created; instead, we only create logic that can transform our 
> data into RDF triples (for export and the like). This essentially pushes 
> the problem over to someone else, who can use the RDF data to create their 
> own triple store if they like.
> 
> 
> Note that while there is a lot of overlap with 1), the virtual triple store 
> problem is much easier, since it need not consider persistence; it only 
> needs to do half of the job.
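
A rough sketch of what the "transform only" half in approach 2 could look like: records are turned into N-Triples-style lines on demand and nothing is persisted. The namespace http://example.org/dspace/, the class name, and the field names are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of approach 2: generate triples for export, store nothing. */
class VirtualStoreSketch {

    /** Hypothetical namespace; a real deployment would choose its own. */
    static final String NS = "http://example.org/dspace/";

    /** Escapes the characters that would break a quoted literal. */
    static String escape(String s) {
        return s.replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n")
                .replace("\r", "\\r");
    }

    /** Emits one line per field: <subject> <predicate> "literal" . */
    static List<String> export(String type, int id, Map<String, String> fields) {
        String subject = "<" + NS + type + "/" + id + ">";
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            String predicate = "<" + NS + "terms/" + field.getKey() + ">";
            lines.add(subject + " " + predicate + " \"" + escape(field.getValue()) + "\" .");
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", "Harry Potter and the Sorcerer's Stone");
        fields.put("author", "Hans Gretel");
        for (String line : export("publication", 141, fields)) {
            System.out.println(line);
        }
    }
}
```

The same mapping logic would be needed for approach 1 as well; the difference is only that here the output goes to whoever asked for it rather than into a store.
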
> 
> 
> Approaches II
> ==========
> I currently favor approach 1a), basically because:
> 
> 
> * I think a modern programming language is the right place to put 
> potentially complex mapping logic
> * I think DSpace will get farther with a real, honest-to-gosh triple store 
> than just a strategy for generating one
> 
> 
> I would suggest that, if there is general consensus on this approach, 
> we move directly to a discussion of how to implement such a beast. 
> Otherwise, we should have a discussion of which approach to pursue.
> 
> 
> Automating RDF generation
> =======================
> I would also like to discuss ways to minimize the effort necessary to 
> capture triples.
> 
> 
> Two promising areas are:
> 
> 
> 1. Creation of triples from RDBMS column and foreign key relationships
> 2. Creation of triples from Java Beans
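
For the first of these, one possible reading is sketched below: each non-null column becomes a statement about the row, and a foreign-key column yields a reference to another row instead of a literal. The table and column names are invented for the example, and plain maps stand in for what a real implementation would read from the database and its metadata.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: derive triples from one relational row; all names hypothetical. */
class RowMappingSketch {

    /**
     * @param table       table the row belongs to
     * @param primaryKey  primary key value of the row
     * @param columns     column name to value (null models SQL NULL)
     * @param foreignKeys column name to the table it references
     */
    static List<String> rowToTriples(String table, Object primaryKey,
                                     Map<String, Object> columns,
                                     Map<String, String> foreignKeys) {
        String subject = table + ":" + primaryKey;
        List<String> triples = new ArrayList<>();
        for (Map.Entry<String, Object> column : columns.entrySet()) {
            if (column.getValue() == null) {
                continue; // a NULL column asserts nothing
            }
            String referenced = foreignKeys.get(column.getKey());
            String object = (referenced != null)
                    ? referenced + ":" + column.getValue()   // link to another row
                    : "\"" + column.getValue() + "\"";       // plain literal
            triples.add(subject + " " + column.getKey() + " " + object);
        }
        return triples;
    }

    public static void main(String[] args) {
        Map<String, Object> columns = new LinkedHashMap<>();
        columns.put("title", "Harry Potter and the Sorcerer's Stone");
        columns.put("submitter_id", 7);
        Map<String, String> foreignKeys = new LinkedHashMap<>();
        foreignKeys.put("submitter_id", "users");
        for (String t : rowToTriples("publication", 141, columns, foreignKeys)) {
            System.out.println(t);
        }
    }
}
```

Note that the column name is used as the predicate as-is, so a column like submitter_id would still need some kind of annotation to become "submittedBy".
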
> 
> 
> I think the second of these would work something like this: let's say that 
> our hypothetical RDF storage module gets a PublicationSubmittedEvent. The 
> event contains, as data fields, the new Publication, the User who submitted 
> it, the time the event occurred, and the User's session.
> 
> 
> The storage module could create a number of relationships based on this 
> information -- for example, the language of said User was German; the User 
> logged in from www.bluewin.de; the Publication has Author "Hans Gretel", 
> and so forth. Some of these relationships would need to be characterized: 
> for example, it's not clear how the user and publication are related, so we 
> might add an annotation that says that, for PublicationSubmittedEvents, the 
> relationship between the two is "submittedBy". However, it would be nice if 
> most of this information could be directly derived from the object; when a 
> Publication has a "title" property (using Java Bean speak), the system can 
> automatically construct: publication with id 141 has title "Harry Potter 
> and the Sorcerer's Stone".  Basically, as much as possible you indicate 
> your intentions by simply DOING (in this case, creating an object with a 
> title property); but if for some reason you must do one thing but mean 
> something different, you add an annotation which expresses your real meaning.
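
The "derive from the object, annotate only the exceptions" idea can be sketched with the standard java.beans introspection classes. This is only an illustration under assumptions: the Publication bean, the describe method, and the override map (standing in for whatever annotation mechanism is eventually chosen) are all hypothetical.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: derive statements from bean properties, with optional overrides. */
class BeanTripleSketch {

    /** Hypothetical bean; public so the introspector can reach its getters. */
    public static class Publication {
        private final String title;
        private final String submitter;
        public Publication(String title, String submitter) {
            this.title = title;
            this.submitter = submitter;
        }
        public String getTitle() { return title; }
        public String getSubmitter() { return submitter; }
    }

    /**
     * One statement per readable, non-null property. The property name is the
     * predicate unless the overrides map supplies a different one.
     */
    static List<String> describe(String subject, Object bean, Map<String, String> overrides) {
        List<String> statements = new ArrayList<>();
        try {
            // Stopping at Object.class leaves out the "class" pseudo-property.
            PropertyDescriptor[] properties =
                    Introspector.getBeanInfo(bean.getClass(), Object.class).getPropertyDescriptors();
            for (PropertyDescriptor property : properties) {
                Method getter = property.getReadMethod();
                if (getter == null) {
                    continue; // no plain getter for this property
                }
                Object value = getter.invoke(bean);
                if (value == null) {
                    continue; // nothing to assert
                }
                String predicate = overrides.getOrDefault(property.getName(), property.getName());
                statements.add(subject + " " + predicate + " \"" + value + "\"");
            }
        } catch (IntrospectionException | ReflectiveOperationException e) {
            throw new IllegalStateException("cannot introspect " + bean.getClass(), e);
        }
        return statements;
    }

    public static void main(String[] args) {
        Map<String, String> overrides = new HashMap<>();
        overrides.put("submitter", "submittedBy"); // the one "annotation" needed here
        Publication p = new Publication("Harry Potter and the Sorcerer's Stone", "user:7");
        for (String s : describe("publication:141", p, overrides)) {
            System.out.println(s);
        }
    }
}
```

Here the title statement comes for free from the getTitle() getter, while the override entry plays the role of the annotation that says what the submitter property really means.
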
> 
> 
> Toolkits we've looked at (in various degrees of depth):
> =======================================
> 
> 
> * Brian McBride's Jena (and various Jena extensions) -- nice code, Brian!
> * Eric P's RDF db store (overview only -- haven't poked around in the code)
> * OCLC/Dublin Core's EOR (actually, I just started reading the docs :)
> * Survey of storing RDF in an RDBMS
> * RDF SiRPAC API
> 
> 
> Thanks again for lending your time,
> 
> 
> Peter
> =============================================
> Mick Bass, Sloan MOT 2000
> 
> R&D Project Manager, Hewlett-Packard Company
> Building 10-500 MIT, 77 Massachusetts Avenue
> Cambridge, MA 02139-4307
> 
> 617.253.6617 office    617.452.3000 fax
> 617.899.3938 mobile    617.627.9694 residence
> bass@alum.mit.edu      mick_bass@hp.com
> =============================================
Received on Thursday, 24 May 2001 08:43:14 UTC