FW: SIMILE PI phone conference, 15-Apr-04 1100 EDT/1600 BST

Forwarded to the list - Martin's original post for some reason didn't make it to the archive.

-----Original Message-----
From: Merry, Martin 
Sent: Thursday, April 15, 2004 8:07 AM
To: Bass, Mick; Mackenzie Smith; Ryan Lee; www-rdf-dspace@w3.org
Subject: RE: SIMILE PI phone conference, 15-Apr-04 1100 EDT/1600 BST


Dear All

Let me briefly try and clarify _my_ concerns about scale (Nick may have other issues).

Scalability covers a multitude of sins: in particular there is scalability wrt instance data, and scalability wrt ontologies. There is also the issue of whether datasets are linked dynamically.

Thus:

1) Small number of large corpora; no dynamic linking.

Here one is putting a query to a known, small collection of datasets. Links between the different ontologies/schemata are known in advance; hence one is essentially querying a single large collection of triples. Lots of precomputation can be done. The issues are to do with working out what precomputation is necessary, but provided the precomputation can be done offline, this probably gives you something tractable and enables some sort of faceted browser-based approach to work.

2) Large number of large corpora; no dynamic linking.

I've not thought of this as a dimension that Simile would put a lot of effort into in terms of building prototypes, simply because it's unlikely that there'd be enough data out there. In this case, though, the amount of precomputation necessary may become prohibitive.

3) Small number of large corpora; dynamic linking.

This is where one, as part of the querying/browsing process, assembles a collection of corpora and says "I want to browse these", identifying links between them on the fly.

It's not obvious to me that precomputation is possible in this case; it's certainly a lot harder. If you can't do any precomputation, run-time performance is likely to be extremely problematic, and faceted browsing may well not be the way to go.


4) Large number of large corpora; dynamic linking.

Yuk.
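To make situation 1 concrete (a known, small set of corpora whose links are fixed in advance, so they can be merged into one triple collection), here is a minimal Python sketch of the kind of offline precomputation that can make faceted browsing tractable. The triple format, the facet index, and the function names build_facet_index and browse are hypothetical illustrations, not SIMILE, Jena or RDQL APIs.

```python
from typing import Iterable

# Hypothetical triple representation: (subject, predicate, object) strings.
Triple = tuple[str, str, str]


def build_facet_index(triples: Iterable[Triple]) -> dict[tuple[str, str], set[str]]:
    """Offline step (hypothetical): map each (predicate, object) facet value
    to the set of subjects carrying it. With a fixed, known set of corpora
    this can be computed once and reused for every browse request."""
    index: dict[tuple[str, str], set[str]] = {}
    for s, p, o in triples:
        index.setdefault((p, o), set()).add(s)
    return index


def browse(index: dict[tuple[str, str], set[str]],
           selected: list[tuple[str, str]]) -> tuple[set[str], dict[tuple[str, str], int]]:
    """Online step (hypothetical): intersect the precomputed subject sets for
    the selected facet values, then count how many remaining subjects carry
    each facet value (the numbers a faceted browser would display)."""
    subjects = None
    for facet in selected:
        matches = index.get(facet, set())
        subjects = set(matches) if subjects is None else subjects & matches
    if subjects is None:
        # No facet selected yet: every subject in the collection is in scope.
        subjects = set().union(*index.values())
    counts = {facet: len(subs & subjects)
              for facet, subs in index.items() if subs & subjects}
    return subjects, counts


if __name__ == "__main__":
    # Illustrative data only.
    triples = [
        ("item1", "dc:type", "Image"),
        ("item1", "dc:creator", "Smith"),
        ("item2", "dc:type", "Text"),
        ("item2", "dc:creator", "Smith"),
        ("item3", "dc:type", "Image"),
    ]
    index = build_facet_index(triples)
    subjects, counts = browse(index, [("dc:type", "Image")])
    print(sorted(subjects))  # ['item1', 'item3']
    print(counts)
```

In situations 3 and 4 the set of corpora, and the links between them, are only known at query time, so the equivalent of build_facet_index would have to run inside each browse request rather than offline; that is where the run-time cost becomes the problem.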

-------------------

I was initially asking for clarification of which of these situations the project is in; i.e., when the scalability report comes out, can it be grounded in the use cases that the project is going to address, so that we know which of these issues we're facing? Once this has been done, I was also advocating a collection of milestones to address the issues the report raises, so that there is some visible progress on this before the "demonstration at scale" milestone in June 05. In particular, I felt it would be a Good Thing to have some sort of proof of concept of scalability before launching into a prototype integrating Simile with DSpace.

I hope all this makes sense - please yell if it doesn't.

As I said at the beginning, these were my concerns rather than Nick's; Nick can add any clarification he feels is necessary. They're also my concerns rather than the Jena team's, which I believe were captured during the discussions.

Martin

> -----Original Message-----
> From: www-rdf-dspace-request@w3.org 
> [mailto:www-rdf-dspace-request@w3.org]On Behalf Of Bass, Mick
> Sent: 15 April 2004 14:24
> To: Bass, Mick; Mackenzie Smith; Ryan Lee; www-rdf-dspace@w3.org
> Subject: RE: SIMILE PI phone conference, 15-Apr-04 1100 EDT/1600 BST
> 
> 
> 
> Feedback on the milestones doc from Martin and Nick:
> 
> 1. There is a full year between the paper describing issues
> of scaling up and the milestone ("Navigation and mapping 
> demonstration at increased scale" - milestone 4, June 05) 
> where scale is finally demonstrated.  Suggest identifying 
> useful and measurable intermediate milestones to be included 
> with the scale whitepaper deliverable. 
> 
> 2. Concern that the work regarding scale remain consistent
> with the functional integration overall architecture to be 
> defined in "Proposal for functional integration of SIMILE 
> with DSpace" deliverable (milestone 3, December 04) and 
> "Prototype of SIMILE working with DSpace - Architecture and 
> implementation that incorporates tools developed for 
> Milestones 1-3." (milestone 4, June 05).  Desire to avoid 
> forks in the effort that join late or not at all.  Question 
> was: "how can the milestones regarding SIMILE architecture and 
> functional integration of SIMILE with DSpace reflect the 
> concerns regarding scale?"
> 
> 3. Observation and concern that some navigate/browse
> paradigms could be fundamentally computationally intensive 
> and intractable with extremely large and dynamically changing 
> datasets.  So architecture needs to consider both concerns of 
> scale from the perspective of database and query capacity, as 
> well as appropriate constraints on the interaction paradigm 
> so that implementation is tenable regardless of the 
> underlying DB technologies.  That is, there are two sets of 
> concerns: 1 - scaling Jena and RDQL so that client browsers 
> can implement useful browse/navigate paradigms and 2 - 
> designing useful browse/navigate/search paradigms including 
> constraints (on support for dynamic data, for the types of 
> queries that need to be issued) that can be implemented using 
> available database and query technologies (of which Jena is 
> one alternative).
> 
> 4. Martin and the Jena team would like further guidance with
> respect to the interactions and support that SIMILE will 
> require in addressing the issues of scale and/or client 
> interface design described in the above 3 items.
> 
> 
> ====
> Mick Bass
> 
> 970.898.6788 office    408.216.0584 fax
> 303.667.1227 mobile    303.494.5202 residence
> bass@alum.mit.edu      mick_bass@hp.com
> ====
> 
> 
> > -----Original Message-----
> > From: www-rdf-dspace-request@w3.org
> > [mailto:www-rdf-dspace-request@w3.org] On Behalf Of Bass, Mick
> > Sent: Thursday, April 15, 2004 12:35 AM
> > To: Mackenzie Smith; Ryan Lee
> > Cc: www-rdf-dspace@w3.org
> > Subject: RE: SIMILE PI phone conference, 15-Apr-04 1100 EDT/1600 BST
> > 
> > 
> > 
> > Boo, this is late-breaking, but I will be on a plane to LA
> > tomorrow at 11a EDT and so must send my regrets as well.
> > 
> > I'd like to see the milestones converge - I have some
> > feedback from Martin Merry and Nick Wainwright on the 
> > milestones that I will send in a separate note before the call.
> > 
> > - Mick
> > 
> > 
> > > -----Original Message-----
> > > From: www-rdf-dspace-request@w3.org 
> > > [mailto:www-rdf-dspace-request@w3.org] On Behalf Of Mackenzie Smith
> > > Sent: Wednesday, April 14, 2004 9:03 PM
> > > To: Ryan Lee
> > > Cc: www-rdf-dspace@w3.org
> > > Subject: Re: SIMILE PI phone conference, 15-Apr-04 1100 EDT/1600 BST
> > > 
> > > 
> > > 
> > > I too must send regrets for tomorrow -- I'll be on a plane to DC 
> > > at the appointed hour. Stefano knows what's happening, and we've 
> > > got a contact at Archnet to ask about their data. I do need to 
> > > know what's happening with the milestones document pretty soon, so 
> > > I asked Stefano to make sure that gets discussed and let me know 
> > > when I'm back.
> > > 
> > > Thanks,
> > > 
> > > MacKenzie
> > > 
> > > 
> > > Quoting Ryan Lee <ryanlee@w3.org>:
> > > 
> > > > 
> > > > SIMILE PI phone conference, 15-Apr-04 1100 EDT/1600 BST
> > > > 
> > > > +1.617.761.6200, code 7464 ("SIMI")
> > > > irc://irc.w3.org:6665/simile
> > > > 
> > > > You may want to look at the W3C Teleconferencing IRC Agent (Zakim)
> > > > page for useful instructions:
> > > >    http://www.w3.org/2001/12/zakim-irc-bot
> > > > 
> > > > Agenda
> > > > 
> > > > 1. WWW2004 attendance (Eric)
> > > > 
> > > > 2. Additional datasets (MacKenzie, Eric)?
> > > > 
> > > > 3. Milestones document (MacKenzie)?
> > > > 
> > > > 4. Infrastructure status (Stefano)
> > > > 
> > > > 5. Development status (Ryan for Mark)
> > > > 
> > > > Regrets: Mark
> > > > 
> > > > Agenda at http://simile.mit.edu/wiki/MeetingAgenda15April2004
> > > > 
> > > > Please edit and add on the wiki.
> > > > 
> > > > -- 
> > > > Ryan Lee                 ryanlee@w3.org
> > > > W3C Research Engineer    +1.617.253.5327
> > > > http://web.mit.edu/simile/www/
> > > > 
> > > > 
> > > 
> > > 
> > > 
> > 
> 

Received on Friday, 16 April 2004 16:35:08 UTC