RE: SIMILE PI phone conference, 15-Apr-04 1100 EDT/1600 BST

Nicely put, Martin - thanks for clarifying my attempt to summarize your _real_ thoughts.

Looks like Steve Garland has taken this up, to work on it this week and present a proposal on next week's call.  Is that right, Steve?

- Mick

> -----Original Message-----
> From: Merry, Martin 
> Sent: Thursday, April 15, 2004 8:07 AM
> To: Bass, Mick; Mackenzie Smith; Ryan Lee; www-rdf-dspace@w3.org
> Subject: RE: SIMILE PI phone conference, 15-Apr-04 1100 EDT/1600 BST
> 
> 
> Dear All
> 
> Let me briefly try and clarify _my_ concerns about scale 
> (Nick may have other issues).
> 
> Scalability covers a multitude of sins: in particular there 
> is scalability wrt instance data, and scalability wrt 
> ontologies. There is also the issue of whether datasets are 
> linked dynamically.
> 
> Thus:
> 
> 1) Small number of large corpora; no dynamic linking.
> 
> Here one is putting a query to a known, small collection of 
> datasets. Links between the different ontologies/schemata are 
> known in advance; hence one is essentially querying a single 
> large collection of triples. Lots of precomputation can be 
> done.  Issues are to do with working out what precomputation 
> is necessary, but provided the precomputation can be done 
> offline, this probably gives you something tractable, and enables 
> some sort of faceted browser-based approach to work.
> 
> 2) Large number of large corpora; no dynamic linking.
> 
> I've not thought of this as a dimension that Simile would 
> put a lot of effort into in terms of building prototypes, 
> simply because it's unlikely that there'd be enough data out 
> there. In this case tho the amount of precomputation 
> necessary may become prohibitive.
> 
> 3) Small number of large corpora; dynamic linking
> 
> This is where one - as part of the querying /browsing process 
> - assembles a collection of corpora and says "I want to 
> browse these" - identifying links between them on the fly.
> 
> It's not obvious to me that precomputation is possible in 
> this case: it's certainly a lot harder. If you can't do any 
> precomputation, run-time performance is likely to be extremely 
> problematic. Faceted browsing may well not be the way to go.
> 
> 
> 4) Large number of large corpora; dynamic linking 
> 
> Yuk.
> 
> -------------------
> 
> I was initially asking for clarification of which of these 
> situations the project is in i.e. when the scalability report 
> comes out, can it be grounded in the use cases that the 
> project is going to be addressing, so we know which of these 
> issues we're facing. Once this has been done I was also 
> advocating a collection of milestones to address the issues 
> the report raised, so that there is some visible progress on 
> this prior to the "demonstration at scale" milestone in June 
> 05.  In particular, I felt it would be a Good Thing to have 
> some sort of proof of concept of scalability before launching 
> into a prototype integrating Simile with DSpace.
> 
> I hope all this makes sense - please yell if it doesn't.
> 
> As I said at the beginning, these were my concerns, rather 
> than Nick's - Nick can add any clarification he feels is 
> necessary. They're also my concerns rather than the Jena 
> team's, which I believe were captured during the discussions.
> 
> Martin
> 
> > -----Original Message-----
> > From: www-rdf-dspace-request@w3.org 
> > [mailto:www-rdf-dspace-request@w3.org] On Behalf Of Bass, Mick
> > Sent: 15 April 2004 14:24
> > To: Bass, Mick; Mackenzie Smith; Ryan Lee; www-rdf-dspace@w3.org
> > Subject: RE: SIMILE PI phone conference, 15-Apr-04 1100 EDT/1600 BST
> > 
> > 
> > 
> > Feedback on the milestones doc from Martin and Nick:
> > 
> > 1. There is a full year between the paper describing issues
> > of scaling up and the milestone ("Navigation and mapping 
> > demonstration at increased scale" - milestone 4, June 05) 
> > where scale is finally demonstrated.  Suggest identifying 
> > useful and measurable intermediate milestones to be included 
> > with the scale whitepaper deliverable. 
> > 
> > 2. Concern that the work regarding scale remain consistent
> > with the functional integration overall architecture to be 
> > defined in "Proposal for functional integration of SIMILE 
> > with DSpace" deliverable (milestone 3, December 04) and 
> > "Prototype of SIMILE working with DSpace - Architecture and 
> > implementation that incorporates tools developed for 
> > Milestones 1-3." (milestone 4, June 05).  Desire to avoid 
> > forks in the effort that join late or not at all.  Question 
> > was - "how can the milestones reflect the need for concerns 
> > regarding scale to be reflected in the milestones regarding 
> > SIMILE architecture and functional integration of SIMILE 
> > with DSpace?"
> > 
> > 3. Observation and concern that some navigate/browse
> > paradigms could be fundamentally computationally intensive 
> > and intractable with extremely large and dynamically changing 
> > datasets.  So architecture needs to consider both concerns of 
> > scale from the perspective of database and query capacity, as 
> > well as appropriate constraints on the interaction paradigm 
> > so that implementation is tenable regardless of the 
> > underlying DB technologies.  That is, there are two sets of 
> > concerns: 1 - scaling Jena and RDQL so that client browsers 
> > can implement useful browse/navigate paradigms and 2 - 
> > designing useful browse/navigate/search paradigms including 
> > constraints (on support for dynamic data, for the types of 
> > queries that need to be issued) that can be implemented using 
> > available database and query technologies (of which Jena is 
> > one alternative).
> > 
> > 4. Martin and the Jena team would like further guidance with
> > respect to the interactions and support that SIMILE will 
> > require in addressing the issues of scale and/or client 
> > interface design described in the above 3 items.
> > 
> > 
> > ====
> > Mick Bass
> > 
> > 970.898.6788 office    408.216.0584 fax
> > 303.667.1227 mobile    303.494.5202 residence
> > bass@alum.mit.edu      mick_bass@hp.com
> > ====
> > 
> > 
> > > -----Original Message-----
> > > From: www-rdf-dspace-request@w3.org
> > > [mailto:www-rdf-dspace-request@w3.org] On Behalf Of Bass, Mick
> > > Sent: Thursday, April 15, 2004 12:35 AM
> > > To: Mackenzie Smith; Ryan Lee
> > > Cc: www-rdf-dspace@w3.org
> > > Subject: RE: SIMILE PI phone conference, 15-Apr-04 1100 EDT/1600 BST
> > > 
> > > 
> > > 
> > > Boo, it is late breaking but I will be on a plane to LA
> > > tomorrow at 11a EDT and so must send my regrets as well.
> > > 
> > > I'd like to see the milestones converge - I have some
> > > feedback from Martin Merry and Nick Wainwright on the 
> > > milestones that I will send in a separate note before the call.
> > > 
> > > - Mick
> > > 
> > > 
> > > > -----Original Message-----
> > > > From: www-rdf-dspace-request@w3.org 
> > > > [mailto:www-rdf-dspace-request@w3.org] On Behalf Of Mackenzie Smith
> > > > Sent: Wednesday, April 14, 2004 9:03 PM
> > > > To: Ryan Lee
> > > > Cc: www-rdf-dspace@w3.org
> > > > Subject: Re: SIMILE PI phone conference, 15-Apr-04 1100 EDT/1600 BST
> > > > 
> > > > 
> > > > 
> > > > I too must send regrets for tomorrow -- I'll be on a plane to DC
> > > > at the appointed hour. Stefano knows what's happening, and we've
> > > > got a contact at Archnet to ask about their data. I do need to
> > > > know what's happening with the milestones document pretty soon, so
> > > > I asked Stefano to make sure that gets discussed and let me know
> > > > when I'm back.
> > > > 
> > > > Thanks,
> > > > 
> > > > MacKenzie
> > > > 
> > > > 
> > > > Quoting Ryan Lee <ryanlee@w3.org>:
> > > > 
> > > > > 
> > > > > SIMILE PI phone conference, 15-Apr-04 1100 EDT/1600 BST
> > > > > 
> > > > > +1.617.761.6200, code 7464 ("SIMI")
> > > > > irc://irc.w3.org:6665/simile
> > > > > 
> > > > > You may want to look at the W3C Teleconferencing IRC Agent (Zakim)
> > > > > page for useful instructions:
> > > > >    http://www.w3.org/2001/12/zakim-irc-bot
> > > > > 
> > > > > Agenda
> > > > > 
> > > > > 1. WWW2004 attendance (Eric)
> > > > > 
> > > > > 2. Additional datasets (MacKenzie, Eric)?
> > > > > 
> > > > > 3. Milestones document (MacKenzie)?
> > > > > 
> > > > > 4. Infrastructure status (Stefano)
> > > > > 
> > > > > 5. Development status (Ryan for Mark)
> > > > > 
> > > > > Regrets: Mark
> > > > > 
> > > > > Agenda at http://simile.mit.edu/wiki/MeetingAgenda15April2004
> > > > > 
> > > > > Please edit and add on the wiki.
> > > > > 
> > > > > -- 
> > > > > Ryan Lee                 ryanlee@w3.org
> > > > > W3C Research Engineer    +1.617.253.5327
> > > > > http://web.mit.edu/simile/www/
> > > > > 
> > > > > 
> > > > 
> > > > 
> > > > 
> > > 
> > 
> 

Received on Thursday, 15 April 2004 15:25:37 UTC