- From: Kevin Smathers <kevin.smathers@hp.com>
- Date: Fri, 01 Aug 2003 09:47:10 -0700
- To: "Butler, Mark" <Mark_Butler@hplb.hpl.hp.com>
- Cc: www-rdf-dspace@w3.org
- Message-ID: <3F2A998E.2080502@hp.com>
Lynx dump of IRC Log: (html is attached)

[INFO] Channel view for "#simile" opened.
=== Highest connection count: 57 (56 clients)
-->| YOU have joined #simile
=-= Topic for #simile is "simile pi teleconf - em to be 10 min late :("
=-= Topic for #simile was set by em on Fri Aug 01 2003 08:55:45 GMT-0700 (PDT)
-->| marbut_ (marbut@192.6.19.190) has joined #simile
|<-- marbut has left irc.w3.org (Connection reset by peer)
em: dialing
marbut_: http://www.oclc.org/research/projects/rdf_interop/index.shtm
=-= em has changed the topic to "simile pi teleconf"
marbut_: http://wip.dublincore.org/source.html
marbut_: http://wip.dublincore.org:8080/interop/searchServlet
marbut_: KS: working on two things. One a memorandum of understanding for support for the project, the other is reading about doi to talk to John Ericsson about genesis
em: http://www.w3.org/2002/04/12-amico/
marbut_: em: I'm still progressing on the sample data - see previous URL
marbut_: I've not heard back from the edutella folks nor the CIDOC folks
marbut_: There's a small collection of AMICO data available though
em: http://sh.webhire.com/servlet/av/jd?ai=631&ji=1274969&sn=I
marbut_: As regards the hire, we are online and off the w3c homesite with a pointer to the W3C position
marbut_: Mark: Next item - staged demonstrators - any feedback?
marbut_: KS: The result we'll want to follow on with the demo develop.
marbut_: Mark: So what's the best way to do the persistent store bit?
marbut_: KS: You can start with Jena and add stuff on it, or start with genesis, which has a slightly different api.
marbut_: There are some limits on the complexity of the graphs in genesis, we need to do some more work on a higher level
marbut_: object api. We are working on this, but you need to figure if the higher level objects here are satisfactory.
marbut_: But what I anticipate you'll want to do is to start using Jena as a back end.
That's how I anticipate it going.
marbut_: But in between, we'll try to make things compatible, this helps with the APIs
marbut_: we have an alpha level implementation of the first level of genesis abstraction, how distribution is done, differences between local and remote
marbut_: searches, but as I understood it distribution is not so important to the first demo
marbut_: so I was planning on reserving the ability to do distribution, not implement distribution right now, although I have
marbut_: an implementation,
marbut_: em: I think it's a good idea, it's a small, accomplishable demonstrator, we can use it to tease out the team interaction,
marbut_: it gives us some idea to compare Jena and Genesis. If I understand your diagrams, then some of the query / inference layers
marbut_: could be in the persistent store.
marbut_: In the OCLC project we did this by emacs, doing it with editors might be interesting, but this seems scoped so we can have an early end date
marbut_: but I was hoping before christmas.
marbut_: mark: I'm hoping to do this before the hires are in place.
marbut_: em: let me offer some lessons learnt from the OCLC project
marbut_: when we asked for the data, we didn't ask if we could publish it, or make it available to others
marbut_: we need to make it clear that we want to make the data available, for other implementations,
marbut_: also there was a tremendous amount of data management that had to go on
marbut_: e.g. xml was invalid, we tried to get diverse datasets, but we still had to do data cleanup, so we need to think about this also
marbut_: the other thing was picking your data, the focus was on diversity of datasets, since the datasets were so small the specific overlaps
marbut_: were quite hard to tease out, so while the theory is good trying to integrate small collections of diverse data was hard
marbut_: because in practice no-one is going to search that stuff.
We need to get complementary collections that
marbut_: do have some overlap. I think the type of collections we are looking at are going to be better.
marbut_: The other thing we got burned on was performance. The way we did inference was more along the lines of ORing,
marbut_: but the performance was very poor. For example imagine that rss.title is a subproperty of dc.title
marbut_: so say you want to search for dc.title="computers" then you search for all the resources that have dc.title="computers" or rss.title="computers"
marbut_: so it was done at the query level, not below, e.g. forward vs backward chaining.
marbut_: The problem was with 1000 records, and 4 or 5 subproperty relations, the performance became very slow, so responses were taking 6 or 7 secs
marbut_: so the last thing we learned was this was a compelling example, that even with the delays, even with subproperty / equality relationships
marbut_: it was compelling for groups trying to integrate data from lots of collections.
marbut_: mark: does it use a specific query tool in Jena?
marbut_: em: no, it doesn't use rdql, before OCLC started to use Jena, it had a toolkit called EOR that was similar
marbut_: we had some fancy backend table representations for managing large scale triple stores
marbut_: e.g.
s-p-o, the later one took Sergey Melnik's work, so we had routines that could work with a model or with a backend relational
marbut_: data store, and created an API that worked with a database, that created SQL queries to run those over the database
marbut_: em: I think lots of things were slowing this down,
marbut_: ks: I'm not sure how we can avoid doing ORs
marbut_: em: I have some suggestions, but the project was focussed on getting something up
marbut_: it got a lot of interest, but it didn't move forward at OCLC
marbut_: one other lesson learnt, that gets back to genesis, there are 2 ways of viewing this - one of the areas we were exploring after that
marbut_: was at data ingestion time to add the inference, so you cache the inferences
marbut_: ks: that's the approach that haystack uses
marbut_: but it makes it harder to make on-the-fly changes to equivalence
marbut_: doing it even adenine style means you have to do a batch update
marbut_: em: yes, tradeoffs either way - for the applications that oclc was dealing with, not seeing realtime results for
marbut_: changing the mapping wasn't important, but of course you create a lot more data
marbut_: in this 3 month pilot, the majority of the time was spent data massaging
marbut_: ks: I think the best way to do this would be to have built in support for contains
marbut_: ks: keyword search has been done though, it's the inference that causes the problem, but I'm not sure if I can think of a good way to do inference
marbut_: em: yes, but that's why it may be important. when we see it working, we may think of optimizations. It will tease out how
marbut_: to merge controlled vocabularies and how to merge indices. So this is a useful scoped project to do this.
|<-- marbut_ has left irc.w3.org (Client exited)

Butler, Mark wrote:

>Hi Team
>
>I made a mistake, the participant pin is 733650
>
>Toll Free Access Number:
> 866 276 8920
>UK FreeCall Access Number:
> 0800 073 8926
>
>Mark
>
>>-----Original Message-----
>>From: Butler, Mark [mailto:Mark_Butler@hplb.hpl.hp.com]
>>Sent: 01 August 2003 11:34
>>To: www-rdf-dspace@w3.org
>>Subject: SIMILE PI phone conference, 01-August-2003 1200 EDT/1700 BST
>>
>>
>>SIMILE PI phone conference, 01-August-03 1200 EDT/1700 BST
>>
>>Toll Free Access Number:
>> 866 276 8920
>>UK FreeCall Access Number:
>> 0800 073 8926
>>Participant PIN:
>> 2536617
>>
>>Please join irc channel:
>>irc://irc.w3.org:6665/simile
>>
>>Agenda:
>>
>>1/ update, status, & next steps
>>
>>2/ Discussion: Proposal for staged demonstrators - background
>>
>>OCLC RDF-DC Interop Project
>>http://www.oclc.org/research/projects/rdf_interop/index.shtm
>>OCLC RDF-DC Interop CVS Repository
>>http://wip.dublincore.org/source.html
>>Proposal for staged development of demonstrator
>>http://lists.w3.org/Archives/Public/www-rdf-dspace/2003Jul/0039.html
>>Task Assignments for Demonstrator
>>(See enclosed document)
>>
>>3/ Any other business
>>
>>Dr Mark H. Butler
>>Research Scientist HP Labs Bristol
>>mark-h_butler@hp.com
>>Internet: http://www-uk.hpl.hp.com/people/marbut/
>>

--
========================================================
Kevin Smathers                kevin.smathers@hp.com
Hewlett-Packard               kevin@ank.com
Palo Alto Research Lab
1501 Page Mill Rd.            650-857-4477 work
M/S 1135                      650-852-8186 fax
Palo Alto, CA 94304           510-247-1031 home
========================================================
use "Standard::Disclaimer";
carp("This message was printed on 100% recycled bits.");
Attachments
- text/html attachment: irclog.1.aug.2003.html
Received on Friday, 1 August 2003 12:50:06 UTC