- From: Kevin Smathers <kevin.smathers@hp.com>
- Date: Fri, 01 Aug 2003 09:47:10 -0700
- To: "Butler, Mark" <Mark_Butler@hplb.hpl.hp.com>
- Cc: www-rdf-dspace@w3.org
- Message-ID: <3F2A998E.2080502@hp.com>
Lynx dump of IRC Log: (html is attached)
[INFO] Channel view for "[1]#simile" opened.
=== Highest connection count: 57 (56 clients)
-->| YOU have joined [2]#simile
=-= Topic for [3]#simile is "simile pi teleconf - em to be 10 min late
:("
=-= Topic for [4]#simile was set by em on Fri Aug 01 2003 08:55:45
GMT-0700 (PDT)
-->| marbut_ ([5]marbut@192.6.19.190) has joined [6]#simile
|<-- marbut has left irc.w3.org (Connection reset by peer)
[7]em dialing
[8]marbut_ [9]http://www.oclc.org/research/projects/rdf_interop/index.shtm
=-= em has changed the topic to "simile pi teleconf"
[10]marbut_ [11]http://wip.dublincore.org/source.html
[12]marbut_ [13]http://wip.dublincore.org:8080/interop/searchServlet
[14]marbut_ KS: working on two things. One is a memorandum of
understanding for support for the project, the other is reading about
DOI to talk to John Ericsson about Genesis
[15]em [16]http://www.w3.org/2002/04/12-amico/
[17]marbut_ em: I'm still progressing on the sample data - see
previous URL
[18]marbut_ I've not heard back from the edutella folks nor the CIDOC
folks
[19]marbut_ There's a small collection of AMICO data available though
[20]em [21]http://sh.webhire.com/servlet/av/jd?ai=631&ji=1274969&sn=I
[22]marbut_ As regards the hire, we are online and off the w3c
homesite with a pointer to the W3C position
[23]marbut_ Mark: Next item - staged demonstrators - any feedback?
[24]marbut_ KS: The result we'll want to follow on with the demo
develop.
[25]marbut_ Mark: So what's the best way to do the persistent store
bit?
[26]marbut_ KS: You can start with Jena and add stuff on it, or start
with genesis, which has a slightly different api.
[27]marbut_ There are some limits on the complexity of the graphs in
Genesis, we need to do some more work on a higher level
[28]marbut_ object API. We are working on this, but you need to figure
out if the higher level objects here are satisfactory.
[29]marbut_ But what I anticipate you'll want to do is to start by using
Jena as a back end. That's how I anticipate it going.
[30]marbut_ But in between, we'll try to make things compatible, this
helps with the APIs
[31]marbut_ we have an alpha level implementation of the first level
of genesis abstraction, how distribution is done, differences between
local and remote
[32]marbut_ searches, but as I understood it distribution is not so
important to the first demo
[33]marbut_ so I was planning on reserving the ability to do
distribution, not implement distribution right now, although I have
[34]marbut_ an implementation,
[35]marbut_ em: I think it's a good idea, it's a small, accomplishable
demonstrator, we can use it to tease out the team interaction,
[36]marbut_ it gives us some idea to compare Jena and Genesis. If I
understand your diagrams, then some of the query / inference layers
[37]marbut_ could be in the persistent store.
[38]marbut_ In the OCLC project we did this by emacs, doing it with
editors might be interesting, but this seems scoped so we can have an
early end date
[39]marbut_ but I was hoping before Christmas.
[40]marbut_ mark: I'm hoping to do this before the hires are in place.
[41]marbut_ em: let me offer some lessons learnt from the OCLC
project
[42]marbut_ when we asked for the data, we didn't ask if we could
publish it, or make it available to others
[43]marbut_ we need to make it clear that we want to make the data
available, for other implementations,
[44]marbut_ also there was a tremendous amount of data management that
had to go on
[45]marbut_ e.g. xml was invalid, we tried to get diverse datasets,
but we still had to do data cleanup, so we need to think about this
also
[46]marbut_ the other thing was picking your data, the focus was on
diversity of datasets, since the datasets were so small the specific
overlaps
[47]marbut_ were quite hard to tease out, so while the theory is good
trying to integrate small collections of diverse data was hard
[48]marbut_ because in practice no-one is going to search that stuff.
We need to get complementary collections that
[49]marbut_ do have some overlap. I think the type of collections we
are looking at are going to be better.
[50]marbut_ The other thing we got burned on was performance. The way
we did inference was more along the lines of ORing,
[51]marbut_ but the performance was very poor. For example imagine
that rss.title is a subproperty of dc.title
[52]marbut_ so say you want to search for dc.title="computers", then you
search for all the resources where dc.title="computers" or
rss.title="computers"
[53]marbut_ so it was done at the query level, not below, e.g. forward
vs backward chaining.
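The query-level approach described above can be sketched roughly as follows. This is a minimal illustration, not the OCLC code: the in-memory triple list, the `SUBPROPERTY_OF` mapping, and the function names are all hypothetical, and the only detail taken from the log is that rss.title is treated as a subproperty of dc.title.

```python
# Hypothetical sample data; only the rss:title / dc:title relation comes from the log.
SUBPROPERTY_OF = {"rss:title": "dc:title"}  # child property -> parent property

TRIPLES = [
    ("doc1", "dc:title", "computers"),
    ("doc2", "rss:title", "computers"),
    ("doc3", "rss:title", "gardening"),
]


def expand_property(prop, subproperty_of):
    """Return prop plus every direct or indirect subproperty of it."""
    expanded = {prop}
    changed = True
    while changed:
        changed = False
        for child, parent in subproperty_of.items():
            if parent in expanded and child not in expanded:
                expanded.add(child)
                changed = True
    return expanded


def search(triples, prop, value, subproperty_of):
    """Query-time inference: match value on prop OR on any of its subproperties."""
    props = expand_property(prop, subproperty_of)
    return sorted({s for s, p, o in triples if p in props and o == value})


matches = search(TRIPLES, "dc:title", "computers", SUBPROPERTY_OF)
```

Every query pays for the expansion and for one extra alternative per subproperty, which is consistent with the slowdown reported for even a handful of subproperty relations.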
[54]marbut_ The problem was with 1000 records, and 4 or 5
subproperty relations, the performance became very slow, so
responses were taking 6 or 7 secs
[55]marbut_ so the last thing we learned was this was a compelling
example, that even with the delays, even with subproperty / equality
relationships
[56]marbut_ it was compelling for groups trying to integrate data from
lots of collections.
[57]marbut_ mark: does it use a specific query tool in Jena?
[58]marbut_ em: no, it doesn't use rdql, before OCLC started to use
Jena, it had a toolkit called EOR that was similar
[59]marbut_ we had some fancy backend table representations for
managing large scale triple stores
[60]marbut_ e.g. s-p-o, the later one took Sergey Melnik's work, so we
had routines that could work with a model or with a backend relational
[61]marbut_ data store, and created an API that worked with database,
that created SQL queries to run those over the database
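As a rough sketch of what generating SQL over a subject-predicate-object table might look like: the table layout, column names, and `build_query` helper below are assumptions for illustration only and do not reflect the actual EOR schema or API. It uses Python's standard `sqlite3` module with an in-memory database.

```python
import sqlite3


def build_query(properties):
    """Build a parameterised query that ORs together one clause per property."""
    clauses = " OR ".join("predicate = ?" for _ in properties)
    return (
        "SELECT DISTINCT subject FROM triples "
        f"WHERE object = ? AND ({clauses}) ORDER BY subject"
    )


# Hypothetical single-table s-p-o layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("doc1", "dc:title", "computers"),
        ("doc2", "rss:title", "computers"),
        ("doc3", "rss:title", "gardening"),
    ],
)

# dc:title expanded by hand to include its assumed subproperty rss:title.
props = ["dc:title", "rss:title"]
sql = build_query(props)
rows = [row[0] for row in conn.execute(sql, ["computers", *props])]
conn.close()
```

The generated WHERE clause grows by one OR term per subproperty, so the database sees the same ORing that was done at the query level.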
[62]marbut_ em: I think lots of things were slowing this down,
[63]marbut_ ks: I'm not sure how we can avoid doing ors
[64]marbut_ em: I have some suggestions, but the project was focussed
on getting something up
[65]marbut_ it got a lot of interest, but it didn't move forward at
OCLC
[66]marbut_ one other lesson learnt, that gets back to genesis, there
are 2 ways of viewing this - one of the areas we were exploring after
that
[67]marbut_ was at data ingestion time to add the inference, so you
cache the inferences
[68]marbut_ ks: that's the approach that haystack uses
[69]marbut_ but it makes it harder to make on-the-fly changes to
equivalence
[70]marbut_ doing it even Adenine style means you have to do a batch
update
[71]marbut_ em: yes, tradeoffs either way - for the applications that
oclc was dealing with, not seeing realtime results for
[72]marbut_ changing the mapping wasn't important, but of course you
create a lot more data
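The ingestion-time alternative discussed above, where inferences are cached when data is loaded, might look something like this. It is a sketch under stated assumptions: the store is a plain Python set, the function names are hypothetical, and it is not how Haystack, Adenine, or Genesis implement it.

```python
def superproperties(prop, subproperty_of):
    """Yield every ancestor of prop in the subproperty hierarchy (cycle-safe)."""
    seen = {prop}
    while prop in subproperty_of:
        prop = subproperty_of[prop]
        if prop in seen:
            break
        seen.add(prop)
        yield prop


def ingest(asserted, subproperty_of):
    """Store each asserted triple plus one inferred triple per superproperty."""
    store = set()
    for s, p, o in asserted:
        store.add((s, p, o))
        for parent in superproperties(p, subproperty_of):
            store.add((s, parent, o))
    return store


def search(store, prop, value):
    """Plain lookup: no ORing is needed because inferences are already cached."""
    return sorted(s for s, p, o in store if p == prop and o == value)


# Hypothetical sample data.
ASSERTED = [
    ("doc1", "dc:title", "computers"),
    ("doc2", "rss:title", "computers"),
    ("doc3", "rss:title", "gardening"),
]

store = ingest(ASSERTED, {"rss:title": "dc:title"})
matches = search(store, "dc:title", "computers")

# Changing the mapping means re-running ingestion over all asserted data.
store_after_change = ingest(ASSERTED, {})
```

Queries become simple lookups, at the cost of storing more triples (five here instead of three) and of a batch re-ingest whenever the mapping changes, which is the tradeoff raised in the discussion.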
[73]marbut_ in this 3 month pilot, the majority of the time was spent
on data massaging
[74]marbut_ ks: I think the best way to do this would be to have built in
support for contains
[75]marbut_ ks: keyword search has been done though, it's the inference
that causes the problem, but I'm not sure if I can think of a good way
to do inference
[76]marbut_ em: yes, but that's why it may be important. when we see it
working, we may think of optimizations. It will tease out how
[77]marbut_ to merge controlled vocabularies and how to merge
indices. So this is a useful scoped project to do this.
|<-- marbut_ has left irc.w3.org (Client exited)
References
1. irc://irc.w3.org:6665/%23simile
2. irc://irc.w3.org:6665/%23simile
3. irc://irc.w3.org:6665/%23simile
4. irc://irc.w3.org:6665/%23simile
5. mailto:marbut@192.6.19.190
6. irc://irc.w3.org:6665/%23simile
7. irc://irc.w3.org:6665/em,isnick
8. irc://irc.w3.org:6665/marbut_,isnick
9. http://www.oclc.org/research/projects/rdf_interop/index.shtm
10. irc://irc.w3.org:6665/marbut_,isnick
11. http://wip.dublincore.org/source.html
12. irc://irc.w3.org:6665/marbut_,isnick
13. http://wip.dublincore.org:8080/interop/searchServlet
14. irc://irc.w3.org:6665/marbut_,isnick
15. irc://irc.w3.org:6665/em,isnick
16. http://www.w3.org/2002/04/12-amico/
17. irc://irc.w3.org:6665/marbut_,isnick
18. irc://irc.w3.org:6665/marbut_,isnick
19. irc://irc.w3.org:6665/marbut_,isnick
20. irc://irc.w3.org:6665/em,isnick
21. http://sh.webhire.com/servlet/av/jd?ai=631&ji=1274969&sn=I
22. irc://irc.w3.org:6665/marbut_,isnick
23. irc://irc.w3.org:6665/marbut_,isnick
24. irc://irc.w3.org:6665/marbut_,isnick
25. irc://irc.w3.org:6665/marbut_,isnick
26. irc://irc.w3.org:6665/marbut_,isnick
27. irc://irc.w3.org:6665/marbut_,isnick
28. irc://irc.w3.org:6665/marbut_,isnick
29. irc://irc.w3.org:6665/marbut_,isnick
30. irc://irc.w3.org:6665/marbut_,isnick
31. irc://irc.w3.org:6665/marbut_,isnick
32. irc://irc.w3.org:6665/marbut_,isnick
33. irc://irc.w3.org:6665/marbut_,isnick
34. irc://irc.w3.org:6665/marbut_,isnick
35. irc://irc.w3.org:6665/marbut_,isnick
36. irc://irc.w3.org:6665/marbut_,isnick
37. irc://irc.w3.org:6665/marbut_,isnick
38. irc://irc.w3.org:6665/marbut_,isnick
39. irc://irc.w3.org:6665/marbut_,isnick
40. irc://irc.w3.org:6665/marbut_,isnick
41. irc://irc.w3.org:6665/marbut_,isnick
42. irc://irc.w3.org:6665/marbut_,isnick
43. irc://irc.w3.org:6665/marbut_,isnick
44. irc://irc.w3.org:6665/marbut_,isnick
45. irc://irc.w3.org:6665/marbut_,isnick
46. irc://irc.w3.org:6665/marbut_,isnick
47. irc://irc.w3.org:6665/marbut_,isnick
48. irc://irc.w3.org:6665/marbut_,isnick
49. irc://irc.w3.org:6665/marbut_,isnick
50. irc://irc.w3.org:6665/marbut_,isnick
51. irc://irc.w3.org:6665/marbut_,isnick
52. irc://irc.w3.org:6665/marbut_,isnick
53. irc://irc.w3.org:6665/marbut_,isnick
54. irc://irc.w3.org:6665/marbut_,isnick
55. irc://irc.w3.org:6665/marbut_,isnick
56. irc://irc.w3.org:6665/marbut_,isnick
57. irc://irc.w3.org:6665/marbut_,isnick
58. irc://irc.w3.org:6665/marbut_,isnick
59. irc://irc.w3.org:6665/marbut_,isnick
60. irc://irc.w3.org:6665/marbut_,isnick
61. irc://irc.w3.org:6665/marbut_,isnick
62. irc://irc.w3.org:6665/marbut_,isnick
63. irc://irc.w3.org:6665/marbut_,isnick
64. irc://irc.w3.org:6665/marbut_,isnick
65. irc://irc.w3.org:6665/marbut_,isnick
66. irc://irc.w3.org:6665/marbut_,isnick
67. irc://irc.w3.org:6665/marbut_,isnick
68. irc://irc.w3.org:6665/marbut_,isnick
69. irc://irc.w3.org:6665/marbut_,isnick
70. irc://irc.w3.org:6665/marbut_,isnick
71. irc://irc.w3.org:6665/marbut_,isnick
72. irc://irc.w3.org:6665/marbut_,isnick
73. irc://irc.w3.org:6665/marbut_,isnick
74. irc://irc.w3.org:6665/marbut_,isnick
75. irc://irc.w3.org:6665/marbut_,isnick
76. irc://irc.w3.org:6665/marbut_,isnick
77. irc://irc.w3.org:6665/marbut_,isnick
Butler, Mark wrote:
>Hi Team
>
>I made a mistake, the participant pin is 733650
>
>Toll Free Access Number:
> 866 276 8920
>UK FreeCall Access Number:
> 0800 073 8926
>
>Mark
>
>
>
>>-----Original Message-----
>>From: Butler, Mark [mailto:Mark_Butler@hplb.hpl.hp.com]
>>Sent: 01 August 2003 11:34
>>To: www-rdf-dspace@w3.org
>>Subject: SIMILE PI phone conference, 01-August-2003 1200 EDT/1700 BST
>>
>>
>>SIMILE PI phone conference, 01-August-03 1200 EDT/1700 BST
>>
>>Toll Free Access Number:
>> 866 276 8920
>>UK FreeCall Access Number:
>> 0800 073 8926
>>Participant PIN:
>> 2536617
>>
>>Please join irc channel:
>>irc://irc.w3.org:6665/simile
>>
>>Agenda:
>>
>>1/ update, status, & next steps
>>
>>2/ Discussion: Proposal for staged demonstrators - background
>>
>>OCLC RDF-DC Interop Project
>>http://www.oclc.org/research/projects/rdf_interop/index.shtm
>>OCLC RDF-DC Interop CVS Repository
>>http://wip.dublincore.org/source.html
>>Proposal for staged development of demonstrator
>>http://lists.w3.org/Archives/Public/www-rdf-dspace/2003Jul/0039.html
>>Task Assignments for Demonstrator
>>(See enclosed document)
>>
>>3/ Any other business
>>
>>Dr Mark H. Butler
>>Research Scientist HP Labs Bristol
>>mark-h_butler@hp.com
>>Internet: http://www-uk.hpl.hp.com/people/marbut/
>>
--
========================================================
Kevin Smathers kevin.smathers@hp.com
Hewlett-Packard kevin@ank.com
Palo Alto Research Lab
1501 Page Mill Rd. 650-857-4477 work
M/S 1135 650-852-8186 fax
Palo Alto, CA 94304 510-247-1031 home
========================================================
use "Standard::Disclaimer";
carp("This message was printed on 100% recycled bits.");
Attachments
- text/html attachment: irclog.1.aug.2003.html
Received on Friday, 1 August 2003 12:50:06 UTC