- From: Butler, Mark <Mark_Butler@hplb.hpl.hp.com>
- Date: Fri, 15 Aug 2003 18:13:23 +0100
- To: "'SIMILE public list'" <www-rdf-dspace@w3.org>
You are now talking on #simile
<em> AndyS, running late - will join in 10
<AndyS> em: it's just starting - I'll pass on the message
--> mickBass (bass@192.6.19.124) has joined #simile
<mickBass> attending: Andy S, Mark B, John G, Mick B
--> kevins2 (chatzilla@192.6.19.104) has joined #simile
<mickBass> attending: Kevin S
<mickBass> MacKenzie has confirmed by email, awaiting her arrival
<mickBass> em to join in ~10 mins
<marbut> mickBass: when we say calendar 2003 year end, this is an internal checkpoint
<marbut> the checkpoint is to have that demo ready and executed within the team, so it can be presented to the alliance steering committee after the new year
<marbut> the target is to nail all these milestones except the last one
<marbut> (now joining: MacKenzie Smith)
<marbut> (Other attendees: Mick Bass, Kevin Smathers, Andy Seaborne, John Gilbert, Mark Butler)
<marbut> the first document frames what we are after with this first year demo
<marbut> the second frames the contributions of the different people involved in the work
<marbut> so the agenda for the day has 3 items:
<marbut> look at gathering the corpus, as that's the item that is front and center
<marbut> so we've had some contact with Eric, he's going to join in 10 minutes
<marbut> so I wanted to have some discussion with you (MS) and EM about getting the content
<marbut> the second item is discussion and review of key milestones document
<marbut> the third item is discussion and review of the who / what / why document
<marbut> comments to the list are also welcome
<marbut> ms: I understand you have some data from AMICO. Does that include content?
<marbut> KS: Yes
<mickBass> KS: I think it is 1 tiff per record
<marbut> MS: we did hear back about CIDOC, I think there is some test data available for that schema
<marbut> (Now joining: David Karger)
<mickBass> marbut: no RDFS for VRA-core, have had a go at creating one today, have just sent to list.
<marbut> MS: The AMICO one is a version of VRAcore?
<mickBass> marbut: AMICO schema is similar, but should revisit some decisions
<marbut> MS: I could put someone on extracting records, if we need that?
<mickBass> KS: AMICO provided 50 records
<marbut> (now joining: Eric Miller)
<marbut> MS: The message about CIDOC doesn't give any sense of numbers
<mickBass> MS: message from Martin Doerr gives no numbers. "several test sets are available"
<marbut> we could get some records in that schema, it's not the top priority as it's not VRA
<marbut> mickBass: When we think about the test corpus, we need to generate two islands. The questions are tactically what's
<marbut> the best way of doing that?
<marbut> and how do we get as large an island as possible?
<marbut> and there is a risk analysis piece?
<marbut> MS: There is a lot of VRA data, but not much IMS data
<marbut> mickBass: The demo will be more convincing if we have a big database for both types of records
<mickBass> queue andy seaborne
<marbut> DK: Can we can the demo, e.g. just select the records in the second set so that they will match the queries
<marbut> MS: Yes, maybe the thing to do is a simple fake demo that shows the technology will work
<mickBass> EM: on IMS, have been in contact with ? re: access to collections through edutella
<mickBass> MS: OK to use other people's data for demo?
<mickBass> DK: yes
<mickBass> MS: yes, OK.
<mickBass> MS: eventually all OCW records will get exported to IMS.
<AndyS> [There are only 7 Amico tiffs - 206 records in the /data directory]
<marbut> em: Do we need more AMICO records?
<mickBass> EM: edutella has been running for several years... hopefully larger number of IMS records
<AndyS> [ 3270 RDF statements ]
<marbut> KS: There are some things missing from the AMICO data, reference to vocabularies which are just implied
<marbut> em: I made them up
<marbut> KS: If validity is important, we need controlled vocabularies also
<marbut> em: AMICO has their own way of structuring this stuff, using three letter element DTDs
<marbut> This was an attempt to say "let's do this using standards"
<marbut> so it takes a small set, shows how you can represent it in RDF, but now you could use it in other ways, it doesn't take a year to do
<marbut> so they are using it in some test beds now, so if they've used it more we should be able to get some more data
<marbut> fairly quickly
<marbut> mickBass: For these 2 schemas, the way we ought to be behaving is to try to pull in schemas / instances from wherever they may be. Because
<marbut> then we create options.
<marbut> It sounds like there is a risk, particularly in IMS, that we won't be able to get a large number of records?
<marbut> MS: What do you mean by large?
<marbut> mickBass: 20K records
<marbut> MS: As DK said, there are 2 phases here - get the demo to work, then scale it up
<marbut> KS: Isn't scalability part of the demo?
<marbut> MS: If you want a large amount of quality data that you can demo, it may take more than 3 months
<marbut> mickBass: As andy has pointed out, we can work with the amico data now. But until we have a substantial corpus of records, it's hard to make
<marbut> statements about the type of mapping rules we need to support
<marbut> em: with the OCLC, it wasn't the mapping rules, it's just the amount of data meant "so what"
<marbut> you need the collection to get people's interest piqued
<marbut> mickBass: I'm not saying we don't want to work on the technology, but I don't want us to stop getting the metadata
<marbut> I'm hearing there's risk, I'm not hearing if it's possible
<marbut> MS: VRAcore should be okay, IMS is too new, hasn't borne fruit yet
<marbut> (how do I tell if rssagent is running?)
<marbut> DK: Is 20K records enough to demonstrate scalability - we would need a few million?
<marbut> KS: At the plenary, we agreed 100K was okay
<marbut> MS: I think there are 20K VRAcore records, not sure if we can get this for IMS
<marbut> AS: We want a total corpus of 100K records, i.e. it has to be on disk, to check the memory to disk
<mickBass> KS: could target VRA to DC mapping
<em> do people know about http://www.mindswap.org/2003/CancerOntology/ ?
<marbut> (yep, put it in the bibliography)
<em> good, thanks - not sure if instance data corresponding to this would be of use to the group or not
<marbut> DK: I'd love DC to be part of this as it is very generic
<marbut> MS: There are some pretty big collections of image data with DC descriptors
<marbut> mickBass: would it be a better demo to use vra and DC, than vra and IMS?
<marbut> DK: Scalability and interoperability don't have to be done at the same time. We could demo interop on small
<marbut> collections, then scalability just on one record set
<marbut> mickBass: one thing that might be useful is to create a list of options
<marbut> eric, mackenzie, I think you have this list in your head
<marbut> MS: It's more than just data, we need licenses as well e.g. what will happen to the data
<marbut> So each of these will take a month to talk to. So if you are talking significant numbers of records, AMICO has them
<marbut> I haven't seen a script for the demo yet, so I'm not sure what we are trying to accomplish.
<marbut> They will want to know what it is we want to do. Until we know what we want to do, I need to know what the demo is,
<marbut> who it is for, etc
<marbut> mickBass: I could take demo 1a and 1b and translate it to a script and some sort of statement of intention
<marbut> Mark: we need to look back at the OCLC demo, see how it works, before moving onto the script
<marbut> em: this means we need to have data that is complementary for what we want to do
<marbut> mickBass: there's a cyclical dependency here, I write a description e.g. the type of data, what we want to do with it,
<marbut> who we want to show it to, what the boundaries are, but in some sense we can't lock down the demo script, the script
<marbut> is dependent on data and vice versa
<marbut> em: I'm not running into that problem yet,
<marbut> mickBass: I can write it in prose,
<marbut> ms: I want to come back to the number of records, what makes a demo compelling is that it does something that people care about
<marbut> we need to take a different tack this time, even if it means handwriting a hundred records
<marbut> I fundamentally disagree that we need to get a lot of records
<marbut> mickBass: there are 2 stages, we need to decide where to set the bar
<marbut> if we hand craft the data set, and it's possible to get useful results
<marbut> then people might say "this works, but you had to hand craft the data"
<marbut> I'm also hearing that if that's all you do it's not quite as compelling as it could be
<marbut> with IMS provided by the community, you can actually do useful recall across these mapping technologies.
<marbut> at some point you have to demonstrate it works, even if the records aren't handcrafted
<marbut> em: I can make a case for scalability. The OCLC demo was slow with a small number of records
<marbut> you want to make it compelling, but if you are not as fast as google it's tough to say "look at the flexibility"
<marbut> some of us need to be focussing on data collection, some on speed and performance, we can bring them together later
<marbut> mickBass: one is corpus gathering, the other is tidying up the specific demo script
<marbut> we need to run these in parallel for a little while
<marbut> can we agree on thinking about vra, dc and ims?
<marbut> and can we agree to start to think about the list?
<marbut> MS: the list is 4 places for VRA, 0 for IMS
<marbut> I can start asking around, but I'm not sure what it's achieving
<marbut> mickBass: what about DC?
<marbut> em: sure
<marbut> right now?
<marbut> I can get a collection from Adobe as XMP
<marbut> MS: But is it just Dublin Core, or image collections?
<marbut> em: you could broaden this to photo.net or rdf.pic or photo metadata etc. We could try other vocabularies e.g. friend of a friend
<marbut> MS: we need to story board the demo to show what we want to show
<marbut> MS: For us to go and write down every type of data in the whole world
<marbut> DK: I've got to go
<marbut> mickBass: I want the team involved in it? I need to get it out of your head, so the team can work on it?
<marbut> MS: maybe EM has lists in his head, but I have to do research, so it's not a matter of a quick brain dump
<marbut> what I bring is connections to people who may or may not have data
<marbut> So I am also feeling a bit stuck
<marbut> em: it could be museum things, or satellite things etc
<marbut> ms: perhaps mick, em and me have to have a call and decide where to get the data from
<marbut> I'm willing to get us some data, but only if it's something we actually need and will use
<marbut> we need to put some more thought into the demo
<marbut> mickBass: I think it is a good idea for EM / MS and me to do this
<marbut> MS: I heard about another big project recently that might be relevant, which is based in Manchester
<marbut> Jorum
<mickBass> MS: jorum, big project to build learning object repositories
<marbut> this project looks complementary, they may / may not have test data, it would be good to talk to them
<marbut> they are almost certain to be at the meeting, because they are also backing the project
<marbut> mickBass: it sounds like Paul is going to go
<marbut> I'll have a conversation with Mark and Paul, check we have coverage there
<marbut> if these folks have IMS image data that they can share, that might be helpful
<marbut> MS: I'd love to see this demo be used by an academic who uses this for courses
Received on Friday, 15 August 2003 13:13:45 UTC