Session Start (irc.w3.org:#simile): Wed Jul 23 13:34:44 2003
[13:34] *** Now talking in #simile.
*** Users on #simile: Rob marbut @AndyS
*** End of /NAMES list.
*** Your user mode is "+i"
*** Mode for channel #simile is "+"
*** Channel #simile was created at Wed Jul 23 13:33:30 2003
[13:34] *** Join to #simile completed in 4 seconds.
[13:34] e.g. meiko
[13:35] amico, see http://www.amico.org/, The Art Museum Image Consortium (AMICO)
[13:35] Issue is identifying a collection we can use for SIMILE
[13:36] seeAlso: CIDOC, Harvard collection
[13:37] *** em-lap (em@18.29.0.30) has joined channel #simile
[13:37] Mackenzie: What is the minimum amount of data?
[13:38] All: 40K records to start should be OK
[13:38] *** KevinS (chatzilla@18.99.1.100) has joined channel #simile
[13:38] * AndyS asks em about a logger
[13:39] hmm...
[13:39] * Rob 's client is logging
[13:39] Mick: Next: what packaging formats of image metadata are implied?
[13:39] em: METS etc
[13:39] * em-lap not sure how to use new client... /invites don't seem to be working
[13:40] can one of you type '/invite rrsagent'
[13:40] * AndyS has logging locally
*** Inviting RRSAgent to channel #simile.
[13:40] *** RRSAgent (rrs-loggee@18.29.0.30) has joined channel #simile
[13:40] * RRSAgent is logging
[13:40] * AndyS done
[13:40] hmm.... maybe it worked?
[13:40] just took a while...
[13:40] * AndyS is @AndyS
[13:40] I just typed that cmd
[13:40] ah
[13:40] thanks RRSAgent
[13:41] err.. Rob :)
[13:41] SIPs: METS, VRAcore
[13:41] * em-lap thinks he's not a fan of this client
[13:41] History collection - many images, not so much metadata
[13:42] 1/ Constituent has cataloguing records - submit to SIMILE
[13:42] 2/ Just images (or very little metadata)
[13:42] 3/ Images + records
[13:43] 4/ Add metadata to existing images
[13:43] For images UC, there is no clear set of users
[13:44] Mackenzie: today, physical slides used
[13:45] ... in teaching. Go back to the department collection.
[13:46] There is an online catalogue for finding the physical slides.
[13:47] Physical media - need to checked out.
[13:47] s/to/to be/
[13:48] OCW causes digitisation as it's a web site!
[13:48] 5/ Digitize and add metadata to existing catalogue record
[13:49] Bulk import to seed then one by one additions
[13:50] KS asked: answer is that VRA can have support for thumbnails.
[13:51] VRA: Group-Work-Surrogates-Details
[13:51] Surrogates can be at work or detail level (or even group)
[13:53] Mick: Any AIPs?
[13:53] Eric: Most at image level
[13:53] rather than collections
[13:54] Mackenzie: assume we deal with the work (for now) - not the collection - for AIPs
[13:54] Lots of surrogates.
[13:55] With metadata like dimensions
[13:56] http://via.harvard.edu:748/WebZ/Authorize?sessionid=0:next=/html/VIA.html:style=via
[13:57] http://via.harvard.edu:748/WebZ/Authorize?sessionid=0:next=html/VIAhomeframe.html:style=via
[13:57] the site search
[13:57] A record: http://via.harvard.edu:748/WebZ/Authorize?sessionid=0:next=html/VIAhomeframe.html:style=via
[13:58] (search for "Hewlett")
[14:00] Shows examples of displays of VRA records
[14:02] In practice people do "google" style searches - words in first box only
[14:05] The exact usage of collections/groups/etc allows for variance in usage
[14:06] As preexisting data, complex to build the single view for the website
[14:09] The metadata is in "Site Search" - an OCLC system (pre-XML almost)
[14:10] Moving to Tamino this year
[14:11] Issue: copyright on untraceable materials - rather hard
[14:14] The original metadata is not preserved - it's just a means of access but ...
[14:14] ... the technical metadata can be important.
[14:16] dozen repositories (museums and archives)
[14:19] em: After Site Search, moved onto more RDF-like, not predefined access paths
[14:20] Mackenzie: thumbnails are usually free
[14:21] 200K images today.
[14:22] KevinS: Any collections of video streams?
[14:22] Mackenzie: not really at Harvard - would not be VRA
[14:23] Others exist but don't have good metadata (minimal, like lecture number for lecture videos)
[14:24] Example: http://MITworld.mit.edu/
[14:25] Example: Oyez - supreme court votes and discussions
[14:27] Mick: want something achievable and interesting
[14:27] Video afterwards!
[14:29] JSE finds example where field reuse occurs depending on item type - Mackenzie says it is common
[14:31] Each collection will want additional metadata as well as std like VARcore
[14:31] s/VARcore/VRAcore/
[14:32] Many stds have hooks for domain-specific hooks or profiles
[14:33] Mackenzie: heterogeneity of schemas and of application rules / controlled vocabularies
[14:34] e.g. VRA+these content rules (e.g. the meaning of "date") - use XYZ for subject field, etc etc
[14:34] Need to capture content rules
[14:35] (em speaking)
[14:36] Mark: History system used own URIs with links back to std (eg. HarmonyABC)
[14:37] No encoding rules for value of a property (e.g. max value of a field)
[14:39] em: continuum from human description to machine processable rules
[14:40] Many: OWL does some of this
[14:40] but not all - need data cleaning, ingestion specific processes and general mechanisms
[14:43] *** Signoff: em-lap (Ping timeout)
[14:44] Issue: coding data entry systems is costly
[14:45] Example: subject vocabularies
[14:45] Example: laying out names of authors
[14:46] Dates: date of picture, digitisation, ingestion etc etc
[14:50] Possible task: someone survey the possible data testing tools we could employ
[14:51] Need to check on ingestion that the data is "good"
[14:57] Mark wishes to understand why we need more than we saw on web
[14:57] Eric/Mackenzie: it took 2 years per system
[14:58] If just wanted VRAcore then could replicate. But new schemas come along all the time. And they evolve
[14:59] Desire (library community) for machine processable content rules. Execute on ingestion.
[15:00] Different schemas for technical details of images e.g. ISO std
[15:00] Different schemas for different types
[15:01] ISO still image format
[15:03] DIPs: catalogue records, images, thumbnails of items.
[15:04] Packaging for export: need to insert into other systems
[15:04] Different schema from SIP.
[15:08] Next: other schemas: images in with VRA core and want to add other metadata
*** You have been marked as being away.
*** You are no longer marked as being away.
[15:28] *** em (em@18.29.0.30) has joined channel #simile
[15:28] *** Zakim (rrs-bridgg@18.29.0.27) has joined channel #simile
[15:28] zakim, please leave
[15:28] *** Zakim has left #simile
[15:30] * AndyS asks for someone to scribe the next section
[15:31] * Rob will do it
[15:31] * AndyS thanks Rob
[15:31] *** Signoff: KevinS (Client exited)
[15:32] Eric mentioned some work had been done on representing the National Cancer Institute's thesaurus in OWL. There is an interesting paper about this here http://www.mindswap.org/papers/WebSemantics-NCI.pdf
[15:32] part of the workflow with DSpace is someone goes to MIT Libraries and says "we want to use schema X"... do you get machine-processable schema representations?
[15:32] [Mick ^^^]
[15:32] MacKenzie: usually XML schemas
[15:34] usually MIT Libraries will say e.g. "you need to use VRA Core"
[15:34] community will want to add a couple of extra fields
[15:34] occasionally people will say "please archive this database"
[15:34] there are a number of scenarios
[15:34] (Also see http://www.mindswap.org/2003/CancerOntology/)
[15:34] Mick: In this use case what is machine-processable format that expresses the schema?
[15:34] XML Schema? RDF Schema?
[15:35] Might be a task for the libraries
[15:35] MacKenzie: no one in libraries to produce XML schemas
[15:36] *** em has changed the topic on channel #simile to Simile PI meeting
[15:37] David: We are building a system where community can enforce their own "content rules" (for want of a better term)?
[15:38] Mark B: Variety of validation and schema expression techniques... e.g. OWL is best for expressing relationships between schemas
[15:38] Need to decide which mix to go for
[15:39] There are also moves to escape this need... e.g. CNRI work on automatic configuration, RDDL
[15:39] (feel free to correct my notes if incorrect)
[15:41] Diff people use different terms in different contexts, e.g. John G has to describe use of these terms (ontology, schema, vocab. etc)
[15:41] David: Back to how to get schema into system... standardised mechanism for this? Likely to be rare, at least initially -- so no need to focus on that now
[15:42] Martin: A win if it takes less than two years!
[15:42] Mark: Not just registering schema -- need to support ingest, UI, workflow etc.
[15:43] David: Currently human beings usually do validation etc. -- win if we can make it easier for a human
[15:44] MacKenzie: Usually Margret (MIT Library faculty liaison) negotiates all this with community representative
[15:44] Would be nice to normalise this process / information
[15:45] Mick: core part of demo that it's easier to produce this normalised version of the schema and introduce it, rather than produce a load of custom code
[15:46] Kevin: Would like to know what e.g. VRA-core specific code is doing
[15:48] David: even without schema -- just having lots of RDF is useful. Do not want it to be central to work that we figure out how to accept schemas
[15:49] (playing devil's advocate)
[15:49] If you have a store for unstructured info, don't need new store for every new type of structured info
[15:51] Mark: gets back to old comp sci problem -- better to describe things declaratively rather than procedurally
[15:51] schemas are towards the declarative end
[15:53] diagram -- defined declarative description metadata, enter it, store, search / map between descriptions / display
[15:54] David: Can either have automated validation, or have humans validate (as is the current case)
[15:55] MacKenzie: In DSpace, metadata is deposited directly by end-users
[15:55] Too restrictive to have all metadata validated by Library staff
[15:56] David: dumb validation -- could just kick up violations to staff
[15:57] Eric: What about the content creation aspect?
[15:57] MacKenzie: part of the use case is the content creation aspect
[15:58] much may be imported by bulk from other systems, but will not always be the case
[15:58] Mark: Are interested in producing a generic system to multiple use cases, rather than one use case involving legacy systems
[16:00] People have problems creating these schemas -- often incorrect
[16:01] David: one problem to let users create instance data -- to allow users to create schemas is different
[16:02] Mick: errors in schemas have bigger impact, but more instance data creators than schema creators
[16:03] Mark: Tools do exist for schema creation. Problems: standards are changing quickly; naive users do not understand the tools, resultant schemas difficult to understand
[16:05] Eric: Like the boxes Mark drew... would be good to get full set of boxes, and order them
[16:06] Mick: For first few months, do not need smooth schema creation tool
[16:08] Mark: New idea (for me) here is declarative description of data
[16:08] David: Don't know how to use the schema
[16:09] MacKenzie: e.g. do you need a way in declarative description to say which elements to display to users and with what labels?
[16:10] Kevin: Just need to dive in and experiment
[16:11] Eric: Interesting aspect of SIMILE is, producing generic solution... should demonstrate saving in turn-around time
[16:12] * Rob notes AndyS' confused look and invites addition to Rob's notes if they're incomplete :)
[16:12] * AndyS is confused by Kevin's comments not the notes
[16:13] David: do want sophisticated tools for producing schemas... but there is a tonne of hard stuff to work out around that
[16:14] e.g. how it fits into workflow
[16:14] could even just have raw RDF with no schema
[16:15] can mine out things like authors etc.
[16:16] When someone comes in with data, with RDBMS you have to know the schema and refactor
[16:16] with RDF, you can accept any data even without the schema
[16:16] http://rdfschema.info/ but the webserver is 502'ing (bad gateway)
[16:17] MacKenzie: Depends how messy we want to let the system become
[16:17] David: It is a continuum.. there are options...
[16:19] Different parties involved -- schema creators, instance creators, instance consumers
[16:19] schema creators least numerous, instance consumers most numerous
[16:20] for instance consumers, if data does not validate to schema may still be useful
[16:21] *** Signoff: em (Quit: Client exiting)
[16:22] *** em-lap (em@18.29.0.30) has joined channel #simile
[16:22] Martin: For interoperability/transformation, need to be able to express relationships between schemata -- this will introduce constraints. We don't understand the issues around this yet, so we should hold off on schema editing tools until we know more about them
[16:23] Mark: We can build something quickly, and test it out and discuss it -- it might not work but we will learn this way -- advocate rapid prototyping approach
[16:25] David: this is breadth first vs depth first
[16:26] Mick: Building things gives you something concrete to talk about
[16:30] MacKenzie: reader's requirements haven't changed much, ability to fulfill them is being improved... can improve situation for metadata creators
[16:35] David: There are several classes of user within a use case... so do we abandon all but one class of user or cover the different classes (multiple use cases?)
[16:38] Mick: Need storyboards for definers, creators and readers that are consistent
[16:40] Mark: There is a stab at this in the use case doc (section 2)
[16:42] * Rob apologises for diminishing quality of scribing towards the end of that session
*** You have been marked as being away.
*** You are no longer marked as being away.
[16:53] *** Signoff: em-lap (Ping timeout)
[17:00] *** em-lap (em@18.29.0.30) has joined channel #simile
[17:05] * Rob asks that someone else scribe
[17:05] *** afs (afs@18.99.1.126) has joined channel #simile
[17:06] *** Signoff: AndyS (Connection reset by peer)
[17:11] *** kevinS (chatzilla@18.99.1.100) has joined channel #simile
[17:11] 1. not applicable
[17:12] 2. applies, 3. does not
[17:13] assess priorities by common requirements on remaining items
[17:13] see Mick's spreadsheet for details of requirements.