Session Start (irc.w3.org:#simile): Wed Jul 23 13:34:44 2003
[13:34] *** Now talking in #simile.
*** Users on #simile: Rob marbut @AndyS 
*** End of /NAMES list.
*** Your user mode is "+i"
*** Mode for channel #simile is "+"
*** Channel #simile was created at Wed Jul 23 13:33:30 2003 
[13:34] *** Join to #simile completed in 4 seconds.
[13:34] <AndyS> e.g. meiko
[13:35] <marbut> amico, see http://www.amico.org/, The Art Museum Image Consortium (AMICO)
[13:35] <AndyS> Issue is identifying a collection we can use for SIMILE
[13:36] <AndyS> seeAlso: CIDOC, Harvard collection
[13:37] *** em-lap (em@18.29.0.30) has joined channel #simile
[13:37] <AndyS> Mackenzie: What is the minimum amount of data?
[13:38] <AndyS> All: 40K records to start should be OK
[13:38] *** KevinS (chatzilla@18.99.1.100) has joined channel #simile
[13:38] * AndyS asks em about a logger
[13:39] <em-lap> hmm...
[13:39] * Rob 's client is logging
[13:39] <AndyS> Mick: Next: what packaging formats of images metadata is implied?
[13:39] <AndyS> em: METS etc
[13:39] * em-lap not sure how to use new client... /invites don't seem to be working
[13:40] <em-lap> can one of you type '/invite rrsagent'
[13:40] * AndyS has logging locally
*** Inviting RRSAgent to channel #simile.
[13:40] *** RRSAgent (rrs-loggee@18.29.0.30) has joined channel #simile
[13:40] * RRSAgent is logging
[13:40] * AndyS done
[13:40] <em-lap> hmm.... maybe it worked?
[13:40] <em-lap> just took a while...
[13:40] * AndyS is @AndyS
[13:40] <Rob> I just typed that cmd 
[13:40] <em-lap> ah
[13:40] <em-lap> thanks RRSAgent
[13:41] <em-lap> err.. Rob :)
[13:41] <AndyS> SIPs: METS, VRAcore
[13:41] * em-lap thinks he's not a fan of this client
[13:41] <AndyS> History collection - many images, not so much metadata
[13:42] <AndyS> 1/ Constituent has cataloguing records - submit to SIMILE
[13:42] <AndyS> 2/ Just images (or very little metadata)
[13:42] <AndyS> 3/ Images+ records
[13:43] <AndyS> 4/ Add metadata to existing images
[13:43] <AndyS> For images UC, there is no clear set of users
[13:44] <AndyS> Mackenzie: today, physical slides used
[13:45] <AndyS> ... in teaching.  Go back to the department collection.
[13:46] <AndyS> There is an online catalogue for finding the physical slides.
[13:47] <AndyS> Physical media - need to checked out.
[13:47] <AndyS> s/to/to be/
[13:48] <AndyS> OCW causes digitisation as it's a web site!
[13:48] <AndyS> 5/ Digitize and add metadata to existing catalogue record
[13:49] <AndyS> Bulk import to seed then one by one additions
[13:50] <AndyS> KS asked: answer is that VRA can have support for thumbnails. 
[13:51] <AndyS> VRA: Group-Work-Surrogates-Details
[13:51] <AndyS> Surrogates can be at work or detail level (or even group)
[13:53] <AndyS> Mick: Any AIPs? 
[13:53] <AndyS> Eric: Most at image level
[13:53] <AndyS> rather than collections
[13:54] <AndyS> Mackenzie: assume we deal with the work (for now) - not the collection - for AIPs
[13:54] <AndyS> Lots of surrogates.
[13:55] <AndyS> With metadata like dimensions 
[13:56] <AndyS> http://via.harvard.edu:748/WebZ/Authorize?sessionid=0:next=/html/VIA.html:style=via
[13:57] <AndyS> http://via.harvard.edu:748/WebZ/Authorize?sessionid=0:next=html/VIAhomeframe.html:style=via
[13:57] <AndyS> the site search
[13:57] <AndyS> A record: http://via.harvard.edu:748/WebZ/Authorize?sessionid=0:next=html/VIAhomeframe.html:style=via
[13:58] <AndyS> (search for "Hewlett")
[14:00] <AndyS> Shows examples of displays of VRA records
[14:02] <AndyS> In practice people do "google" style searches - words in first box only
[14:05] <AndyS> The exact usage of collections/groups/etc allows for variance in usage
[14:06] <AndyS> As preexisting data, complex to build the single view for the website
[14:09] <AndyS> The metadata is in "Site Search" - an OCLC system (pre-XML almost)
[14:10] <AndyS> Moving to Tamino this year
[14:11] <AndyS> Issue: copyright on untraceable materials - rather hard
[14:14] <AndyS> The original metadata is not preserved - it's just a means of access but ...
[14:14] <AndyS> ... the technical metadata can be important.
[14:16] <AndyS> dozen repositories (museums and archives)
[14:19] <AndyS> em: After Site Search, moved onto more RDF-like, not predefined access paths
[14:20] <AndyS> Mackenzie: thumbnails are usually free
[14:21] <AndyS> 200K images today.
[14:22] <AndyS> KevinS: Any collections of video streams?
[14:22] <AndyS> Mackenzie: not really at Harvard - would not be VRA
[14:23] <AndyS> Others exist but don't have good metadata (minimal, like lecture number for lecture videos)
[14:24] <AndyS> Example: http://MITworld.mit.edu/
[14:25] <AndyS> Example: Oyez - supreme court votes and discussions 
[14:27] <AndyS> Mick: want something achievable and interesting 
[14:27] <AndyS> Video afterwards!
[14:29] <AndyS> JSE finds example where field reuse occurs depending on item type - Mackenzie says it is common
[14:31] <AndyS> Each collection will want additional metadata as well as std like VARcore
[14:31] <AndyS> s/VARcore/VRAcore/
[14:32] <AndyS> Many stds have hooks for domain-specific extensions or profiles
[14:33] <AndyS> Mackenzie: heterogeneity of schemas and of application rules / controlled vocabularies
[14:34] <AndyS> e.g. VRA+these content rules (e.g. the meaning of "date") - use XYZ for subject field, etc etc
[14:34] <AndyS> Need to capture content rules
[14:35] <AndyS> (em speaking)
[14:36] <AndyS> Mark: History system used own URIs with links back to std (eg. HarmonyABC)
[14:37] <AndyS> No encoding rules for value of a property (e.g. max value of a field)
[14:39] <AndyS> em: continuum from human description to machine processable rules
[14:40] <AndyS> Many: OWL does some of this
[14:40] <AndyS> but not all - need data cleaning, ingestion specific processes and general mechanisms
[14:43] *** Signoff: em-lap (Ping timeout)
[14:44] <AndyS> Issue: coding data entry systems is costly
[14:45] <AndyS> Example: subject vocabularies
[14:45] <AndyS> Example: laying out names of authors
[14:46] <AndyS> Dates: date of picture, digitisation, ingestion etc etc
[14:50] <AndyS> Possible task: someone survey the possible data testing tools we could employ
[14:51] <AndyS> Need to check on ingestion that the data is "good"
[14:57] <AndyS> Mark wishes to understand why we need more than we saw on web
[14:57] <AndyS> Eric/Mackenzie: it took 2 years per system
[14:58] <AndyS> If just wanted VRAcore then could replicate.  But new schemas come along all the time.  And they evolve
[14:59] <AndyS> Desire (library community) for machine processable content rules.  Execute on ingestion.
[15:00] <AndyS> Different schemas for technical details of images e.g. ISO std
[15:00] <AndyS> Different schemas for different types
[15:01] <AndyS> ISO still image format
[15:03] <AndyS> DIPs: catalogue records, images, thumbnails of items.
[15:04] <AndyS> Packaging for export : need to insert into other systems
[15:04] <AndyS> Different schema from SIP.
[15:08] <AndyS> Next: other schemas: images in with VRA core and want to add other metadata
[15:28] *** em (em@18.29.0.30) has joined channel #simile
[15:28] *** Zakim (rrs-bridgg@18.29.0.27) has joined channel #simile
[15:28] <em> zakim, please leave
[15:28] *** Zakim has left #simile
[15:30] * AndyS asks for someone to scribe the next section
[15:31] * Rob will do it
[15:31] * AndyS thanks Rob
[15:31] *** Signoff: KevinS (Client exited)
[15:32] <marbut> Eric mentioned some work had been done on representing the National Cancer Institute's thesaurus in OWL. There is an interesting paper about this here: http://www.mindswap.org/papers/WebSemantics-NCI.pdf
[15:32] <Rob> part of the workflow with DSpace is someone goes to MIT Libraries and says "we want to use schema X"... do you get machine-processable schema representations? 
[15:32] <Rob> [Mick ^^^] 
[15:32] <Rob> MacKenzie: usually XML schemas 
[15:34] <Rob> usually MIT Libraries will say e.g. "you need to use VRA Core" 
[15:34] <Rob> community will want to add a couple of extra fields 
[15:34] <Rob> occasionally people will say "please archive this database" 
[15:34] <Rob> there are a number of scenarios 
[15:34] <marbut> (Also see http://www.mindswap.org/2003/CancerOntology/)
[15:34] <Rob> Mick: In this use case what is machine-processable format that expresses the schema? 
[15:34] <Rob> XML Schema? RDF Schema? 
[15:35] <Rob> Might be a task for the libraries 
[15:35] <Rob> MacKenzie: no one in libraries to produce XML schemas
[15:36] *** em has changed the topic on channel #simile to Simile PI meeting
[15:37] <Rob> David: We are building a system where community can enforce their own "content rules" (for want of a better term)?
[15:38] <Rob> Mark B: Variety of validation and schema expression techniques...  e.g. OWL is best for expressing relationships between schemas 
[15:38] <Rob> Need to decide which mix to go for 
[15:39] <Rob> There are also moves to escape this need... e.g. CNRI work on automatic configuration, RDDL 
[15:39] <Rob> (feel free to correct my notes if incorrect) 
[15:41] <Rob> Diff people use different terms in different contexts, e.g. John G has to describe use of these terms (ontology, schema, vocab. etc) 
[15:41] <Rob> David: Back to how to get schema into system... standardised mechanism for this?  Likely to be rare, at least initially -- so no need to focus on that now 
[15:42] <Rob> Martin: A win if it takes less than two years! 
[15:42] <Rob> Mark: Not just registering schema -- need to support ingest, UI, workflow etc.
[15:43] <Rob> David: Currently human beings usually do validation etc. -- win if we can make it easier for a human 
[15:44] <Rob> MacKenzie: Usually Margret (MIT Library faculty liaison) negotiates all this with community representative 
[15:44] <Rob> Would be nice to normalise this process / information 
[15:45] <Rob> Mick: core part of demo that it's easier to produce this normalised version of the schema and introduce it, rather than produce a load of custom code 
[15:46] <Rob> Kevin: Would like to know what e.g. VRA-core specific code is doing
[15:48] <Rob> David: even without schema -- just having lots of RDF is useful.  Do not want it to be central to work that we figure out how to accept schemas 
[15:49] <Rob> (playing devil's advocate)
[15:49] <Rob> If you have a store for unstructured info, don't need new store for every new type of structured info
[15:51] <Rob> Mark: gets back to old comp sci problem -- better to describe things declaratively rather than procedurally
[15:51] <Rob> schemas are towards the declarative end
[15:53] <Rob> diagram -- defined declarative description metadata, enter it, store, search / map between descriptions / display 
[15:54] <Rob> David: Can either have automated validation, or have humans validate (as is the current case) 
[15:55] <Rob> MacKenzie: In DSpace, metadata is deposited directly by end-users 
[15:55] <Rob> Too restrictive to have all metadata validated by Library staff 
[15:56] <Rob> David: dumb validation -- could just kick up violations to staff 
[15:57] <Rob> Eric: What about the content creation aspect? 
[15:57] <Rob> MacKenzie: part of the use case is the content creation aspect 
[15:58] <Rob> much may be imported by bulk from other systems, but will not always be the case 
[15:58] <Rob> Mark: Are interested in producing a generic system for multiple use cases, rather than one use case involving legacy systems
[16:00] <Rob> People have problems creating these schemas -- often incorrect 
[16:01] <Rob> David: one problem to let users create instance data -- to allow users to create schemas is different 
[16:02] <Rob> Mick:  errors in schemas have bigger impact, but more instance data creators than schema creators 
[16:03] <Rob> Mark: Tools do exist for schema creation.  Problems: standards are changing quickly; naive users do not understand the tools, resultant schemas difficult to understand 
[16:05] <Rob> Eric: Like the boxes Mark drew... would be good to get full set of boxes, and order them 
[16:06] <Rob> Mick: For first few months, do not need smooth schema creation tool 
[16:08] <Rob> Mark: New idea (for me) here is declarative description of data 
[16:08] <Rob> David: Don't know how to use the schema 
[16:09] <Rob> MacKenzie: e.g. do you need a way in declarative description to say which elements to display to users and with what labels? 
[16:10] <Rob> Kevin: Just need to dive in and experiment 
[16:11] <Rob> Eric: Interesting aspect of SIMILE is, producing generic solution... should demonstrate saving in turn-around time 
[16:12] * Rob notes AndyS' confused look and invites addition to Rob's notes if they're incomplete :)
[16:12] * AndyS is confused by Kevin's comments not the notes
[16:13] <Rob> David: do want sophisticated tools for producing schemas... but there is a tonne of hard stuff to work out around that 
[16:14] <Rob> e.g. how it fits into workflow 
[16:14] <Rob> could even just have raw RDF with no schema 
[16:15] <Rob> can mine out things like authors etc. 
[16:16] <Rob> When someone comes in with data, with RDBMS you have to know the schema and refactor 
[16:16] <Rob> with RDF, you can accept any data even without the schema 
[16:16] <AndyS> http://rdfschema.info/ but the webserver is 502'ing (bad gateway)
[16:17] <Rob> MacKenzie: Depends how messy we want to let the system become 
[16:17] <Rob> David: It is a continuum.. there are options... 
[16:19] <Rob> Different parties involved -- schema creators, instance creators, instance consumers 
[16:19] <Rob> schema creators least numerous, instance consumers most numerous 
[16:20] <Rob> for instance consumers, if data does not validate to schema may still be useful 
[16:21] *** Signoff: em (Quit: Client exiting)
[16:22] *** em-lap (em@18.29.0.30) has joined channel #simile
[16:22] <Rob> Martin: For interoperability/transformation, need to be able to express relationships between schemata -- this will introduce constraints.  We don't understand the issues around this yet, so we should hold off on schema editing tools until we know more about them
[16:23] <Rob> Mark: We can build something quickly, and test it out and discuss it -- it might not work but we will learn this way -- advocate rapid prototyping approach 
[16:25] <Rob> David: this is breadth first vs depth first 
[16:26] <Rob> Mick: Building things gives you something concrete to talk about 
[16:30] <Rob> MacKenzie: reader's requirements haven't changed much, ability to fulfill them is being improved... can improve situation for metadata creators 
[16:35] <Rob> David: There are several classes of user within a use case... so do we abandon all but one class of user or cover the different classes (multiple use cases?) 
[16:38] <Rob> Mick: Need storyboards for definers, creators and readers that are consistent 
[16:40] <Rob> Mark: There is a stab at this in the use case doc (section 2) 
[16:42] * Rob apologises for diminishing quality of scribing towards the end of that session
[16:53] *** Signoff: em-lap (Ping timeout)
[17:00] *** em-lap (em@18.29.0.30) has joined channel #simile
[17:05] * Rob asks that someone else scribe
[17:05] *** afs (afs@18.99.1.126) has joined channel #simile
[17:06] *** Signoff: AndyS (Connection reset by peer)
[17:11] *** kevinS (chatzilla@18.99.1.100) has joined channel #simile
[17:11] <kevinS> 1. not applicable
[17:12] <kevinS> 2. applies, 3. does not
[17:13] <kevinS> assess priorities by common requirements on remaining items
[17:13] <kevinS> see mick's spreadsheet for details of requirements.