minutes from Wednesday Morning meeting, SIMILE f2f

MS: Being able to create instances, create rules for how this metadata
should be searched and viewed, controlled vocabularies that should be
applied, all that stuff. The flip side of my goal of better infrastructure
is that, in my experience, digital libraries are pretty skeptical of RDF
and the Semantic Web. A lot of people thought it was a brilliant idea,
tried to implement solutions, and got burnt big time. We have a system
that works and a lot of people are using it, and if we say here is a new
version which is Semantic Web, it is unlikely that people will adopt it.
The problem is real enough that people looking at DSpace are already
asking for a plug-in that supports a different schema from DC. So we need
to do it better, and that's a really nice test of this.

Mick: Better is measured by cost?

MS: Yes, people are starting to build plug-ins for learning object schemas.
So we can do a side by side comparison of the two.

So the risk is that people will say we can already do that, there's no
point, go away.

A lot of them would like to be able to run multiple schemas. In the first
case, they want to replace what we are doing with a different schema; they
can run different instances for each schema. We may have to do that, as we
promised to support the IMS learning objects schema this year. In
addition, I know they would like to be able to support multiple schemas at
the same time.

EM: Another thing that would be interesting would be to develop things that
are useful in the museum and bioinformatics communities, etc. You build up
this deployment base.

MS: Because DSpace is being widely adopted and actively developed, with
different universities adding features, it needs to be backwards compatible,
and we need to be sure that we are not asking people to throw away two or
three years of development here.

MS: The DSpace federation will be adding features; we need to consider them.
OCLC just created some DC creation templates that we want to use, but there
is a danger they could be thrown out if we used RDF.

JE: If the adopter community is going to adopt this, they ought to have a
stake here. 

MS: They will adopt it if it is better, as long as they don't have to redo
all the work they have done over the last three years. And we should warn
them if necessary.

Wednesday morning second session
 
Martin: Looking at the plan, at this level of abstraction it's going to look
like that, but for me the key thing is knowing which detailed use case we
are going to do over the next few months. We need to nail down the use case,
and everything follows from that. A use case that has users.

Kevin: The use case that has the most interest from HP's viewpoint is visual
image support. Both the learning objects use case and the visual image use
case are relevant. We talked quite a bit about the learning objects case
yesterday; perhaps we should talk about it again today, as we had some
questions we didn't have answers for. The same goes for the visual image
use case.

Rob: We didn't get where the heterogeneity came in with the learning object
use cases.

MS: We have a couple of faculty who would like to put visual images into
DSpace, but the library would prefer to use VRA core. This is a four-level
hierarchical schema that starts with an abstract work and drills down to
specific surrogates of it. There are data entry systems for that schema, so
I could get real records from MIT and other institutions, some with digital
images attached, some without. The pressing need is that we would like to
use these images in DSpace, and not dumb them down to DC.
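A minimal sketch, in Python with rdflib, of how that work/surrogate
distinction could be kept in RDF rather than flattened to DC. The vra:
namespace URI and property names below are invented placeholders, not an
official VRA binding:

    from rdflib import Graph, Namespace, Literal, URIRef
    from rdflib.namespace import DC, RDF

    VRA = Namespace("http://example.org/vra-core#")  # placeholder namespace
    g = Graph()
    g.bind("vra", VRA)
    g.bind("dc", DC)

    work = URIRef("http://example.org/works/print-42")         # the abstract work
    image = URIRef("http://example.org/images/print-42-scan")  # one digital surrogate

    g.add((work, RDF.type, VRA.Work))
    g.add((work, DC.title, Literal("Woodblock print, Edo period")))
    g.add((image, RDF.type, VRA.Image))
    g.add((image, VRA.imageOf, work))  # invented property linking surrogate to work

    # Dumbing this down to flat DC would collapse the work/surrogate
    # distinction into a single title-bearing record.
    print(g.serialize(format="turtle"))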

Kevin: Who are the other actors?

MS: Instructors, for use in class, research and teaching, not to be used in
publications without permission. One good example is the Japanese image
collection. That would be a collection we could start off with: there is a
standard schema, it's already mapped to DC, some of the attributes sound
familiar, and it's an easy, low-hanging-fruit type of schema.

KS: Is it just translating between DC and VRA? Or other schemas?

MS: In the future, we'll have other resources that only have DC attached. We
may want to search across them, I'm not quite sure how to do this. 

EM: A few points. When you say this is of interest to HP, do you mean images
in general, or just to this specific community?

KS: not specific.

EM: The visual aspect is very stunning, makes for cool demos. I worked with
the ? group for a while, looking at how to model it in RDF. We worked on the
nighthawks collection, it can be decomposed into thousands of object, since
Amica has a specific way of representing it which is VRA core but separate,
so it would be interesting to have the visual side of this talk about
getting different collections. It doesn't offer the 3 or 4 types of
heterogeneity, but it does help about how different communities try to do
this. 

The first year is going to be the most critical for making the case to the
larger community that there is interesting stuff here, so it will help to
have a fairly large collection of content that can test some aspects of
genesis, some of the inferencing capabilities, etc.

KS: I was concerned about the image use case, due to the availability of
metadata. Isn't the most valuable information the association between the
image and the metadata? So unless you have someone who needs this project to
get their images online, aren't there political issues here?

MS: During the research phase, it will be easy to convince people to give
metadata. We wouldn't want to put up a production catalogue of that. MIT has
a large collection of 100,000+ images. We probably only have metadata for
40,000 images. 

EM: This is the 40,000 that are in VRA core. We are in the process of
bringing up a new VRA core cataloguing system.

Mick: I want to explore heterogeneity / interoperability. DC and VRA core
seem clear; Eric mentions a different kind, e.g. here is the concept of this
thing, here are instances of it. The example you mentioned is images related
to Japanese history.

MS: In the Japanese collection, people might look at it differently from a
historical perspective.

EM: If we got additional information from another source, we can overlay the
information. 

Mick: From an approach perspective I'm cool with that.

EM: It's not that we can't load heterogeneous collections in RDF, it's that
we need to tie them together for it to be important.
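A minimal sketch of that point, with invented URIs and property names: two
records about the same object, catalogued under different schemas, only
become useful once an explicit link ties them together.

    from rdflib import Graph, Namespace, Literal, URIRef
    from rdflib.namespace import DC, OWL

    VRA = Namespace("http://example.org/vra-core#")  # placeholder namespaces
    LOM = Namespace("http://example.org/ims-lom#")

    g = Graph()
    museum_record = URIRef("http://example.org/amica/item/123")
    course_record = URIRef("http://example.org/ocw/resources/slide-17")

    # The same painting, described by two communities with two schemas.
    g.add((museum_record, DC.title, Literal("Nighthawks")))
    g.add((museum_record, VRA.technique, Literal("oil on canvas")))
    g.add((course_record, LOM.context, Literal("undergraduate art history")))

    # The overlay: an explicit assertion that both records describe one thing.
    g.add((course_record, OWL.sameAs, museum_record))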

One point I thought was interesting is who the consumers of this are. For
example, if people are going to create learning objects from this...

MS: This is a good point. There may be IMS data as well.

EM: So just helping people understand that type of association is something
that will help.

DK: You asked the question: if we load this stuff in different ontologies,
what's the point? It might be that someone else needs to connect them
together.

We don't have to solve everything.

EM: There is an aha effect when people see the connections.

DK: There's a declarative connection, but there are other connections and
interrelations that are possible.

Mick: You were going to describe the landscape for learning objects.

MS: The problem we have is that there are no systems for creating and
exploring learning object data. It's a standard but it is not widely
deployed. Everyone has agreed that the IMS profile will be the standard, but
not a lot of people are using it. So we are creating objects in the
Microsoft Content Management system, and we have some money to get that
information back out in XML format; then I've promised to look at how to
ingest them into DSpace. So if there is no SIMILE solution, then I have to
write a plug-in for DSpace. So I need to be able to get learning objects in,
support the schema, explore the information, etc.

Mick: So support means being able to import the content and metadata, search
the metadata. What's the level of modelling here - a whole course?

MS: Typically it's an atomic thing, e.g. a lecture note. But there is a
notion that we should be able to take in an entire course, so that a faculty
member can take an entire course and drop it into the course management
system.

Mick: So what have OCW committed to?

MS: Staff in India, and some library staff are creating the IMS metadata
elements. There is no way to get that information back out, apart from the
OCW html pages. 

Mick: So there is a publishing process here involving the creation of
metadata. Are they breaking it down to the level of lecture notes etc?

MS: Yes. We determine which are first-class items; they inherit metadata
from the course as a whole, but they also get some of their own. Some things
like the calendar don't need extra metadata, but things like readings do. In
OCW they strip off the readings for legal reasons (they are on the course
website but not in OCW), but in most cases there is an object and metadata
for it.
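A small sketch of that inheritance rule, using illustrative field names
rather than the actual OCW data model: first-class items start from the
course-level metadata and add or override fields of their own.

    # Course-level metadata shared by every first-class item in the course.
    course_metadata = {
        "dc:publisher": "MIT OpenCourseWare",
        "dc:language": "en",
        "dc:subject": "Thermodynamics",
    }

    def item_record(own_fields):
        """Start from the course metadata, then layer the item's own fields on top."""
        record = dict(course_metadata)
        record.update(own_fields)
        return record

    lecture_note = item_record({"dc:title": "Lecture 3 notes", "dc:type": "LectureNote"})
    calendar = item_record({})  # e.g. the calendar gets no extra metadata of its own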

Martin: Who uses this and how?

MS: Teachers all over the world who want to use the course materials in
their own courses. It's a catalogue of learning objects.

JE: One of the things that came up in our discussion yesterday is that there
could be embedded pieces, e.g. an image in a course note. So to make them
available, do you take that out of the context of the presentation?

MS: For purely practical reasons we are not doing that. Everything that goes
up on OCW is publicly available, so we need to treat everything as one IP
package. We just defined what types of objects count on a course website,
and those objects get metadata.

We've already sat down with the Sapient people, and a lot of what John
describes isn't happening.

KS: Can we walk through the data flows in the OCW system?

MS: The metadata that ends up in OCW is compliant with the IMS metadata set,
but it's not encoded in any standard way.

EM: There are various serialisations of this, e.g. CDIF, XML, XML/RDF etc.

MS: The only way you can get content out from OCW is to package it up as
HTML and a lot of the metadata doesn't come through.

The CDIF format is the way they went from the course management systems to
the OCW system.

The problem here is that there is a whole chain of standards, e.g. SCORM,
IMS, then specific versions of those. It has elements that are useful in an
educational context.

We get to define the output of this; it will probably be the XML binding of
IMS.

EM: There is an XML schema for the LOM. There is also an RDF/XML version. I
don't know how the approval process works. 
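A rough sketch of the kind of mapping involved, using a simplified, made-up
LOM-like XML fragment rather than the official IMS XML or RDF bindings:

    import xml.etree.ElementTree as ET
    from rdflib import Graph, Namespace, Literal, URIRef
    from rdflib.namespace import DC

    lom_xml = """
    <lom>
      <general>
        <title>Lecture 3: Thermodynamics</title>
        <language>en</language>
      </general>
    </lom>
    """

    root = ET.fromstring(lom_xml)
    g = Graph()
    res = URIRef("http://example.org/ocw/lecture-3")  # invented identifier

    # Pull a couple of fields out of the XML and re-express them as RDF.
    title = root.findtext("./general/title")
    lang = root.findtext("./general/language")
    if title:
        g.add((res, DC.title, Literal(title)))
    if lang:
        g.add((res, DC.language, Literal(lang)))

    print(g.serialize(format="turtle"))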

MS: The SCORM specification includes packaging for the metadata, so I assume
we will use this as it is a standard.

Stellar isn't capturing that much metadata, but we would still want to use
IMS; there just wouldn't be much metadata in it.

One of the requirements of this Microsoft project is that faculty would like
to round-trip stuff from OCW to Stellar so they can re-run courses.

KS: So OCW has the open version of the course. Someone goes to that website
and wants to reuse the course; what do they do?

MS: We haven't worked through this, they could go back to the CMS. 

EM: Depending on where we want to do this, we may want to do some kind of
translation. So one discussion is how far we want to push that upstream. 

KS: The reason I've been putting the conversion down to SIMILE is that this
is something it is meant to do. The other systems can't do RDF conversions.

AS: Formats and schemas are different things though.

So one thing is the heterogeneity of the schemas, but is the data here all
IMS?

MS: This is one schema. VRA is another. Today we deal with it by turning
everything into DC, the lowest common denominator.

This is a real scenario, we want to search across these different types of
metadata. 
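A minimal sketch of that "search across different types of metadata"
requirement: once DC, VRA and IMS records are loaded into one RDF graph, a
single query can match titles regardless of which schema contributed them.
The vra: namespace and property names are placeholders.

    from rdflib import Graph, Namespace, Literal, URIRef
    from rdflib.namespace import DC

    VRA = Namespace("http://example.org/vra-core#")

    g = Graph()
    g.add((URIRef("http://example.org/img/1"), VRA.title, Literal("Tea ceremony, Kyoto")))
    g.add((URIRef("http://example.org/doc/2"), DC.title, Literal("Kyoto school readings")))

    q = """
    PREFIX dc:  <http://purl.org/dc/elements/1.1/>
    PREFIX vra: <http://example.org/vra-core#>
    SELECT ?item ?title WHERE {
      { ?item dc:title ?title } UNION { ?item vra:title ?title }
      FILTER(CONTAINS(LCASE(?title), "kyoto"))
    }
    """
    # Both records match, even though they use different title properties.
    for row in g.query(q):
        print(row.item, row.title)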

Mick: What are the facilities to help people find what they need on the OCW
website?

MS: Not very sophisticated.

Mick: I'm trying to understand the relationship here. 

MS: We are creating metadata that OCW aren't using. We have promised to be
the archive of this. At the moment faculty members can download content from
the OCW website, but there is no metadata there. 

MS: That system isn't designed as a long term archive. 

Mark: It has no dissemination component apart from HTML, nothing for the
metadata.

MS: OCW may not be there in five years.

EM: MIT Libraries has established itself as a clearing house; OCW may go
away. OCW is a way of managing a website.

MS: It could have been modified, but that seems stupid when we have DSpace.

There are no promises about the OCW website being there in the future. What
we promised them is that the content will still be there. We need people to
be able to search and retrieve those websites; it doesn't need to look the
same, but the same information needs to be there.

Mick: So does content go from OCW to Sloanspace?

MS: No, it goes via DSpace / SIMILE?

EM: Can I offer a paraphrase of what Hal said? They haven't got to the point
of exchanging information. They have put in place a whole policy here, e.g.
around rights, so that provides a buffer for this particular discussion. So
while DSpace might not be central, they have put in place policies that do
that kind of thing.

MS: So the only arrow that exists here is the arrow that goes from Stellar
to OCW. That's it. The Microsoft funding will pay for a lot of this. It's
all current DSpace technology.

Mick: Will OCW make a landgrab to do archival?

MS: The reason they went with the Microsoft system is they had such an
extreme timeline. They know the funders worry about long-term access; we
don't have to worry about turf wars here.

Mark: So there is a difference in approaches here: we could either look at
the entire ecosystem, and try to get SIMILE to solve all the problems, or we
could just concentrate on how to get SIMILE to solve its part of the problem
so we can say this is how the whole system should do it.

MS: What we need are the functional requirements for what people need to do
with this information in SIMILE: what do they search, what do they want to
see, etc.? And this is the distinction between SIPs, DIPs and AIPs.


Martin: So are we likely to want to search for learning objects or across
the whole library archive?

So we can clone DSpace and use it to search IMS. That's half the problem.
But searching across the library holdings, is that something you need to do
upfront?

MS: I haven't promised them that, but it would be good if we could do that.

It would be nice if you could search everything and get it back as learning
objects.

EM: Anything can be a learning object right? 

MS: There are some mappings that don't make sense. 

Microsoft wants a way to export content from its system, so we specify it
and then they do the hard grunt work on it.

?: So it sounds like that...

EM: Do you view the Amica data and the VRA core data as the same?

KS: So the CDIF data is available now?

MS: It lacks about 90% of the metadata; it's all getting added later.

Martin: I'd like to look at the time that the IMS data comes in. The thing
that transforms this into a real use case for SIMILE is the possibility of
doing the three-way transformation.

EM: We could fake this. There are other ways of going about this that help
us prepare for the OCW stuff. We...

MS: People have low expectations for searching against learning objects,
high expectations for visual images.

There is a larger audience for images, e.g. students, whereas learning
objects are aimed more at instructors.

EM: Are there other groups at MIT creating image collections?

MS: There is an art and architecture visual image collection that has just
been migrated to VRA core. Then there is the Japanese collection. If I talk
to Kirk Vendt, they have a thing called metamedia with large image
collections but no VRA core. There will be a lot more over the next year.
Outside MIT there are large collections of images and metadata.

EM: AMICO is one, at Toronto now. 

MS: There is also the CIDOC people. Harvard has a collection we could borrow
as a test collection. 

Received on Thursday, 24 July 2003 15:00:28 UTC