notes from discussion today on Artstor and VRA schemas

Hi team,

Here are some notes I took today from a discussion between Andy S and myself
on the three different VRA schemas we have - the one I did before we got the
Artstor data, the one Andy did from the VRA documentation, and the one I did
based on the Artstor schema. 

Mark: One difference between the schemas is one uses "Creator" whereas the
other has a class called "Person". I think we need a class for FRBR class 2
entities. Creator is bad because it contains "role" which may indicate a
role other than creator. Person is bad because it doesn't allow for
organisations.

Andy: Architecture is an example of why we need this. Sometimes things are
credited to the company, not the person, e.g. the Millennium Bridge is
credited to Ove Arup.

CONCLUSION: Change name to entity, which has subclasses person and
organization.
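
As a rough Python sketch of this conclusion: the class names Entity, Person
and Organization follow the notes, while the Credit class and its "role"
value are assumptions added for illustration, not part of any of the three
schemas. The idea is that the role sits on the link to the entity, so it no
longer has to mean "creator".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Anything that can be credited (a FRBR class 2 style entity)."""
    name: str


@dataclass(frozen=True)
class Person(Entity):
    """An individual."""


@dataclass(frozen=True)
class Organization(Entity):
    """A company or other corporate body."""


@dataclass(frozen=True)
class Credit:
    """Hypothetical link class: the role lives here, not on the entity."""
    entity: Entity
    role: str


# Andy's example: the credit goes to a company, not a person.
# The role string is illustrative only.
bridge_credit = Credit(Organization("Ove Arup"), role="engineer")
```

With this shape, code that only needs "who is credited" can accept any
Entity, and code that needs to tell people from companies can check the
subclass.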

Mark: My schema for Artstor has a class called collection.  

Andy: VRA has two classes: Series and LargerEntity. Are these the same as
collection? There are two ways we consider LargerEntity: if we have an
object, and we only consider part of it. The other thing is to consider a
collection a larger entity.

Mark: LargerEntity is not used in Artstor.

Andy: Maybe Collection is a subclass of LargerEntity.

QUESTION: Input from team - what is the relationship between Collection,
Series and LargerEntity?

Mark: Instead of making Collection a class, it could just be a URI. All the
data we have is in a single collection. 

Andy: I think it should be a class, it sounds like Artstor have augmented
VRA here.

Mark: In the Artstor schema I have Work, Image, Mediafile. 

Andy: There is no concept of Mediafile in VRA. There are 3 distinct
concepts: the thing you are taking a picture of, the concept, and the
realisation. An Image in VRA is not a manifestation.

Andy: Also note there isn't a subclass relation between Image and Work. 

Mark: But there is a relation?

Andy: It's "is an image of". With books they have a concept of a
manifestation. The concept here will have another name?

Mark: In the Artstor collection they only talk about images.

Andy: Not sure I am comfortable about that. It's probably that they've only
got one image of every painting. But when we put in the opencourseware, they
may find different images of the same subject matter, e.g. multiple images
of the roof of the Sistine Chapel.

Mark: What about record?

Andy: There are two things a record represents. One is the individual set of
statements that a particular corpus contains about this thing. The other is
this generalisation of work and image. They are not the same. In a closed
world system they are the same, but in the general case they are not: when
you merge two collections together you can have two records about the same
work. So the VRA record concept has put two things together. I think we
don't need it unless it has a very clear meaning, or unless we want to
consider provenance. There are things in VRA that apply to both a work and
an image. For example, Creator is used to talk about both who took the
photograph and who produced the work.

Mark: Artstor only has records of type image.

Andy: Yes. They talk about creator of the work, but they get it wrong. The
image was not created by Leonardo da Vinci. So they implicitly have things
of type work but they are embedded in things of type image. To demonstrate
this you'd need two images of the same thing, but we don't think we've got
that.
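
A small Python sketch of the distinction Andy is drawing, assuming Work and
Image are separate classes joined by an "is an image of" link rather than a
subclass relation. The field names (image_of, creator) and the example title
are illustrative and not taken from the Artstor data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Work:
    title: str
    creator: str  # who produced the work


@dataclass
class Image:
    image_of: Work                 # "is an image of", not a subclass link
    creator: Optional[str] = None  # who took the photograph, if known


# Illustrative data: two images of the same work.
work = Work(title="Mona Lisa", creator="Leonardo da Vinci")
images = [Image(image_of=work), Image(image_of=work)]

# The work's creator is reached through the link; it is not the image's creator.
work_creators = {img.image_of.creator for img in images}
```

Once there are two images of one work, flattening the work's creator onto
each image record duplicates it, which is the case Andy says we do not yet
have in the corpus.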

Mark: What is the best way to deal with these additional classes and
properties? Create Artstor specific classes and properties, outside the VRA
namespace?

Andy: Yes.

Mark: I have materials, measurements and location as properties, but
pointing at URIs. In your VRA schema they are classes.

Andy: They are controlled terms. We can leave them as properties for now and
may make them classes later for type checking.

Mark: Also your schema is missing:

#    Date.Creation
#    Date.Design
#    Date.Beginning
#    Date.Completion
#    Date.Alteration
#    Date.Restoration

Andy: There is a problem with these. I would rather we only dealt with
points in time. It's not clear if these are points or periods, and beginning
and completion are qualifiers.  So for any given date you can label it as a
beginning or completion. So for alteration, you can have a beginning and a
completion.
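
One way to read Andy's suggestion, sketched in Python: every date is a single
point in time, labelled with the activity it belongs to and a
beginning/completion qualifier. The class names, the period() helper and the
years below are all made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Activity(Enum):
    CREATION = "creation"
    DESIGN = "design"
    ALTERATION = "alteration"
    RESTORATION = "restoration"


class Qualifier(Enum):
    BEGINNING = "beginning"
    COMPLETION = "completion"


@dataclass(frozen=True)
class DatePoint:
    activity: Activity
    qualifier: Qualifier
    year: int


def period(points, activity):
    """Return (beginning, completion) years for an activity, None if missing."""
    found = {p.qualifier: p.year for p in points if p.activity is activity}
    return found.get(Qualifier.BEGINNING), found.get(Qualifier.COMPLETION)


# Invented example values.
points = [
    DatePoint(Activity.ALTERATION, Qualifier.BEGINNING, 1850),
    DatePoint(Activity.ALTERATION, Qualifier.COMPLETION, 1852),
    DatePoint(Activity.CREATION, Qualifier.COMPLETION, 1600),
]
```

Here Date.Beginning and Date.Completion stop being separate date types: a
period is just the pair of points recorded for one activity.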

Mark: Can't find examples of alteration in the medium sample Artstor corpus.

Mark: How about locationCurrentRepository vs idCurrentRepository? In Artstor
they just have one property that has the same meaning.

Andy: It is mixing up some kind of access identifier with the notion of the
repository itself.

Mark: To be honest, I don't understand the difference between currentSite
and currentRepository.

Andy: Pictures have sites e.g. they are in the Louvre. 

Mark: Should I put metadataCreationDate and metadataUpdateDate in the
Artstor namespace?

Andy: Which metadata is it referring to? Really metadata should be a bag of
reified statements. For example, are their accession date and metadata date
different?

Mark: In Artstor XML they have a metadata nested element - we flattened it
out.

Andy: Yes, we could separate it out e.g. have a metadata resource.

Mark: Do we want to review this?

Andy: If we are talking about the image, we can put that stuff on the image
with no problem. If we are talking about the work we are stuffed. Our
conclusion is that part of the record is image related and part of it is
work related, but we are only encountering images so we don't need to break
it up. There is also the whole issue of tying this to the history store: the
other approach is to take this out altogether and put it in the history
store.

Mark: There is duplication between subject and type - they have the same
values. 

Andy: Also subject contains both subject classification terms and subject
descriptions. But we only want to give URIs to the first kind. We need to
look them up in something like AAT first to check if we need to turn them
into URIs.

Mark: But does it do any harm to turn them all into URIs?

Andy: Yes. Synthesising a URI for subject descriptions doesn't help - it's a
descriptive piece of text for displaying to users.

Mark: There may be some heuristics here.

Andy: Look it up in AAT?

Mark: I think they don't have any spaces in if they are controlled terms.

Andy: If you do find spaces you know it's not a controlled term, if you
don't you're not sure. For example it might have Caesar as a subject.
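
The heuristic discussed above could be sketched in Python as a three-way
answer: no, yes, or don't know. This assumes Mark's guess that controlled
terms never contain spaces is right. The vocabulary set below is a
hypothetical stand-in for a real AAT lookup, and its entries are
placeholders, not checked against AAT.

```python
from typing import Optional, Set


def classify_subject(value: str, vocabulary: Set[str]) -> Optional[bool]:
    """Return False (not controlled), True (controlled) or None (not sure)."""
    term = value.strip()
    if any(ch.isspace() for ch in term):
        # Spaces mean descriptive text, under Mark's assumption.
        return False
    if term.lower() in vocabulary:
        # Found in the vocabulary, so it is safe to give it a URI.
        return True
    # No spaces but not found: could be a name such as "Caesar".
    return None


# Placeholder vocabulary standing in for an AAT lookup.
vocabulary = {"painting", "sculpture"}
```

Only values that come back True would be turned into URIs. Values that come
back None stay as literals until someone decides how to treat them.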

Dr Mark H. Butler
Research Scientist                HP Labs Bristol
mark-h_butler@hp.com
Internet: http://www-uk.hpl.hp.com/people/marbut/

Received on Thursday, 23 October 2003 08:23:19 UTC