SIMILE work

Hi Steve

As I mentioned in my previous email, we are interested in you getting
involved in three areas of SIMILE tasks:

> - the construction of prototype viewers, using Haystack, for the
> Artstor VRA, OpenCourseWare IMS, possibly CIDOC, and history
> system data.
> - a browser UI version of Haystack: David Karger said that work
> has previously been done on enabling Haystack to generate a UI
> via a web browser.
> - the demonstration of these viewers using the browser-based UI.

So I guess the first thing is to check with you that these tasks seem
reasonable.

It should be possible to start on the first task now as the Artstor data is
available - see 

http://lists.w3.org/Archives/Public/www-rdf-dspace/2003Nov/0003.html

Also, you may be interested in examining the Artstor schema and the XSLT
transform used to create the Artstor data; both are in CVS:

/simile/corpus/artstor/artstor.xsl
/simile/corpus/artstor/vra-schema.n3

Note that the Artstor schema is in N3; if you are more familiar with
RDF/XML, you may want to use Jena to convert it.

I think the IMS data is also at a stage where it should be possible to start
work on viewers - Kevin, is this correct? If so what is the best way to make
it available to Steve?

The next question is "what should the viewers look like?" I'm interested in
other suggestions from the team, which we can discuss tomorrow, but I would
like to explore the possibility of using faceted browse (also called faceted
search), which you may already be familiar with; if not, see
http://www.siderean.com/MedJournalDemo001_viewlet_swf.html 
http://bailando.sims.berkeley.edu/talks/hearst-pcd-seminar-2003.asx
http://bailando.sims.berkeley.edu/flamenco.html

Looking at the Artstor data, my guess is the key data fields are

- creator
- geographic
- subject
- title
- period
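To make the faceted-browse idea concrete, here is a minimal sketch of the two operations a viewer would need: counting how many records carry each value of a field (one facet), and narrowing the result set when the user picks a value. It assumes each record has already been flattened into a dict mapping a field name (creator, geographic, period) to a list of values; the records, creator names and helper names (facet_counts, refine, facet_view) are hypothetical and not taken from the Artstor data or from Haystack.

```python
from collections import Counter

# Hypothetical, hand-made records standing in for parsed Artstor data.
# Field names mirror the key fields listed above; values are illustrative only.
RECORDS = [
    {"id": "img1", "creator": ["Creator A"], "geographic": ["France"], "period": ["Romanesque"]},
    {"id": "img2", "creator": ["Creator A"], "geographic": ["France"], "period": ["Gothic"]},
    {"id": "img3", "creator": ["Creator B"], "geographic": ["Italy"], "period": ["Gothic"]},
]


def facet_counts(records, field):
    """Count how many records carry each value of `field` (one facet)."""
    counts = Counter()
    for record in records:
        # set() so a record that repeats a value is only counted once
        for value in set(record.get(field, [])):
            counts[value] += 1
    return counts


def refine(records, field, value):
    """Keep only records whose `field` includes `value` (one facet selection)."""
    return [r for r in records if value in r.get(field, [])]


def facet_view(records, fields):
    """Compute the counts for every facet over the current result set."""
    return {field: facet_counts(records, field) for field in fields}


if __name__ == "__main__":
    view = facet_view(RECORDS, ["creator", "geographic", "period"])
    print(view["geographic"])
    france = refine(RECORDS, "geographic", "France")
    print(facet_view(france, ["period"])["period"])
```

Each selection simply re-runs facet_view on the narrowed list, so the counts shown next to every remaining facet value always reflect the current result set, which is the behaviour the faceted-browse demos linked above rely on.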

So we could group images by creator, geography and period. Grouping by
subject is more complicated, as the subject property is used both for
indexing terms and for a description of the image, e.g.:

<http://web.mit.edu/simile/metadata/artstor/id#UCSD_41822000005155>
    vra:subject
        <http://web.mit.edu/simile/metadata/artstor/subject#Vézelay_(France)--Ste._Madeleine> ,
        <http://web.mit.edu/simile/metadata/artstor/subject#Romanesque> ,
        <http://web.mit.edu/simile/metadata/artstor/subject#Saints> ,
        <http://web.mit.edu/simile/metadata/artstor/subject#Capitals> ,
        <http://web.mit.edu/simile/metadata/artstor/subject#Eustace,Saint,_martyr,d._118> ,
        <http://web.mit.edu/simile/metadata/artstor/subject#Architectural_sculpture> ;

Here the first property value is a description, whereas the others are
indexing terms. To distinguish between these two uses of subject, we need
to look up each term in a thesaurus such as the Getty AAT to determine
whether it is an indexing term. However, it may not be possible to do this
within the time limits of the demo, so one possible workaround is to say that
an indexing term is one that has more than a single instance in the corpus.
This has the unfortunate side effect of disregarding valid indexing terms
that happen to occur only once, but in the demo we are trying to convey the
value that structured metadata can deliver through faceted browsing.
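The "more than a single instance" workaround is easy to prototype. The sketch below assumes the subject values have already been reduced to their URI fragments and grouped per image in a dict; the first entry reuses the subject terms from the example above, while img_b and img_c (and the function name split_subjects) are hypothetical, added only so that some terms repeat across the corpus.

```python
from collections import Counter

# Subject fragments per image. The first entry is from the Artstor example above;
# "img_b" and "img_c" are hypothetical records added for illustration.
SUBJECTS = {
    "UCSD_41822000005155": [
        "Vézelay_(France)--Ste._Madeleine",
        "Romanesque",
        "Saints",
        "Capitals",
        "Eustace,Saint,_martyr,d._118",
        "Architectural_sculpture",
    ],
    "img_b": ["Romanesque", "Capitals"],
    "img_c": ["Saints", "Architectural_sculpture"],
}


def split_subjects(subjects_by_image):
    """Split subject values with the frequency heuristic: a term used by more
    than one image is treated as an indexing term; a term used by exactly one
    image is treated as a description."""
    usage = Counter()
    for subjects in subjects_by_image.values():
        # count images, not repeats of a term within one image
        usage.update(set(subjects))
    indexing = {term for term, n in usage.items() if n > 1}
    descriptions = {term for term, n in usage.items() if n == 1}
    return indexing, descriptions


if __name__ == "__main__":
    indexing, descriptions = split_subjects(SUBJECTS)
    print(sorted(indexing))
    print(sorted(descriptions))
```

With this sample, Eustace,Saint,_martyr,d._118 ends up among the descriptions alongside the Vézelay description, which illustrates the side effect mentioned above: a genuine indexing term that occurs only once is misclassified. A thesaurus lookup (e.g. against the Getty AAT) could later replace the frequency test without changing the rest of the viewer.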

Note that we have had some discussion about the canonicalization of creator
names (see the email list archive), and it is likely that canonicalization
will also be necessary for other properties; feedback on this is welcome.

Is this enough information to start making progress? If you have any
questions, please send them to the SIMILE public list.

thanks, kind regards

Dr Mark H. Butler
Research Scientist                HP Labs Bristol
mark-h_butler@hp.com
Internet: http://www-uk.hpl.hp.com/people/marbut/

Received on Wednesday, 5 November 2003 08:30:07 UTC