- From: Butler, Mark <Mark_Butler@hplb.hpl.hp.com>
- Date: Wed, 19 Nov 2003 14:05:00 -0000
- To: www-rdf-dspace@w3.org
Vineet, David, Haystack team

> I tried out the SIMILE datasets yesterday. My system is designed to be
> general purpose, and as expected the navigation worked as per my
> expectations. Some issues cropped up in other parts of Haystack, as
> Prof. Karger mentioned in his e-mail earlier today.

First, thanks for taking a look at the data. I'm keen to fix the issues as soon as we have consensus - please see my other email.

> Otherwise, the only effort involved in getting SIMILE data running in
> Haystack involved converting the schema to rdf and loading the data
> into Haystack.

I guess you mean RDF/XML - it's in N3 at the moment, which is still RDF, right? When I have time, I'd like to create a proper automated build process for SIMILE, so that where files are kept in one canonical format but the team needs other formats, those files can be built automatically. For a long time, the CVS has only had a few users in HP. So now we need to do some reorganisation to support the whole team better, but at the moment I'm busy on other things, particularly the demo script.

> I am however interested in more detailed expectations of how you would
> expect the system to perform on the SIMILE dataset. If possible, I would
> like one or two, detailed, click-by-click, scenarios of a person
> browsing the system. I want to make sure that I have not made any
> trade-offs while designing a general-purpose system.

I'm currently working towards this in the demo script; however, I will explain my current thinking here - feedback is very welcome.

A while back Kevin and I did some work on identifying overlaps between the two data sets - see
http://lists.w3.org/Archives/Public/www-rdf-dspace/2003Oct/0108.html

So it would be interesting to load the IMS and the Artstor data into Haystack, then see if browsing on any of these terms returned records from both. This requires using RDFS or OWL to map between the two sets of data - is this easy to do in Haystack?
The other possibility is to read some IMS data and some Artstor data into Jena, then load the schema and run an inferencer, then serialize the data back out and read it into Haystack.

The other issue here is that we need to decide on the map, and also fix a few remaining inconsistencies between the two datasets. For example, in Artstor the term "telescope" might be mentioned in two places:

    <http://web.mit.edu/simile/metadata/artstor/id#UCSD_41822000860534>
        vra:subject <http://web.mit.edu/simile/metadata/artstor/subject#telescopes> ;
        vra:typeAAT "telescopes" .

whereas in IMS it might be mentioned here:

    <ocw:OcwWeb/Earth--Atmospheric--and-Planetary-Sciences/12-409Hands-On-Astronomy--Observing-Stars-and-PlanetsSpring2002/CourseHome/index.htm>
        dc:subject "planets" , "spectroscopy" , "stars" , "moon" , "telescopes" .

In the Artstor data, we are turning telescopes into a URI because it is a controlled term, based on a suggestion from Eric, although as I've noted before subject isn't always used this way - for more discussion see
http://lists.w3.org/Archives/Public/www-rdf-dspace/2003Oct/0114.html

Another overlap is Matthew Calbraith Perry, e.g. in IMS:

    <http://ocw.mit.edu/NR/rdonlyres/6581A505-899A-498F-9754-6EAD461BDA44/0/01_titlepage_s.jpg>
        dc:contributor <ocwc:Perry%2C%20Matthew%20Calbraith> .

    <http://ocw.mit.edu/NR/rdonlyres/F524752D-3926-4849-B2E2-4B3C66506440/0/18_XIV_28_093_s.jpg>
        dc:description "Portrait of Perry, photograph by Mathew Brady; Matthew Calbraith Perry, daguerreotype by P. Haas" .

whereas in Artstor:

    <http://web.mit.edu/simile/metadata/artstor/id#UCSD_41822003055447>
        vra:subject <http://web.mit.edu/simile/metadata/artstor/subject#Perry,_Matthew_Calbraith,1794-1858> ;
        vra:title "Commodore Perry (left) and Captain Henry A. Adams, as seen by a Japanese artist" ;
        vra:typeAAT "Perry, Matthew Calbraith,1794-1858" .

This case is clearly more complicated than the first one.
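To make the mapping idea concrete, here is a minimal sketch in Python of what an RDFS inferencer would do with a subPropertyOf map, followed by a lookup of subject terms that appear in both collections. It is only an illustration under several assumptions: plain tuples stand in for RDF triples, nothing here is the Jena or Haystack API, the umbrella property ex:subject is invented for the example and is not an agreed map, and the term_key normalisation (strip the Artstor subject namespace, replace underscores, lower-case) is just one guess at how a URI and a literal could be compared.

```python
# Illustrative sketch only: plain tuples stand in for RDF triples.
# Nothing here is the Jena or Haystack API.

RDFS_SUBPROPERTY_OF = "rdfs:subPropertyOf"
ARTSTOR_BASE = "http://web.mit.edu/simile/metadata/artstor/"
ARTSTOR_SUBJECT = ARTSTOR_BASE + "subject#"

# Assumption: ex:subject is an invented umbrella property, not an agreed map.
SCHEMA = [
    ("vra:subject", RDFS_SUBPROPERTY_OF, "ex:subject"),
    ("dc:subject", RDFS_SUBPROPERTY_OF, "ex:subject"),
]

OCW_COURSE = (
    "ocw:OcwWeb/Earth--Atmospheric--and-Planetary-Sciences/"
    "12-409Hands-On-Astronomy--Observing-Stars-and-PlanetsSpring2002/"
    "CourseHome/index.htm"
)

# A few triples taken from the examples in this message.
DATA = [
    (ARTSTOR_BASE + "id#UCSD_41822000860534", "vra:subject",
     ARTSTOR_SUBJECT + "telescopes"),
    (OCW_COURSE, "dc:subject", "telescopes"),
    (OCW_COURSE, "dc:subject", "planets"),
]


def infer_subproperties(triples, schema):
    """One pass of the RDFS rule: (s p o) and (p subPropertyOf q) give (s q o)."""
    parents = {}
    for prop, relation, parent in schema:
        if relation == RDFS_SUBPROPERTY_OF:
            parents.setdefault(prop, set()).add(parent)
    inferred = set(triples)
    for subject, prop, obj in triples:
        for parent in parents.get(prop, ()):
            inferred.add((subject, parent, obj))
    return inferred


def term_key(value):
    """Reduce a subject URI or a literal to a comparable key (an assumption)."""
    if value.startswith(ARTSTOR_SUBJECT):
        value = value[len(ARTSTOR_SUBJECT):]
    return value.replace("_", " ").strip().lower()


def collection_of(record):
    """Classify a record by URI prefix; anything not Artstor is treated as IMS."""
    return "artstor" if record.startswith(ARTSTOR_BASE) else "ims"


def shared_terms(triples, prop="ex:subject"):
    """Return the sorted term keys used under prop by both collections."""
    collections = {}
    for subject, predicate, obj in triples:
        if predicate == prop:
            key = term_key(obj)
            collections.setdefault(key, set()).add(collection_of(subject))
    return sorted(term for term, seen in collections.items() if len(seen) > 1)


if __name__ == "__main__":
    print(shared_terms(infer_subproperties(DATA, SCHEMA)))
```

With these three triples only "telescopes" comes back as shared. The Perry case would not match under this key: the Artstor subject carries the dates 1794-1858, while IMS uses a percent-encoded ocwc: URI under dc:contributor, so that overlap needs a richer map than a single subPropertyOf statement.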
Perhaps one way to determine the map is to go through the list of overlaps and build a comprehensive list of all the instances of shared terms between the two vocabularies, as then we will have a good understanding of the problem?

Comments here are very welcome.

Regards,

Dr Mark H. Butler
Research Scientist
HP Labs Bristol
mark-h_butler@hp.com
Internet: http://www-uk.hpl.hp.com/people/marbut/
Received on Wednesday, 19 November 2003 09:06:29 UTC