- From: Butler, Mark <Mark_Butler@hplb.hpl.hp.com>
- Date: Fri, 24 Oct 2003 16:58:24 +0100
- To: SIMILE public list <www-rdf-dspace@w3.org>
Kevin and I just spent some time discussing IMS on the phone. Here are the major points:

1. Different URIs for the same resource

I made a mistake: these two URIs in fact point to the same document

<http://ocw.mit.edu/NR/rdonlyres/Urban-Studies-and-Planning/11-208Introduction-to-Computers-in-Public-Management-IIJanuary--IAP-2002/C7D9370D-8BD3-48C2-B367-D4917E9BDC14/0/lect4.pdf>
<http://ocw.mit.edu/NR/rdonlyres/C7D9370D-8BD3-48C2-B367-D4917E9BDC14/0/lect4.pdf>

so there must be some kind of redirection on the OCW website. Kevin noted that OCW itself uses the first URI, so we are probably better off using that one, although it's a bit longer.

=> Decision: use first version.

I suggested we could simplify this in the N3 by adding a namespace definition in the RDF/XML like

xmlns:ocwcontent='http://ocw.mit.edu/NR/rdonlyres/'

as I thought the N3 writer would then abbreviate the URIs, but this didn't work. Andy, is it possible to use prefixes to shorten URIs in N3?

2. lom-tech:location

Kevin said the spec says to use this only if the resource is somewhere it can't be retrieved by a URL, so he just put the file path to the resource. We can retrieve the resource via the subject URL. The file path is only of use to OCW, so I don't think we need to include it. We could use lom-tech:location to point back to the subject, but that would introduce a loop, which is confusing.

=> Omit lom-tech:location.

3. Canonicalizing names

There has been a bit of discussion about canonicalizing names, and we decided to try to do what was easy but leave hard canonicalization up to authority file web services. So there are two unresolved questions here.

- The IMS metadata uses VCard, whereas at the moment the Artstor transform uses a homegrown Person class with four properties: forename, surname, birth and death. Andy, do you think I should switch the Artstor transform to use VCard?
- The most common format for names in Artstor is "surname, forename, birth-death", whereas the most common format in IMS is "forename surname", although both collections contain variations. Should we at least try to create a common format, e.g. "forename surname", although this won't work for all instances?

One more point about canonicalizing names. Kevin noted that when we try to match duplicates, it's harder if we try to guess what the different parts of a name field mean, e.g. consider

Pissarro, Camille, 1830-1903

versus

Camille Pissarro, 1830-1903

We can try to use the additional tokens to guess surname and forename order. However, the most accurate representation might be to reflect the fact that all we know is that these are identifiers

<person:identifiers>
  <rdf:Bag>
    <rdf:li>Pissarro</rdf:li>
    <rdf:li>Camille</rdf:li>
    <rdf:li>1830</rdf:li>
    <rdf:li>1903</rdf:li>
  </rdf:Bag>
</person:identifiers>

and then try a multiple permutation match to see if two records refer to the same person.

Dr Mark H. Butler
Research Scientist
HP Labs Bristol
mark-h_butler@hp.com
Internet: http://www-uk.hpl.hp.com/people/marbut/
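
As a rough illustration of the multiple permutation match on name identifiers mentioned above, here is a minimal Python sketch. It assumes that "some permutation of one record's tokens equals the other's" can be checked by comparing the tokens as unordered bags (multisets), which is equivalent and avoids enumerating permutations. The function names name_tokens and same_person are hypothetical and are not part of any existing transform.

```python
import re
from collections import Counter


def name_tokens(raw: str) -> Counter:
    """Split a name field into an unordered bag of lowercase tokens.

    Punctuation, commas and the hyphen in a date range are treated as
    separators, so "1830-1903" yields the two tokens "1830" and "1903".
    """
    return Counter(token.lower() for token in re.findall(r"\w+", raw))


def same_person(a: str, b: str) -> bool:
    """Return True if the two name fields contain the same bag of tokens.

    Comparing bags is the same as asking whether any reordering of one
    token list matches the other, without guessing which token is the
    surname, forename, birth year or death year.
    """
    return name_tokens(a) == name_tokens(b)


if __name__ == "__main__":
    print(same_person("Pissarro, Camille, 1830-1903",
                      "Camille Pissarro, 1830-1903"))
```

An exact bag match like this would miss records where one side lacks the dates or uses initials, so a real matcher would probably need a partial-overlap score instead of strict equality.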
Received on Friday, 24 October 2003 12:09:15 UTC