- From: Kevin Smathers <kevin.smathers@hp.com>
- Date: Mon, 27 Oct 2003 07:35:34 -0800
- To: "Butler, Mark" <Mark_Butler@hplb.hpl.hp.com>
- Cc: SIMILE public list <www-rdf-dspace@w3.org>
Butler, Mark wrote:

>However I've been looking at the OCW data today and I do think it would be a
>good idea to put the contents of the <Keyword> elements into the OCW RDF.
>While it is true that <taxonpath><source> only has two values - LCSH and CIP
>- there is variation in keyword as shown in the enclosed file.
>

Agreed, and that is why Keyword is included in the transformation.
According to the IMS RDF specification, LOM-Keyword should be translated
into dc:subject, which is what I have done.

>Dr Mark H. Butler
>Research Scientist HP Labs Bristol
>mark-h_butler@hp.com
>Internet: http://www-uk.hpl.hp.com/people/marbut/
>
>>-----Original Message-----
>>From: Kevin Smathers [mailto:kevin.smathers@hp.com]
>>Sent: 24 October 2003 18:12
>>To: Butler, Mark
>>Cc: SIMILE public list
>>Subject: Canonicalizing names (was Re: XSLT script for IMS)
>>
>>I've been investigating the name formats used in the OCW xml files.
>>I've attached a complete listing of the names as found using the
>>following xmlstarlet command:
>>
>>$ xml sel -T -t -m //Entity -v . -n *.xml | sort | uniq >namelist.txt
>>
>>There are several names here that I would expect to cause trouble:
>>
>>Gleason's Pictorial
>>Brown
>>United States of America
>>Smithsonian Institution
>>Glenn Ellison; Sara Ellison
>>Getty Images
>>Peters, W. T.
>>Prof. Joseph Ferriera, Thomas Grayson
>>
>>The main two formats are "[honorific] firstname lastname[, appellation]",
>>and "lastname, firstname [middlename or initial]", but these make up
>>fewer than half of the records as a whole.
>>
>>The OCLC web service does a pretty good job of finding matches in the
>>"lastname, firstname [middlename or initial]" case, but only attempts
>>word-matches in the "firstname lastname" case and fails completely if
>>the honorific is left attached. To do this yourself try for example
>>searching for "Tom Leighton" (see MacKenzie's e-mail for the value of
>>oclcservice):
>>
>>$ wget "http://$oclcservice?method=getCompleteSelectedNameAuthority&name=Tom+Leighton&maxList=10&serviceType=rest&isPersonalName=true" -O leighton.tmp
>>$ xml fo leighton.tmp >leighton.xml
>>
>>The results are in the second attachment. As you can see, 'Tom
>>Leighton' was matched against 'Wendt, Thomas Leighton' using word-match,
>>whereas 'Leighton, Tom' would return a superior phrase-match.
>>
>>The degenerate cases shown above don't yield any useful results from the
>>OCLC web service.
>>
>>Cheers,
>>-kls
>>
>
>------------------------------------------------------------------------
>
> <Keyword>AM and FM modulation</Keyword>
> <Keyword>Age of Reason</Keyword>
> <Keyword>Algebra, Universal</Keyword>
> <Keyword>Almereyda</Keyword>
> <Keyword>Ampere's law</Keyword>
> <Keyword>Basic electric circuits</Keyword>
> <Keyword>C++</Keyword>
> <Keyword>C</Keyword>
> <Keyword>Calculus of operations </Keyword>
> <Keyword>Concepts of electrostatic field and potential, electrostatic energy</Keyword>
> <Keyword>Congress</Keyword>
> <Keyword>Congressional behavior</Keyword>
> <Keyword>Coulomb's law</Keyword>
> <Keyword>DNA replication</Keyword>
> <Keyword>Doppler effect</Keyword>
> <Keyword>E-commerce</Keyword>
> <Keyword>Economics</Keyword>
> <Keyword>Electric currents</Keyword>
> <Keyword>Electromagnetic waves</Keyword>
> <Keyword>Faraday's law of induction</Keyword>
> <Keyword>Fourier transforms</Keyword>
> <Keyword>Fresnel and Faunhofer diffraction </Keyword>
> <Keyword>GIS</Keyword>
> <Keyword>Generalized spaces</Keyword>
> <Keyword>Industrial engineering</Keyword>
> <Keyword>Introduction to electromagnetism and electrostatics</Keyword>
> <Keyword>Java</Keyword>
> <Keyword>Julie Taymor</Keyword>
> <Keyword>Kenneth Branagh</Keyword>
> <Keyword>Kurosawa</Keyword>
> <Keyword>Laurence Olivier</Keyword>
> <Keyword>Line geometry</Keyword>
> <Keyword>Linear algebra</Keyword>
> <Keyword>MIT</Keyword>
> <Keyword>Macroeconomics</Keyword>
> <Keyword>Magnetic materials</Keyword>
> <Keyword>Management science</Keyword>
> <Keyword>Markov processes</Keyword>
> <Keyword>Mathematical analysis </Keyword>
> <Keyword>Maxwell's equations</Keyword>
> <Keyword>Mechanical translation</Keyword>
> <Keyword>Orson Welles</Keyword>
> <Keyword>Polanski</Keyword>
> <Keyword>Richard Loncraine</Keyword>
> <Keyword>Scheme+</Keyword>
> <Keyword>Scheme</Keyword>
> <Keyword>Shakespeare</Keyword>
> <Keyword>Speech</Keyword>
> <Keyword>Systems engineering</Keyword>
> <Keyword>TV</Keyword>
> <Keyword>Time-varying fields</Keyword>
> <Keyword>Topology</Keyword>
> <Keyword>Wave optics</Keyword>
> <Keyword>Zeffirelli</Keyword>
> <Keyword>abstract types</Keyword>
> <Keyword>advertising</Keyword>
> <Keyword>air transportation systems</Keyword>
> <Keyword>air-water exchange</Keyword>
> <Keyword>apertures and stops</Keyword>
> <Keyword>auctions</Keyword>
> <Keyword>aurora borealis</Keyword>
> <Keyword>bed-water exchange</Keyword>
> <Keyword>binary stars</Keyword>
> <Keyword>black holes</Keyword>
> <Keyword>blue skies</Keyword>
> <Keyword>boundary layers</Keyword>
> <Keyword>bullet trains</Keyword>
> <Keyword>buoyancy-driven flows</Keyword>
> <Keyword>car coils</Keyword>
> <Keyword>catalytic proteins</Keyword>
> <Keyword>color perception</Keyword>
> <Keyword>competition</Keyword>
> <Keyword>computer graphics</Keyword>
> <Keyword>computer</Keyword>
> <Keyword>conductors</Keyword>
> <Keyword>cultural history</Keyword>
> <Keyword>customer orientation</Keyword>
> <Keyword>data abstraction</Keyword>
> <Keyword>data structures</Keyword>
> <Keyword>denotational semantics</Keyword>
> <Keyword>dielectrics</Keyword>
> <Keyword>differential equations</Keyword>
> <Keyword>digital circuits</Keyword>
> <Keyword>diode circuits</Keyword>
> <Keyword>dissolution</Keyword>
> <Keyword>distribution policy</Keyword>
> <Keyword>dynamic programming</Keyword>
> <Keyword>econometrics</Keyword>
> <Keyword>educational technology</Keyword>
> <Keyword>eigen values</Keyword>
> <Keyword>electric charge</Keyword>
> <Keyword>electric motors</Keyword>
> <Keyword>electric shock treatment</Keyword>
> <Keyword>electric structure of matter</Keyword>
> <Keyword>electrical circuits</Keyword>
> <Keyword>electro-mechanical devices</Keyword>
> <Keyword>electrocardiograms</Keyword>
> <Keyword>electrodynamics</Keyword>
> <Keyword>empirical economics</Keyword>
> <Keyword>engineering</Keyword>
> <Keyword>finance</Keyword>
> <Keyword>functional programming language</Keyword>
> <Keyword>gene regulation</Keyword>
> <Keyword>genetic recombination</Keyword>
> <Keyword>graphical user interface</Keyword>
> <Keyword>haloes around sun and moon</Keyword>
> <Keyword>heuristics</Keyword>
> <Keyword>highway systems</Keyword>
> <Keyword>image formation </Keyword>
> <Keyword>imperative programming language</Keyword>
> <Keyword>inference</Keyword>
> <Keyword>integer programming</Keyword>
> <Keyword>intellectual history</Keyword>
> <Keyword>interferometers</Keyword>
> <Keyword>lake systems</Keyword>
> <Keyword>language</Keyword>
> <Keyword>large systems</Keyword>
> <Keyword>lens design</Keyword>
> <Keyword>lightning</Keyword>
> <Keyword>linear differential equations </Keyword>
> <Keyword>linear programming</Keyword>
> <Keyword>linguistics</Keyword>
> <Keyword>logistics</Keyword>
> <Keyword>magnetic fields</Keyword>
> <Keyword>magnetic levitation</Keyword>
> <Keyword>marketing</Keyword>
> <Keyword>mass spectrometers</Keyword>
> <Keyword>mathematical economics</Keyword>
> <Keyword>matrix theory </Keyword>
> <Keyword>media design</Keyword>
> <Keyword>meta-circular interpreters</Keyword>
> <Keyword>metal detectors</Keyword>
> <Keyword>modularity</Keyword>
> <Keyword>modules</Keyword>
> <Keyword>molecular diffusion</Keyword>
> <Keyword>molecules</Keyword>
> <Keyword>momentum transport in environmental flows</Keyword>
> <Keyword>monopoly</Keyword>
> <Keyword>moon</Keyword>
> <Keyword>multiprocessing</Keyword>
> <Keyword>musical instruments</Keyword>
> <Keyword>network optimization</Keyword>
> <Keyword>neutron stars</Keyword>
> <Keyword>non-linear programming</Keyword>
> <Keyword>numerical methods</Keyword>
> <Keyword>object modeling</Keyword>
> <Keyword>object oriented programming</Keyword>
> <Keyword>object oriented</Keyword>
> <Keyword>ocean transportation systems</Keyword>
> <Keyword>ogic programming languages</Keyword>
> <Keyword>oligopoly</Keyword>
> <Keyword>op-amps</Keyword>
> <Keyword>operational semantics</Keyword>
> <Keyword>pacemakers</Keyword>
> <Keyword>particle accelerators (a.k.a. atom smashers or colliders)</Keyword>
> <Keyword>phase partitioning</Keyword>
> <Keyword>philosophy</Keyword>
> <Keyword>photometry</Keyword>
> <Keyword>planets</Keyword>
> <Keyword>polarization</Keyword>
> <Keyword>political process</Keyword>
> <Keyword>polymorphism</Keyword>
> <Keyword>positive definite matrices</Keyword>
> <Keyword>price discrimination</Keyword>
> <Keyword>pricing</Keyword>
> <Keyword>problem solving</Keyword>
> <Keyword>product strategy</Keyword>
> <Keyword>programming language</Keyword>
> <Keyword>programming</Keyword>
> <Keyword>project management</Keyword>
> <Keyword>protein binding</Keyword>
> <Keyword>public management</Keyword>
> <Keyword>public opinion surveys</Keyword>
> <Keyword>public policy</Keyword>
> <Keyword>radio telescopes</Keyword>
> <Keyword>radiometry</Keyword>
> <Keyword>radios</Keyword>
> <Keyword>rainbows</Keyword>
> <Keyword>ray-tracing</Keyword>
> <Keyword>red sunsets</Keyword>
> <Keyword>resolution </Keyword>
> <Keyword>river systems</Keyword>
> <Keyword>scalar transport in environmental flows</Keyword>
> <Keyword>searching</Keyword>
> <Keyword>settling and coagulation</Keyword>
> <Keyword>software design</Keyword>
> <Keyword>software development</Keyword>
> <Keyword>software testing</Keyword>
> <Keyword>software</Keyword>
> <Keyword>sorting</Keyword>
> <Keyword>space-bandwidth product </Keyword>
> <Keyword>spatial analysis </Keyword>
> <Keyword>specification</Keyword>
> <Keyword>spectral analysis</Keyword>
> <Keyword>spectroscopy</Keyword>
> <Keyword>speech disorders</Keyword>
> <Keyword>speech prosody</Keyword>
> <Keyword>speech recognition</Keyword>
> <Keyword>stars</Keyword>
> <Keyword>statistics</Keyword>
> <Keyword>stratification in lakes</Keyword>
> <Keyword>super-novae</Keyword>
> <Keyword>superconductivity</Keyword>
> <Keyword>systems of equations</Keyword>
> <Keyword>telescopes</Keyword>
> <Keyword>transients</Keyword>
> <Keyword>transistor circuits</Keyword>
> <Keyword>turbulent diffusion</Keyword>
> <Keyword>type systems</Keyword>
> <Keyword>uniaxial rotation</Keyword>
> <Keyword>vector spaces</Keyword>
> <Keyword>voting behavior</Keyword>
> <Keyword>water transportation</Keyword>
> <Keyword>wave-guiding </Keyword>
> <Keyword>waveform analysis</Keyword>
>

--
========================================================
Kevin Smathers                kevin.smathers@hp.com
Hewlett-Packard               kevin@ank.com
Palo Alto Research Lab
1501 Page Mill Rd.            650-857-4477 work
M/S 1135                      650-852-8186 fax
Palo Alto, CA 94304           510-247-1031 home
========================================================
use "Standard::Disclaimer";
carp("This message was printed on 100% recycled bits.");
Received on Monday, 27 October 2003 10:43:38 UTC