- From: Kevin Smathers <kevin.smathers@hp.com>
- Date: Mon, 27 Oct 2003 07:35:34 -0800
- To: "Butler, Mark" <Mark_Butler@hplb.hpl.hp.com>
- Cc: SIMILE public list <www-rdf-dspace@w3.org>
Butler, Mark wrote:
>However I've been looking at the OCW data today and I do think it would be a
>good idea to put the contents of the <Keyword> elements into the OCW RDF.
>While it is true that <taxonpath><source> only has two values - LCSH and CIP
>- there is variation in keyword as shown in the enclosed file.
>
>
Agreed, and that is why Keyword is included in the transformation.
According to the IMS RDF specification, LOM-Keyword should be translated
into dc:subject, which is what I have done.
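
For anyone who wants to check the mapping without running the XSLT, here is a minimal Python sketch of the same idea: each <Keyword> element becomes one dc:subject literal on the resource. It is an illustration, not the actual transformation script. The function name keywords_to_dc_subject and the example resource URI are hypothetical, and the sketch assumes the <Keyword> elements carry no XML namespace. It also trims stray whitespace and drops exact duplicates, which the XSLT may or may not do.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def keywords_to_dc_subject(lom_xml, resource_uri):
    """Return RDF/XML in which each <Keyword> becomes one dc:subject literal.

    Hypothetical helper for illustration only; assumes un-namespaced
    <Keyword> elements anywhere in the input document.
    """
    root = ET.fromstring(lom_xml)

    # Collect keyword text, trimming and collapsing whitespace and
    # skipping empty values and exact duplicates (order is preserved).
    subjects = []
    for elem in root.iter("Keyword"):
        text = " ".join((elem.text or "").split())
        if text and text not in subjects:
            subjects.append(text)

    # Build <rdf:RDF><rdf:Description rdf:about="..."> with dc:subject children.
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)
    rdf = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        rdf, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": resource_uri}
    )
    for text in subjects:
        ET.SubElement(desc, f"{{{DC_NS}}}subject").text = text
    return ET.tostring(rdf, encoding="unicode")


if __name__ == "__main__":
    sample = (
        "<lom>"
        "<Keyword>Calculus of operations </Keyword>"
        "<Keyword>C++</Keyword>"
        "</lom>"
    )
    # http://example.org/course/1 is a placeholder URI, not a real OCW identifier.
    print(keywords_to_dc_subject(sample, "http://example.org/course/1"))
```

Keeping the keywords as plain literals leaves any matching against LCSH or CIP terms to a later step.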
>Dr Mark H. Butler
>Research Scientist HP Labs Bristol
>mark-h_butler@hp.com
>Internet: http://www-uk.hpl.hp.com/people/marbut/
>
>
>
>>-----Original Message-----
>>From: Kevin Smathers [mailto:kevin.smathers@hp.com]
>>Sent: 24 October 2003 18:12
>>To: Butler, Mark
>>Cc: SIMILE public list
>>Subject: Canonicalizing names (was Re: XSLT script for IMS)
>>
>>
>>I've been investigating the name formats used in the OCW xml files.
>>I've attached a complete listing of the names as found using the
>>following xmlstarlet command:
>>
>>$ xml sel -T -t -m //Entity -v . -n *.xml | sort | uniq >namelist.txt
>>
>>There are several names here that I would expect to cause trouble:
>>
>>Gleason's Pictorial
>>Brown
>>United States of America
>>Smithsonian Institution
>>Glenn Ellison; Sara Ellison
>>Getty Images
>>Peters, W. T.
>>Prof. Joseph Ferriera, Thomas Grayson
>>
>>The two main formats are "[honorific] firstname lastname[, appellation]"
>>and "lastname, firstname [middlename or initial]", but these make up
>>fewer than half of the records as a whole.
>>
>>The OCLC web service does a pretty good job of finding matches in the
>>"lastname, firstname [middlename or initial]" case, but only attempts
>>word-matches in the "firstname lastname" case and fails completely if
>>the honorific is left attached. To try this yourself, search for
>>"Tom Leighton", for example (see MacKenzie's e-mail for the value of
>>oclcservice):
>>
>>$ wget "http://$oclcservice?method=getCompleteSelectedNameAuthority&name=Tom+Leighton&maxList=10&serviceType=rest&isPersonalName=true" -O leighton.tmp
>>$ xml fo leighton.tmp >leighton.xml
>>
>>
>>The results are in the second attachment. As you can see, 'Tom
>>Leighton' was matched against 'Wendt, Thomas Leighton' using word-match,
>>whereas 'Leighton, Tom' would return a superior phrase-match.
>>
>>The degenerate cases shown above don't yield any useful results from the
>>OCLC web service.
>>
>>Cheers,
>>-kls
>>
>>
>>
>
>
>
>
>------------------------------------------------------------------------
>
> <Keyword>AM and FM modulation</Keyword>
> <Keyword>Age of Reason</Keyword>
> <Keyword>Algebra, Universal</Keyword>
> <Keyword>Almereyda</Keyword>
> <Keyword>Ampere's law</Keyword>
> <Keyword>Basic electric circuits</Keyword>
> <Keyword>C++</Keyword>
> <Keyword>C</Keyword>
> <Keyword>Calculus of operations </Keyword>
> <Keyword>Concepts of electrostatic field and potential, electrostatic energy</Keyword>
> <Keyword>Congress</Keyword>
> <Keyword>Congressional behavior</Keyword>
> <Keyword>Coulomb's law</Keyword>
> <Keyword>DNA replication</Keyword>
> <Keyword>Doppler effect</Keyword>
> <Keyword>E-commerce</Keyword>
> <Keyword>Economics</Keyword>
> <Keyword>Electric currents</Keyword>
> <Keyword>Electromagnetic waves</Keyword>
> <Keyword>Faraday's law of induction</Keyword>
> <Keyword>Fourier transforms</Keyword>
> <Keyword>Fresnel and Faunhofer diffraction </Keyword>
> <Keyword>GIS</Keyword>
> <Keyword>Generalized spaces</Keyword>
> <Keyword>Industrial engineering</Keyword>
> <Keyword>Introduction to electromagnetism and electrostatics</Keyword>
> <Keyword>Java</Keyword>
> <Keyword>Julie Taymor</Keyword>
> <Keyword>Kenneth Branagh</Keyword>
> <Keyword>Kurosawa</Keyword>
> <Keyword>Laurence Olivier</Keyword>
> <Keyword>Line geometry</Keyword>
> <Keyword>Linear algebra</Keyword>
> <Keyword>MIT</Keyword>
> <Keyword>Macroeconomics</Keyword>
> <Keyword>Magnetic materials</Keyword>
> <Keyword>Management science</Keyword>
> <Keyword>Markov processes</Keyword>
> <Keyword>Mathematical analysis </Keyword>
> <Keyword>Maxwell's equations</Keyword>
> <Keyword>Mechanical translation</Keyword>
> <Keyword>Orson Welles</Keyword>
> <Keyword>Polanski</Keyword>
> <Keyword>Richard Loncraine</Keyword>
> <Keyword>Scheme+</Keyword>
> <Keyword>Scheme</Keyword>
> <Keyword>Shakespeare</Keyword>
> <Keyword>Speech</Keyword>
> <Keyword>Systems engineering</Keyword>
> <Keyword>TV</Keyword>
> <Keyword>Time-varying fields</Keyword>
> <Keyword>Topology</Keyword>
> <Keyword>Wave optics</Keyword>
> <Keyword>Zeffirelli</Keyword>
> <Keyword>abstract types</Keyword>
> <Keyword>advertising</Keyword>
> <Keyword>air transportation systems</Keyword>
> <Keyword>air-water exchange</Keyword>
> <Keyword>apertures and stops</Keyword>
> <Keyword>auctions</Keyword>
> <Keyword>aurora borealis</Keyword>
> <Keyword>bed-water exchange</Keyword>
> <Keyword>binary stars</Keyword>
> <Keyword>black holes</Keyword>
> <Keyword>blue skies</Keyword>
> <Keyword>boundary layers</Keyword>
> <Keyword>bullet trains</Keyword>
> <Keyword>buoyancy-driven flows</Keyword>
> <Keyword>car coils</Keyword>
> <Keyword>catalytic proteins</Keyword>
> <Keyword>color perception</Keyword>
> <Keyword>competition</Keyword>
> <Keyword>computer graphics</Keyword>
> <Keyword>computer</Keyword>
> <Keyword>conductors</Keyword>
> <Keyword>cultural history</Keyword>
> <Keyword>customer orientation</Keyword>
> <Keyword>data abstraction</Keyword>
> <Keyword>data structures</Keyword>
> <Keyword>denotational semantics</Keyword>
> <Keyword>dielectrics</Keyword>
> <Keyword>differential equations</Keyword>
> <Keyword>digital circuits</Keyword>
> <Keyword>diode circuits</Keyword>
> <Keyword>dissolution</Keyword>
> <Keyword>distribution policy</Keyword>
> <Keyword>dynamic programming</Keyword>
> <Keyword>econometrics</Keyword>
> <Keyword>educational technology</Keyword>
> <Keyword>eigen values</Keyword>
> <Keyword>electric charge</Keyword>
> <Keyword>electric motors</Keyword>
> <Keyword>electric shock treatment</Keyword>
> <Keyword>electric structure of matter</Keyword>
> <Keyword>electrical circuits</Keyword>
> <Keyword>electro-mechanical devices</Keyword>
> <Keyword>electrocardiograms</Keyword>
> <Keyword>electrodynamics</Keyword>
> <Keyword>empirical economics</Keyword>
> <Keyword>engineering</Keyword>
> <Keyword>finance</Keyword>
> <Keyword>functional programming language</Keyword>
> <Keyword>gene regulation</Keyword>
> <Keyword>genetic recombination</Keyword>
> <Keyword>graphical user interface</Keyword>
> <Keyword>haloes around sun and moon</Keyword>
> <Keyword>heuristics</Keyword>
> <Keyword>highway systems</Keyword>
> <Keyword>image formation </Keyword>
> <Keyword>imperative programming language</Keyword>
> <Keyword>inference</Keyword>
> <Keyword>integer programming</Keyword>
> <Keyword>intellectual history</Keyword>
> <Keyword>interferometers</Keyword>
> <Keyword>lake systems</Keyword>
> <Keyword>language</Keyword>
> <Keyword>large systems</Keyword>
> <Keyword>lens design</Keyword>
> <Keyword>lightning</Keyword>
> <Keyword>linear differential equations </Keyword>
> <Keyword>linear programming</Keyword>
> <Keyword>linguistics</Keyword>
> <Keyword>logistics</Keyword>
> <Keyword>magnetic fields</Keyword>
> <Keyword>magnetic levitation</Keyword>
> <Keyword>marketing</Keyword>
> <Keyword>mass spectrometers</Keyword>
> <Keyword>mathematical economics</Keyword>
> <Keyword>matrix theory </Keyword>
> <Keyword>media design</Keyword>
> <Keyword>meta-circular interpreters</Keyword>
> <Keyword>metal detectors</Keyword>
> <Keyword>modularity</Keyword>
> <Keyword>modules</Keyword>
> <Keyword>molecular diffusion</Keyword>
> <Keyword>molecules</Keyword>
> <Keyword>momentum transport in environmental flows</Keyword>
> <Keyword>monopoly</Keyword>
> <Keyword>moon</Keyword>
> <Keyword>multiprocessing</Keyword>
> <Keyword>musical instruments</Keyword>
> <Keyword>network optimization</Keyword>
> <Keyword>neutron stars</Keyword>
> <Keyword>non-linear programming</Keyword>
> <Keyword>numerical methods</Keyword>
> <Keyword>object modeling</Keyword>
> <Keyword>object oriented programming</Keyword>
> <Keyword>object oriented</Keyword>
> <Keyword>ocean transportation systems</Keyword>
> <Keyword>ogic programming languages</Keyword>
> <Keyword>oligopoly</Keyword>
> <Keyword>op-amps</Keyword>
> <Keyword>operational semantics</Keyword>
> <Keyword>pacemakers</Keyword>
> <Keyword>particle accelerators (a.k.a. atom smashers or colliders)</Keyword>
> <Keyword>phase partitioning</Keyword>
> <Keyword>philosophy</Keyword>
> <Keyword>photometry</Keyword>
> <Keyword>planets</Keyword>
> <Keyword>polarization</Keyword>
> <Keyword>political process</Keyword>
> <Keyword>polymorphism</Keyword>
> <Keyword>positive definite matrices</Keyword>
> <Keyword>price discrimination</Keyword>
> <Keyword>pricing</Keyword>
> <Keyword>problem solving</Keyword>
> <Keyword>product strategy</Keyword>
> <Keyword>programming language</Keyword>
> <Keyword>programming</Keyword>
> <Keyword>project management</Keyword>
> <Keyword>protein binding</Keyword>
> <Keyword>public management</Keyword>
> <Keyword>public opinion surveys</Keyword>
> <Keyword>public policy</Keyword>
> <Keyword>radio telescopes</Keyword>
> <Keyword>radiometry</Keyword>
> <Keyword>radios</Keyword>
> <Keyword>rainbows</Keyword>
> <Keyword>ray-tracing</Keyword>
> <Keyword>red sunsets</Keyword>
> <Keyword>resolution </Keyword>
> <Keyword>river systems</Keyword>
> <Keyword>scalar transport in environmental flows</Keyword>
> <Keyword>searching</Keyword>
> <Keyword>settling and coagulation</Keyword>
> <Keyword>software design</Keyword>
> <Keyword>software development</Keyword>
> <Keyword>software testing</Keyword>
> <Keyword>software</Keyword>
> <Keyword>sorting</Keyword>
> <Keyword>space-bandwidth product </Keyword>
> <Keyword>spatial analysis </Keyword>
> <Keyword>specification</Keyword>
> <Keyword>spectral analysis</Keyword>
> <Keyword>spectroscopy</Keyword>
> <Keyword>speech disorders</Keyword>
> <Keyword>speech prosody</Keyword>
> <Keyword>speech recognition</Keyword>
> <Keyword>stars</Keyword>
> <Keyword>statistics</Keyword>
> <Keyword>stratification in lakes</Keyword>
> <Keyword>super-novae</Keyword>
> <Keyword>superconductivity</Keyword>
> <Keyword>systems of equations</Keyword>
> <Keyword>telescopes</Keyword>
> <Keyword>transients</Keyword>
> <Keyword>transistor circuits</Keyword>
> <Keyword>turbulent diffusion</Keyword>
> <Keyword>type systems</Keyword>
> <Keyword>uniaxial rotation</Keyword>
> <Keyword>vector spaces</Keyword>
> <Keyword>voting behavior</Keyword>
> <Keyword>water transportation</Keyword>
> <Keyword>wave-guiding </Keyword>
> <Keyword>waveform analysis</Keyword>
>
--
========================================================
Kevin Smathers kevin.smathers@hp.com
Hewlett-Packard kevin@ank.com
Palo Alto Research Lab
1501 Page Mill Rd. 650-857-4477 work
M/S 1135 650-852-8186 fax
Palo Alto, CA 94304 510-247-1031 home
========================================================
use "Standard::Disclaimer";
carp("This message was printed on 100% recycled bits.");
Received on Monday, 27 October 2003 10:43:38 UTC