Re: terminology decomposition and interpretation use case -- Re: Reminder: Telco this Friday, 3-4 pm (CET)

Hi all,

I think the Dampfschiff issue is not limited to a particular use case,
language or research towards standardization.

I don't think John's problematic Dampfschiff examples are valid if you
assume:
- head information to be available and used in the rules: Schiff < Fahrt <
Kapitän.
- the individual multiword components to be form instantiations of lexical
entries so head information can be nested:

MWE                Head

Dampfschiffsfahrt  Kapitän
Dampfschiffs       Fahrt
Dampf              Schiff
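
To make the nesting concrete, here is a minimal Python sketch of the
structure in the table above. The names Compound, head_chain and bracket
are my own illustrative choices, not part of lemon or any standard, and the
strictly binary modifier/head split is an assumption. Linking elements such
as the "s" are left out on purpose.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Compound:
    """A binary compound: a modifier (possibly itself a compound) plus a head."""
    modifier: Union["Compound", str]
    head: str


def head_chain(entry: Union[Compound, str]) -> List[str]:
    """Collect the heads from the outermost compound inwards."""
    heads = []
    while isinstance(entry, Compound):
        heads.append(entry.head)
        entry = entry.modifier
    return heads


def bracket(entry: Union[Compound, str]) -> str:
    """Render the nesting as a bracketed string, head last at each level."""
    if isinstance(entry, Compound):
        return f"[{bracket(entry.modifier)} {entry.head}]"
    return entry


# Each level is a lexical entry in its own right, so heads can be nested.
dampfschiff = Compound(modifier="Dampf", head="Schiff")
dampfschifffahrt = Compound(modifier=dampfschiff, head="Fahrt")
kapitaen = Compound(modifier=dampfschifffahrt, head="Kapitän")

print(head_chain(kapitaen))  # ['Kapitän', 'Fahrt', 'Schiff']
print(bracket(kapitaen))     # [[[Dampf Schiff] Fahrt] Kapitän]
```

A rule that only sees the modifier/head pair at each level has no way to
pair Dampf directly with Kapitän, which is how I would expect the nesting
to block a variant like *Dampfkapitän für Schifffahrt.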

In general, I agree with John and Tom that for standardization we should
not go into language specific patterns for term variant generation.

However, as a general point I stress the importance of Paul's observation
that, in the representation of lexical information for textual identifiers
of ontology concepts, "'textual identifiers' in most technical ontologies
mostly come in the form of multi-word or otherwise morphologically complex
terms."

The linguistic structure of lexical entries counts for semantic
interpretation. Morphological complexity reflects semantic structure, which
is important for, e.g., semantic ordering into hypernymy structures (a
Dampfschiff isa Schiff) and translation (cf. Tom's semantic network remark).
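
As a rough illustration of the hypernymy point, the sketch below proposes
"isa" candidates purely from head information. The mini-lexicon and the
function name isa_candidates are hypothetical, and treating the head as the
hypernym is only a heuristic: it would need checking against the ontology
and will not hold for every compound.

```python
from typing import Dict, List, Tuple

# Hypothetical mini-lexicon: term -> (modifier, head). Illustrative only.
LEXICON: Dict[str, Tuple[str, str]] = {
    "Dampfschiff": ("Dampf", "Schiff"),
    "Dampfschifffahrt": ("Dampfschiff", "Fahrt"),
    "Dampfschifffahrtskapitän": ("Dampfschifffahrt", "Kapitän"),
}


def isa_candidates(lexicon: Dict[str, Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Propose (term, hypernym) pairs by taking each compound's head."""
    return sorted((term, head) for term, (_modifier, head) in lexicon.items())


for term, hypernym in isa_candidates(LEXICON):
    print(f"{term} isa {hypernym}")
```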

Head and modifier information seems an obvious first choice for an
as-good-as language-neutral description of compounds.
I do not consider this new linguistic research per se, but an integration
effort of existing standards.
We don't have to invent a new descriptive vocabulary for this; we can encode
relevant morphological/(morpho)syntactic information by incorporating, for
instance, ISOcat data categories:

head: http://www.isocat.org/datcat/DC-2306
modification type: http://www.isocat.org/datcat/DC-1931
modifier: http://www.isocat.org/datcat/DC-2305
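
A small sketch of how such an encoding might look, written in Python and
serialised as N-Triples-style lines. Using the data category URIs directly
as properties is my assumption, not an agreed ontolex or lemon design, and
the http://example.org/lexicon/ namespace is a placeholder. The
modification type category is left out because no values for it have been
discussed here.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# ISOcat data categories listed above.
HEAD = "http://www.isocat.org/datcat/DC-2306"
MODIFIER = "http://www.isocat.org/datcat/DC-2305"

# Placeholder namespace for lexical entries (hypothetical).
EX = "http://example.org/lexicon/"


def describe(term: str, modifier: str, head: str) -> List[Triple]:
    """Describe one compound by linking it to its head and modifier entries."""
    return [
        (EX + term, HEAD, EX + head),
        (EX + term, MODIFIER, EX + modifier),
    ]


def to_ntriples(triples: List[Triple]) -> str:
    """Serialise (subject, predicate, object) URI triples, one per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


print(to_ntriples(describe("Dampfschiff", "Dampf", "Schiff")))
```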

I agree with Paul that incorporating at least basic linguistic information
such as head and modifier is of importance beyond use case level.
From the use case descriptions I understand this information would actually
be commonly required for a number of use cases (e.g. IE, NLG, TRANS).

How far we want to go down the line of incorporating/linking more standard
elements is up to us.

Best wishes,

Wim

On Tue, Nov 6, 2012 at 6:12 PM, Tom Knorr <tknorr@neurocollective.com> wrote:

> I think I am with John in questioning (or maybe defining) the purpose of
> this research.
>
> On the dangers of getting lost in detail:
>
> “Dampfschiff” is a vessel (ship) that utilizes a steam engine for
> propulsion
>
> “Dampfschifffahrt” is either a trip on a steamship in the sense of ‘we had
> a nice trip on a steamboat’ (steamer, steamboat, steamship are synonyms
> here), or it describes an entire category of steam engine based propulsion
> ship operations (steamship business), including navigation (some of it
> being inherited from nautical navigation, and navigation per se, but also
> particulars of steam engines).
>
> A “Kapitän” is predominantly a role. Specifically in this example we are
> talking about a captain that operates on water “Schifffahrtskapitän”, more
> precisely a person that operates in that role.
>
> As such the “Kapitän einer Dampfschifffahrt” is a particular person
> identified by the trip, while the “Dampfschifffahrtskapitän” or “Kapitän
> der Dampfschifffahrt” is a more general term, describing the particular
> role or rank in the transportation business using steamships
> “Dampfschifffahrt”.
>
> Before I bore you too much, I think the idea of using rules to dissect or
> derive ontology concepts is inherently flawed.
>
> The question that John raises about term variation seems to bubble up a
> bigger problem, which is: Can an ontology be translated if you do not
> consider some form of a semantic network behind the individual concepts and
> allow the translation engine to phrase the semantic network (generate a
> descriptive form from the individual sub-concepts) in a language where
> there is no direct translation?
>
> Tom
>
> -----Original Message-----
> From: Paul Buitelaar [mailto:paul.buitelaar@deri.org]
> Sent: Tuesday, November 06, 2012 9:09 AM
> To: lupe aguado
> Cc: John McCrae; public-ontolex
> Subject: Re: terminology decomposition and interpretation use case -- Re:
> Reminder: Telco this Friday, 3-4 pm (CET)
>
> Hi Lupe, my German example was only meant to illustrate how lexical
> information for ontology terms can be used (in term variant generation
> in this case). Examples for Spanish will of course work differently, but
> this is not part of the ontology lexicon format. Any
> rules/patterns/classifiers/other are outside of the format but can
> access lexical information for ontology terms (in a standardized format)
> to apply in a variety of use cases - of which term variant generation is
> just one.
>
>
> Paul
>
>
> On 05/11/2012 18:29, lupe aguado wrote:
> > Hi, Paul, all
> >
> > I agree that we have to investigate on the representation of lexical
> > information for textual identifiers of ontology concepts, but the
> > example in German was not applicable to Spanish texts. At least in most
> > cases, we cannot apply the same rules. Maybe I did not understand it
> > well, because we did not have time.
> > Paul, do you mean to extract the conceptual relation from the lexical
> > information?
> > This topic also interests us.
> >
> > Lupe
> >
> >
> >
> > 2012/11/5 Paul Buitelaar <paul.buitelaar@deri.org>
> >
> >     Hi John and all, the objective of the ontolex standardization effort
> >     is not on patterns but on 'the representation of lexical information
> >     for textual identifiers of ontology concepts' where 'textual
> >     identifiers' in most technical ontologies mostly come in the form of
> >     multi-word or otherwise morphologically complex terms. Such lexical
> >     information (for ontology-based terminology) may be used by
> >     patterns, rules, classifiers or other methods which themselves are
> >     not the objective of the ontolex standardization effort. But they
> >     are however an important part of the use case definition for any
> >     ontolex standardization effort.
> >
> >     We will work out the use case more and report next week
> >
> >
> >     Paul
> >
> >
> >
> >     On 02/11/2012 20:34, John McCrae wrote:
> >
> >         I'm guessing what you are looking for are patterns like in this
> >         paper
> >         http://perso.limsi.fr/jacquemi/FTP/jacmin-ACL99.pdf [Table 1]
> >
> >         I have two main criticisms about this: firstly, it seems that
> >         these patterns are few in number (per language) and not tied to
> >         particular lexical entries, but rather are syntactic rules
> >
> >         Secondly, these rules are very unreliable... let's take your
> >         example
> >
> >         The rule from your example is approximately N1N2"s"N3 => N1N3 für N2
> >
> >         Firstly this could easily lead to incorrect inference... consider
> >         for example
> >
> >         Dampfschifffahrtskapitän (Steam ship [trip] captain)
> >
> >         The rule would lead to
> >
> >         *Dampfkapitän für Schifffahrt (Steam captain for ship trips)
> >
> >         Or worse
> >
> >         *Dampfschiffkapitän für Fahrt (Steam ship captain for trips)
> >
> >         Furthermore, I don't believe that the reason for choosing this
> >         pattern to apply has directly to do with inherent properties of the
> >         entry, for example
> >
> >         Archivierungsbundesgesetz = Bundesgesetz über die Archivierung
> >            (Archiving federal law = Federal law about archiving)
> >
> >         So it seems that Bundesgesetz can at least be used with either
> >         für or über
> >
> >         This leads me to another key problem... what are we (as OntoLex)
> >         standardizing? I am not aware of any existing formats for
> >         representing term variation patterns (unlike say lexico-semantic
> >         patterns or inflection patterns), therefore it is possible that
> >         this could be original research (albeit very interesting research)
> >         and hence not within the remit of this group. Paul, perhaps you can
> >         assuage these fears with some more concrete examples?
> >
> >         Regards,
> >         John
> >
> >         On Fri, Nov 2, 2012 at 5:55 PM, Paul Buitelaar
> >         <paul.buitelaar@deri.org> wrote:
> >
> >              All, to finish this discussion online
> >
> >              We would like to emphasize the use case for an ontology-lexicon
> >              model in ontology-driven decomposition and interpretation of
> >              terminology.
> >
> >              This is already possible in the lemon model (and has been a
> >              focus of predecessor models LingInfo & LexInfo) but we would
> >              still need to make a more extensive use case for it in the
> >              context of this WG so that interested parties, incl. commercial,
> >              can better interpret the potential use of lemon (or follow-up
> >              model) in their application context.
> >
> >              As explained briefly in the telco today, the following German
> >              example illustrates this:
> >
> >              'Bundesausbildungsfoerderungsgesetz' (terminologically:
> >              single-word technical term; linguistically: complex noun
> >              compound) from the STW Thesaurus for Economics
> >              (http://zbw.eu/stw/versions/latest/about)
> >
> >
> >              Given the STW context (= STW terms), this term/compound can be
> >              decomposed (and represented in lemon) as follows:
> >
> >              stw:bundes, stw:ausbildungsfoerderungs, stw:gesetz
> >              (federal, education support, law)
> >
> >              with head stw:gesetz, i.e.
> >
> >              [mod stw:bundes [mod stw:ausbildungsfoerderungs]] [head stw:gesetz]
> >
> >              where each component directs back to an STW concept
> >
> >              With this representation (abbreviated and more elaborate in
> >              lemon) a process can derive term variants for this same concept,
> >              such as
> >
> >              Bundesgesetz fuer Ausbildungsfoerderung
> >              (federal law on education support)
> >
> >
> >              As said in the telco, at DERI we are happy to collaborate with
> >              others on working this out in more detail and connect it with
> >              other relevant use cases
> >
> >              Cheers
> >
> >
> >              Paul
> >
> >
> >              On 02/11/2012 13:14, Paul Buitelaar wrote:
> >
> >                  Philipp, all, from DERI side we would be interested to
> >                  develop a use case in term analysis / decomposition - some
> >                  examples from German below.
> >                  We think this would focus discussion more on the
> >                  lexical/terminological side of the lemon requirements.
> >
> >                  More explanation of the examples in the telco
> >
> >
> >                  Paul/Tobias
> >
> >                  ---------------------
> >
> >                  example 1
> >                  NO_ENGLISH / Bundesausbildungsfoerderungsgesetz
> >                  Bundesausbildungsfoerderungsgesetz -> [stw:bundes,
> >                  stw:ausbildungsfoerderungs, stw:gesetz]
> >
> >                  example 2
> >                  Chancengleichheit in der Bildung / Equal opportunities in education
> >                  decomposition: [stw:chancen, stw:gleichheit]
> >
> >                  Bildungsungleichheit / Inequality of opportunity in education
> >                  decomposition: [stw:bildungs, stw:ungleichheit]
> >
> >                  example 3
> >                  NO_ENGLISH / Fuer die Arbeitsplatzsuche
> >                  decomposition: [stw:arbeitsplatz, igerman:suche]
> >
> >                  example 4
> >                  NO_ENGLISH / Bilanzierung von Fremdwährungstransaktionen
> >                  Fremdwaehrungstransaktionen -> [stw:fremd, stw:waehrungs,
> >                  stw:transaktionen]
> >                  Fremdwaehrungstransaktionen -> [igerman:fremdwaehrungs,
> >                  stw:transaktionen]
> >
> >
> >
> >                  On 30/10/2012 09:42, Philipp Cimiano wrote:
> >
> >                      Dear all,
> >
> >                      this is a gentle reminder for our telco on Friday,
> >                      3-4 pm (CET).
> >
> >                      The access details and agenda are available here:
> >
> >                      http://www.w3.org/community/ontolex/wiki/Teleconference,_2012.11.02,_3-4_pm_CET
> >
> >                      I will prepare a document summarizing our discussion on
> >                      senses for the meeting.
> >
> >                      The agenda says the following:
> >
> >                      # Discussion on naming of Path from Lexical Entry over
> >                      Sense to OntologyEntity (20 min.) -> Philipp to prepare
> >                      # Discussion of Req. 4 (Higher-Order Mappings -> John to
> >                      prepare a draft)
> >                      # Discussion on Req. 5 (Lexico-Syntactic Patterns ->
> >                      Dagmar to prepare a draft)
> >                      # Discussion on Req. 6 (Metadata -> Armando to provide a
> >                      draft)
> >
> >                      Can I remind John, Dagmar and Armando to prepare some
> >                      material (in the wiki) and present the material for a few
> >                      minutes so that we can have a first discussion on the
> >                      issues?
> >
> >                      Thanks and talk to you all on Friday.
> >
> >                      Philipp.
> >
> >                      --
> >                      Prof. Dr. Philipp Cimiano
> >                      Semantic Computing Group
> >                      Excellence Cluster - Cognitive Interaction
> >         Technology (CITEC)
> >                      University of Bielefeld
> >
> >                      Phone: +49 521 106 12249
> >                      Fax: +49 521 106 12412
> >                      Mail: cimiano@cit-ec.uni-bielefeld.de
> >
> >
> >                      Room H-127
> >                      Morgenbreede 39
> >                      33615 Bielefeld
> >


-- 
Dr. W. Peters
Research Fellow
Natural Language Processing group
Department of Computer Science
University of Sheffield
Regent Court
211 Portobello Street
Sheffield S1 4DP
tel: 00-44-114-2221902
fax: 00-44-114-2221810
email: w.peters@dcs.shef.ac.uk

Received on Wednesday, 7 November 2012 12:10:40 UTC