
Re: [ACTION-94]: go and find examples of concept ontology (semantic features of terms as opposed to domain type ontologies)

From: Tadej Štajner <tadej.stajner@ijs.si>
Date: Thu, 07 Jun 2012 14:42:51 +0200
Message-ID: <4FD0A1CB.2010205@ijs.si>
To: public-multilingualweb-lt@w3.org
Hi,

I agree with Pedro on the questions. Automatic word sense disambiguation 
is in practice still not perfect, so semi-automatic user interfaces 
make a lot of sense. Here is how I think this could look in a 
made-up example, answering Felix's questions 1) and 2):

1) HTML+ITS: <span its-disambiguation 
its-semantic-network-ref="http://www.sfs.uni-tuebingen.de/lsd/index.shtml" 
its-selector="#synset_loschen_3">löschen</span>

2) Markup in raw ITS
<its:disambiguation
     semanticNetworkRef="http://www.sfs.uni-tuebingen.de/lsd/index.shtml"
     selector="#synset_loschen_3">löschen</its:disambiguation>
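To make the two made-up forms above concrete, here is a minimal Python sketch that reads the same annotation from either serialization using only the standard library. The attribute names (its-semantic-network-ref, its-selector, semanticNetworkRef, selector) are the hypothetical ones proposed in this thread, not a published ITS API. The raw ITS snippet above has no namespace declaration, so the sketch assumes xmlns:its is bound to the standard ITS namespace; resolving the selector against the network reference with urljoin is likewise just one possible interpretation.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin

ITS_NS = "http://www.w3.org/2005/11/its"  # ITS namespace URI (assumed binding for "its:")

class DisambiguationHTMLParser(HTMLParser):
    """Collect (text, networkRef, selector) from <span its-disambiguation ...>.
    Sketch only: spans nested inside an annotated span are not handled."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._open = None  # [network_ref, selector, text] of the span being read

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # HTMLParser lower-cases attribute names; bare attributes map to None
        if tag == "span" and "its-disambiguation" in a:
            self._open = [a.get("its-semantic-network-ref"), a.get("its-selector"), ""]

    def handle_data(self, data):
        if self._open is not None:
            self._open[2] += data

    def handle_endtag(self, tag):
        if tag == "span" and self._open is not None:
            ref, sel, text = self._open
            self.results.append((text, ref, sel))
            self._open = None

def parse_its_xml(xml_text):
    """Collect the same triples from raw-ITS <its:disambiguation> elements."""
    root = ET.fromstring(xml_text)
    return [(e.text or "", e.get("semanticNetworkRef"), e.get("selector"))
            for e in root.iter(f"{{{ITS_NS}}}disambiguation")]

def resolve(network_ref, selector):
    """Turn a fragment selector into one absolute URI (an assumed interpretation)."""
    return urljoin(network_ref, selector)

REF = "http://www.sfs.uni-tuebingen.de/lsd/index.shtml"
html_src = (f'<p><span its-disambiguation its-semantic-network-ref="{REF}" '
            'its-selector="#synset_loschen_3">löschen</span></p>')
xml_src = (f'<doc xmlns:its="{ITS_NS}"><its:disambiguation semanticNetworkRef="{REF}" '
           'selector="#synset_loschen_3">löschen</its:disambiguation></doc>')

parser = DisambiguationHTMLParser()
parser.feed(html_src)
print(parser.results)
print(parse_its_xml(xml_src) == parser.results)
print(resolve(REF, "#synset_loschen_3"))
```

Both serializations yield the same (text, network, selector) triple, which is what a consumer such as an MT system would actually need.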

-- Tadej


On 04. 06. 2012 13:53, Pedro L. Díez Orzas wrote:
>
> Dear Felix,
>
> Thank you very much. Probably Tadej can prepare the use cases you 
> mention, with the consolidated data category. About the question 3 and 
> 4, I can tell you the following:
>
> 3) Would it be produced also by an automatic text annotation tool?
>
> For the pointers to the three kinds of information referred to (concepts 
> in an ontology, meanings in a lexical database, and terms in 
> terminological resources), I think semiautomatic annotation tools would 
> be possible, that is, annotations proposed by the tool and confirmed by 
> the user.
>
> Fully automatic text annotation would need a more sophisticated 
> “semantic calculus”, and most such approaches are still under research, 
> as far as I know. In these cases, it could perhaps be combined with 
> textAnalysisAnnotation, using *Annotation agent* and *Confidence score* 
> to specify which system produced the annotation and with what 
> reliability.
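The semiautomatic workflow described above (the tool proposes, the user confirms, with agent and confidence recorded) can be sketched as a simple triage step. This is a minimal illustration, not part of any ITS proposal: the class name, the field names and the 0.9 auto-accept threshold are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ProposedAnnotation:
    text: str          # annotated fragment, e.g. "löschen"
    target: str        # pointer to a concept, meaning or term
    agent: str         # system that proposed it (cf. *Annotation agent*)
    confidence: float  # its reliability in [0, 1] (cf. *Confidence score*)

def triage(proposals, auto_accept_at=0.9):
    """Split tool proposals into those accepted automatically and those a
    user must confirm. The 0.9 default is an arbitrary illustrative value."""
    accepted, to_confirm = [], []
    for p in proposals:
        if not 0.0 <= p.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {p.confidence}")
        (accepted if p.confidence >= auto_accept_at else to_confirm).append(p)
    return accepted, to_confirm

# Hypothetical proposals from a hypothetical disambiguation tool.
proposals = [
    ProposedAnnotation("löschen", "#synset_loschen_3", "example-wsd-tool", 0.95),
    ProposedAnnotation("löschen", "#synset_loschen_1", "example-wsd-tool", 0.55),
]
auto, manual = triage(proposals)
print(len(auto), len(manual))
```

Setting the threshold above 1.0 turns this into a purely manual workflow, and setting it to 0.0 makes it fully automatic, so the same structure covers both cases discussed above.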
>
> 4) Would 1-2 be consumed by an MT tool, or by other tools?
>
> These can basically be consumed by language processing tools, like MT, 
> and by other language technologies that need content or semantic 
> information, for instance text analytics, semantic search, etc. In 
> localization chains, this information can also be used by automatic 
> or semiautomatic processes (like the selection of dictionaries for 
> translation, or the selection of translators/revisers by subject area).
>
> It could also be used by humans for translation or post-editing in 
> case of ambiguity or lack of context in the content, but mostly by 
> automatic systems.
>
> I hope this helps.
>
> Pedro
>
> ------------------------------------------------------------------------
>
> *From:* Felix Sasaki [mailto:fsasaki@w3.org]
> *Sent:* Saturday, 2 June 2012 14:13
> *To:* Tadej Stajner; pedro.diez
> *CC:* public-multilingualweb-lt@w3.org
> *Subject:* Re: [ACTION-94]: go and find examples of concept ontology 
> (semantic features of terms as opposed to domain type ontologies)
>
> Hi Tadej, Pedro, all,
>
> this looks like a great chain of producing and consuming metadata.
>
> Apologies if this was explained during last week's call or before, but 
> can you clarify the following a bit:
>
> 1) What would the actual HTML markup produced in the original text 
> annotation use case look like?
>
> 2) What would the markup in this use case look like?
>
> 3) Would it be produced also by an automatic text annotation tool?
>
> 4) Would 1-2 be consumed by an MT tool, or by other tools?
>
> Thanks again,
>
> Felix
>
> On 2012/5/31, Tadej Stajner <tadej.stajner@ijs.si> wrote:
>
> Hi Pedro,
> thanks for the excellent explanation. If I understand you correctly, a 
> sufficient example for this use case would be the annotation of 
> individual words with the synset URI from the appropriate wordnet? If 
> so, then I believe this route can be practical: given the available 
> tools, linking to the synset is more practical than expressing the 
> semantic features of the word.
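On the producing side, annotating individual words with a synset pointer could look like the following minimal sketch, which emits the hypothetical HTML+ITS attributes used earlier in this thread. The function name, the index-based sense mapping and the example selector are assumptions; tokens are simply joined with spaces, so original whitespace is not preserved.

```python
from html import escape

def annotate(tokens, senses, network_ref):
    """Serialize tokens as HTML, wrapping each token whose index has a chosen
    synset in a span with the (hypothetical) its-* attributes from this thread.
    Tokens are joined with single spaces; original whitespace is not kept."""
    out = []
    for i, tok in enumerate(tokens):
        selector = senses.get(i)
        if selector is None:
            out.append(escape(tok))
        else:
            out.append(
                f'<span its-disambiguation its-semantic-network-ref="{escape(network_ref)}" '
                f'its-selector="{escape(selector)}">{escape(tok)}</span>'
            )
    return " ".join(out)

REF = "http://www.sfs.uni-tuebingen.de/lsd/index.shtml"
print(annotate(["Datei", "löschen"], {1: "#synset_loschen_3"}, REF))
```

Keying senses by token position rather than by word form matters: the same surface form (here “löschen”) can carry different senses in different places in one text.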
>
> Enrycher can do automatic all-words disambiguation into the English 
> WordNet, whereas we don't have anything specific in place for 
> semantic features (which I suspect also holds for other text analytics 
> providers).
>
> I'm also in favor of prescribing wordnets for individual languages as 
> valid selector domains as you suggest in option 1). That would make 
> validation easier since we have a known domain.
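Prescribing known wordnets per language makes validation a simple membership check. A minimal sketch follows; the registry contents are illustrative (the two URLs are the ones mentioned in this thread), and checking that the selector is a fragment identifier is an assumption based on the “#synset_loschen_3” example.

```python
# Hypothetical registry: language code -> wordnet URIs accepted as selector domains.
ALLOWED_NETWORKS = {
    "de": {"http://www.sfs.uni-tuebingen.de/lsd/index.shtml"},
    "en": {"http://wordnetweb.princeton.edu/perl/webwn"},
}

def validate(lang, network_ref, selector):
    """Return a list of problems; an empty list means the annotation passes."""
    problems = []
    if network_ref not in ALLOWED_NETWORKS.get(lang, set()):
        problems.append(f"{network_ref!r} is not a prescribed wordnet for {lang!r}")
    # Assumed convention: selectors are non-empty fragment identifiers.
    if not selector.startswith("#") or len(selector) < 2:
        problems.append(f"selector {selector!r} is not a fragment identifier")
    return problems

print(validate("de", "http://www.sfs.uni-tuebingen.de/lsd/index.shtml", "#synset_loschen_3"))
```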
>
> @All: Can we come up with a second implementation for this use case, 
> preferably a consumer?
>
> -- Tadej
>
>
>
>
> On 5/29/2012 2:00 PM, Pedro L. Díez Orzas wrote:
>
> Dear all,
>
> Sorry for the delay. I tried to contact some people I think can 
> contribute to this, but they are not available these weeks.
>
> Before providing an example so that we can all consider whether it is 
> worthwhile to keep the “semantic selector” attribute in the 
> consolidation of “Disambiguation”, I would like to make a couple of 
> considerations:
>
>  1. We will probably not have any implementation in the short term, but
>     there are, for example, a few semantic networks available on the web
>     (see http://www.globalwordnet.org/gwa/wordnet_table.html) that could
>     be mapped using semantic selectors. See, for example, the well-known
>     http://wordnetweb.princeton.edu online
>     (<http://wordnetweb.princeton.edu/perl/webwn>).
>  2. The W3C work on SKOS (Simple Knowledge Organization System
>     Reference) may be dealing with similar issues.
>
> The “semantic selector” allows finer lexical distinctions (for single 
> words or multiword expressions) than a “domain” or an ontology like 
> NERD. Also, its denotation is different from that of the “concept 
> reference”, especially for parts of speech such as verbs.
>
> Within the same domain, even when referring to very similar concepts, 
> languages have semantic differences. Depending on the semantic theory 
> used, each approach tries to capture these differences by means of a 
> different system (semantic features, semantic primitives, semantic 
> nodes in semantic networks, or other semantic representations). An 
> example is the German verb “löschen”, which can take different meanings 
> in different contexts; one can try to capture these meanings with 
> different selectors in the different systems.
>
> löschen  -> clear       (some bits)
>          -> delete      (files)
>          -> cancel      (programs)
>          -> erase       (a scratchpad)
>          -> extinguish  (a fire)
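Since automatic disambiguation is, as noted at the top of this message, still imperfect, here is a deliberately simple illustration of how a tool might choose among the five senses listed above. It uses the simplified Lesk idea (pick the sense whose signature words overlap most with the context); this is a swapped-in textbook technique, not Enrycher's method, and the German signature words are hypothetical.

```python
import re

# Toy sense inventory for "löschen", keyed by the English senses listed above.
# The German context cues in each signature are illustrative assumptions.
SIGNATURES = {
    "clear": {"bit", "bits", "register", "speicher"},
    "delete": {"datei", "dateien", "ordner"},
    "cancel": {"programm", "programme", "auftrag"},
    "erase": {"notizblock", "tafel"},
    "extinguish": {"feuer", "brand", "flammen"},
}

def simplified_lesk(sentence, signatures=SIGNATURES):
    """Pick the sense whose signature overlaps most with the sentence's words.
    Returns None when no signature word occurs (the 'not perfect' case)."""
    words = set(re.findall(r"\w+", sentence.lower()))
    best, best_overlap = None, 0
    for sense, signature in signatures.items():
        overlap = len(words & signature)
        if overlap > best_overlap:  # strict '>' keeps the first sense on ties
            best, best_overlap = sense, overlap
    return best

print(simplified_lesk("Die Feuerwehr konnte das Feuer schnell löschen."))
print(simplified_lesk("Bitte die alte Datei löschen."))
```

When the function returns None, a semiautomatic interface would fall back to asking the user, which is exactly the situation in which an explicit selector in the markup is most useful.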
>
> Other possible translations of the verb “löschen” are:
>
> English      German equivalents
> -----------  ------------------------------------------------------------
> delete       löschen, streichen, tilgen, ausstreichen, herausstreichen
> clear        löschen, klären, klarmachen, leeren, räumen, säubern
> erase        löschen, auslöschen, tilgen, ausradieren, radieren, abwischen
> extinguish   löschen, auslöschen, zerstören
> quench       löschen, stillen, abschrecken, dämpfen
> put out      löschen, bringen, ausmachen, ausschalten, treiben, verstimmen
> unload       entladen, abladen, ausladen, löschen, abstoßen, abwälzen
> discharge    entladen, erfüllen, entlassen, entlasten, löschen, ausstoßen
> wipe out     auslöschen, löschen, ausrotten, tilgen, zunichte machen,
>              auswischen
> slake        stillen, löschen
> close        schließen, verschließen, abschließen, sperren, zumachen, löschen
> blot         löschen, abtupfen, klecksen, beklecksen, sich unmöglich machen,
>              sich verderben
> turn off     ausschalten, abbiegen, abstellen, abdrehen, einbiegen, löschen
> blow out     auspusten, löschen, aufblasen, aufblähen, aufbauschen, platzen
> zap          abknallen, düsen, umschalten, löschen, töten, kaputtmachen
> redeem       einlösen, erlösen, zurückkaufen, tilgen, retten, löschen
> pay off      auszahlen, bezahlen, tilgen, abzahlen, abbezahlen, löschen
> switch out   löschen
> unship       ausladen, entladen, abnehmen, löschen
> souse        eintauchen, durchtränken, löschen, nass machen
> rub off      abreiben, abgehen, abwetzen, ausradieren, abscheuern, löschen
> strike off   löschen
> land         landen, an Land gehen, kriegen, an Land ziehen, aufsetzen,
>              löschen
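The table above is a many-to-many mapping, which is why a pointer to the word form alone cannot fix a meaning. A minimal sketch, using a few rows copied verbatim from the table, inverts it to show how many English senses a single German verb fans out to and which senses two verbs share. The function names are illustrative only.

```python
from collections import defaultdict

# A few rows copied from the table above: English translation -> German verbs.
TRANSLATIONS = {
    "delete": ["löschen", "streichen", "tilgen", "ausstreichen", "herausstreichen"],
    "erase": ["löschen", "auslöschen", "tilgen", "ausradieren", "radieren", "abwischen"],
    "extinguish": ["löschen", "auslöschen", "zerstören"],
    "quench": ["löschen", "stillen", "abschrecken", "dämpfen"],
    "slake": ["stillen", "löschen"],
    "redeem": ["einlösen", "erlösen", "zurückkaufen", "tilgen", "retten", "löschen"],
}

def invert(table):
    """Build German verb -> English translations, preserving table order."""
    index = defaultdict(list)
    for english, german_verbs in table.items():
        for verb in german_verbs:
            index[verb].append(english)
    return dict(index)

def shared_translations(index, verb_a, verb_b):
    """English translations that two German verbs have in common."""
    b = set(index.get(verb_b, []))
    return [e for e in index.get(verb_a, []) if e in b]

INDEX = invert(TRANSLATIONS)
print(INDEX["löschen"])  # löschen appears in every row, so the form alone is ambiguous
print(shared_translations(INDEX, "löschen", "tilgen"))
```

The overlap between “löschen” and “tilgen” (delete, erase, redeem) is the kind of relation a semantic network encodes explicitly, and it is what a semantic selector would point into.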
>
> According to this, the consolidation of the disambiguation/namedEntity 
> data categories under “Terminology” 
> (http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#disambiguation) 
> could be the following. It is intended to cover operational URI or 
> XPath pointers to what are currently the three most important kinds of 
> semantic resources: conceptual (ontologies), semantic (semantic networks 
> or lexical databases) and terminological (glossaries and terminological 
> resources). Ontologies are used for both general lexicon and 
> terminology, semantic networks to represent general vocabulary 
> (lexicon), and terminological resources for specialized vocabulary.
>
> *disambiguation*
>
> Includes data to be used by MT systems in disambiguating difficult content
>
> *Data model*
>
>   * concept reference: points to a *concept in an ontology* that this
>     fragment of text represents. May be a URI or an XPath pointer.
>   * semantic selector: points to a *meaning in a semantic network*
>     that this fragment of text represents. May be a URI or an XPath
>     pointer.
>   * terminology reference: points to *a term in a terminological
>     resource* that this fragment of text represents. May be an URI or
>     an XPath pointer.
>   * equivalent translation: expressions of that concept in other
>     languages, for example for training MT systems
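The proposed data model maps naturally onto a small record type. The following is a minimal sketch, not a normative definition: the field names are illustrative renderings of the four items above, and the URI-or-XPath check is a rough heuristic (absolute URI with a scheme, a bare fragment, or an expression starting with "/") that would reject, for example, relative URIs or XPath expressions such as id('x').

```python
from dataclasses import dataclass, field
from typing import Dict, Optional
from urllib.parse import urlsplit

def looks_like_pointer(value: str) -> bool:
    """Heuristic: accept an absolute URI, a bare fragment ('#...') or an XPath ('/...')."""
    if value.startswith(("#", "/")):
        return True
    parts = urlsplit(value)
    return bool(parts.scheme) and bool(parts.netloc or parts.path)

@dataclass
class Disambiguation:
    text: str
    concept_ref: Optional[str] = None        # concept in an ontology
    semantic_selector: Optional[str] = None  # meaning in a semantic network
    term_ref: Optional[str] = None           # term in a terminological resource
    equivalents: Dict[str, str] = field(default_factory=dict)  # language -> expression

    def __post_init__(self):
        # Reject pointer fields that are neither URI-like nor XPath-like.
        for name in ("concept_ref", "semantic_selector", "term_ref"):
            value = getattr(self, name)
            if value is not None and not looks_like_pointer(value):
                raise ValueError(f"{name} must be a URI or XPath pointer, got {value!r}")

d = Disambiguation(
    "löschen",
    semantic_selector="http://www.sfs.uni-tuebingen.de/lsd/index.shtml#synset_loschen_3",
    equivalents={"en": "extinguish"},
)
print(d)
```

All three pointer fields are optional, reflecting that a given fragment may be linked to an ontology, a semantic network, a term base, or any combination of them.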
>
> Also, I would keep *textAnalysisAnnotation*, since the purpose is 
> quite different.
>
> Anyway, if we decide not to include “semantic selector” now, maybe 
> it can be considered for future versions or treated in liaison with 
> other groups.
>
> I hope it helps,
>
> Pedro
>
> __________________________________
>
> Pedro L. Díez Orzas
>
> Executive President/CEO
>
> Linguaserve Internacionalización de Servicios, S.A.
>
> Tel.: +34 91 761 64 60
> Fax: +34 91 542 89 28
>
> E-mail: pedro.diez@linguaserve.com
>
> www.linguaserve.com
>
>
>
> -- 
> Felix Sasaki
>
> DFKI / W3C Fellow
>
Received on Thursday, 7 June 2012 12:43:22 UTC
