- From: Tadej Štajner <tadej.stajner@ijs.si>
- Date: Thu, 07 Jun 2012 14:42:51 +0200
- To: public-multilingualweb-lt@w3.org
- Message-ID: <4FD0A1CB.2010205@ijs.si>
Hi,
I agree with Pedro on the questions. Automatic word sense disambiguation
is in practice still not perfect, so some semi-automatic user interfaces
make a lot of sense. Here is how I think this could look in a
made-up example, answering Felix's questions 1) and 2):
1) HTML+ITS: <span its-disambiguation
its-semantic-network-ref="http://www.sfs.uni-tuebingen.de/lsd/index.shtml"
its-selector="#synset_loschen_3">löschen</span>
2) Markup in raw ITS
<its:disambiguation
semanticNetworkRef="http://www.sfs.uni-tuebingen.de/lsd/index.shtml"
selector="#synset_loschen_3">löschen</its:disambiguation>
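A consumer of the HTML variant might read such annotations along the following lines. This is only a sketch under assumptions: the attribute names are the made-up ones from example 1), and the sample sentence and the class name DisambiguationCollector are hypothetical, introduced purely for illustration. It uses only Python's standard html.parser module and does not handle nested annotated spans.

```python
from html.parser import HTMLParser


class DisambiguationCollector(HTMLParser):
    """Collects text inside <span> elements that carry the made-up
    its-disambiguation attribute from example 1) (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.annotations = []  # finished (text, network_ref, selector) tuples
        self._open = None      # attributes of the annotated span being read
        self._text = []        # text chunks seen inside that span

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)  # valueless attributes map to None
        if tag == "span" and "its-disambiguation" in attr_map:
            self._open = attr_map
            self._text = []

    def handle_data(self, data):
        if self._open is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._open is not None:
            self.annotations.append((
                "".join(self._text),
                self._open.get("its-semantic-network-ref"),
                self._open.get("its-selector"),
            ))
            self._open = None


# Hypothetical sample sentence wrapping the annotated word from example 1).
html = (
    '<p>Bitte die Datei <span its-disambiguation '
    'its-semantic-network-ref="http://www.sfs.uni-tuebingen.de/lsd/index.shtml" '
    'its-selector="#synset_loschen_3">löschen</span>.</p>'
)

collector = DisambiguationCollector()
collector.feed(html)
collector.close()

for text, network_ref, selector in collector.annotations:
    # Joining the network reference and the selector gives one pointer per word.
    print(text, network_ref + selector)
```

An MT system or a terminology tool could then use the resulting pointer to look up the intended sense instead of guessing it from context.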
-- Tadej
On 04. 06. 2012 13:53, Pedro L. Díez Orzas wrote:
>
> Dear Felix,
>
> Thank you very much. Tadej can probably prepare the use cases you
> mention, with the consolidated data category. Regarding questions 3 and
> 4, I can tell you the following:
>
> 3) Would it be produced also by an automatic text annotation tool?
>
> For the pointers to the three kinds of information referred to (concepts
> in an ontology, meanings in a lexical DB, and terms in terminological
> resources), I think semiautomatic annotation tools would be possible,
> that is, annotations proposed by the tool and confirmed by the user.
>
> Fully automatic text annotation would need a more sophisticated
> “semantic calculus”, and most of these are still under research, as far
> as I know. Maybe, in these cases, it should be combined with
> textAnalysisAnnotation, specifying in *Annotation agent* and
> *Confidence score* which system produced the annotation and with what
> reliability.
>
> 4) Would 1-2 be consumed by an MT tool, or by other tools?
>
> These can basically be consumed by language processing tools, such as MT
> and other language technology that needs content or semantic information,
> for instance text analytics, semantic search, etc. In localization
> chains, this information can also be used by automatic or semiautomatic
> processes (such as selecting dictionaries for translation, or selecting
> translators/revisers by subject area).
>
> It could also be used by humans for translation or post-editing in
> case of ambiguity or lack of context in the content, but mostly by
> automatic systems.
>
> I hope this helps.
>
> Pedro
>
> ------------------------------------------------------------------------
>
> *From:* Felix Sasaki [mailto:fsasaki@w3.org]
> *Sent:* Saturday, 2 June 2012 14:13
> *To:* Tadej Stajner; pedro.diez
> *CC:* public-multilingualweb-lt@w3.org
> *Subject:* Re: [ACTION-94]: go and find examples of concept ontology
> (semantic features of terms as opposed to domain type ontologies)
>
> Hi Tadej, Pedro, all,
>
> this looks like a great chain of producing and consuming metadata.
>
> Apologies if this was explained during last week's call or before, but
> could you clarify the following a bit:
>
> 1) What would the actual HTML markup produced in the original text
> annotation use case look like?
>
> 2) What would the markup in this use case look like?
>
> 3) Would it be produced also by an automatic text annotation tool?
>
> 4) Would 1-2 be consumed by an MT tool, or by other tools?
>
> Thanks again,
>
> Felix
>
> 2012/5/31 Tadej Stajner <tadej.stajner@ijs.si>
>
> Hi Pedro,
> thanks for the excellent explanation. If I understand you correctly, a
> sufficient example for this use case would be the annotation of
> individual words with the synset URI of the appropriate wordnet? If so,
> then I believe this route can be practical. I think linking to the
> synset is a more practical idea than expressing semantic features of the
> word, given the available tools.
>
> Enrycher can do automatic all-word disambiguation into the English
> wordnet, whereas we don't have anything specific in place for
> semantic features (which I suspect also holds for other text analytics
> providers).
>
> I'm also in favor of prescribing wordnets for individual languages as
> valid selector domains as you suggest in option 1). That would make
> validation easier since we have a known domain.
>
> @All: Can we come up with a second implementation for this use case,
> preferably a consumer?
>
> -- Tadej
>
>
>
>
> On 5/29/2012 2:00 PM, Pedro L. Díez Orzas wrote:
>
> Dear all,
>
> Sorry for the delay. I tried to contact some people I think can
> contribute to this, but they are not available these weeks.
>
> Before providing an example so that we can all consider whether it is
> worthwhile to keep the “semantic selector” attribute in the
> consolidation of “Disambiguation”, I would like to make a couple of
> observations:
>
> 1. We will probably not have any implementation in the short term, but
> there are, for example, a few semantic networks available on the web
> (see http://www.globalwordnet.org/gwa/wordnet_table.html) that could be
> mapped using semantic selectors. See, for example, the famous online
> WordNet at http://wordnetweb.princeton.edu/perl/webwn.
> 2. The W3C working group on SKOS (Simple Knowledge Organization System
> Reference) may be dealing with similar things.
>
> The “semantic selector” allows finer lexical distinctions (for single
> words or multi-word expressions) than a “domain” or an ontology like
> NERD. Also, the denotation is different from the “concept reference”,
> especially for parts of speech such as verbs.
>
> Within the same domain, and referring to very similar concepts, languages
> have semantic differences. Depending on the semantic theory used, each
> tries to capture these differences by means of a different system
> (semantic features, semantic primitives, semantic nodes in semantic
> networks, or other semantic representations). An example could be the
> German verb “löschen”, which in different contexts can take different
> meanings that one can try to capture using different selectors in the
> different systems.
>
> löschen -> clear (some bits)
>         -> delete (files)
>         -> cancel (programs)
>         -> erase (a scratchpad)
>         -> extinguish (a fire)
>
> Other possible translations of the verb “löschen” are:
>
> delete      löschen, streichen, tilgen, ausstreichen, herausstreichen
> clear       löschen, klären, klarmachen, leeren, räumen, säubern
> erase       löschen, auslöschen, tilgen, ausradieren, radieren, abwischen
> extinguish  löschen, auslöschen, zerstören
> quench      löschen, stillen, abschrecken, dämpfen
> put out     löschen, bringen, ausmachen, ausschalten, treiben, verstimmen
> unload      entladen, abladen, ausladen, löschen, abstoßen, abwälzen
> discharge   entladen, erfüllen, entlassen, entlasten, löschen, ausstoßen
> wipe out    auslöschen, löschen, ausrotten, tilgen, zunichte machen, auswischen
> slake       stillen, löschen
> close       schließen, verschließen, abschließen, sperren, zumachen, löschen
> blot        löschen, abtupfen, klecksen, beklecksen, sich unmöglich machen,
>             sich verderben
> turn off    ausschalten, abbiegen, abstellen, abdrehen, einbiegen, löschen
> blow out    auspusten, löschen, aufblasen, aufblähen, aufbauschen, platzen
> zap         abknallen, düsen, umschalten, löschen, töten, kaputtmachen
> redeem      einlösen, erlösen, zurückkaufen, tilgen, retten, löschen
> pay off     auszahlen, bezahlen, tilgen, abzahlen, abbezahlen, löschen
> switch out  löschen
> unship      ausladen, entladen, abnehmen, löschen
> souse       eintauchen, durchtränken, löschen, nass machen
> rub off     abreiben, abgehen, abwetzen, ausradieren, abscheuern, löschen
> strike off  löschen
> land        landen, an Land gehen, kriegen, an Land ziehen, aufsetzen, löschen
>
> Accordingly, the consolidation of the disambiguation/namedEntity/
> data categories under “Terminology”
> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#disambiguation
> could be as follows. It is intended to cover operational URI or
> XPath pointers to the three currently most important kinds of semantic
> resources: conceptual (ontologies), semantic (semantic networks or
> lexical databases) and terminological (glossaries and terminological
> resources), where ontologies are used for both general lexicon and
> terminology, semantic networks represent general vocabulary (lexicon),
> and terminological resources represent specialized vocabulary.
>
> *disambiguation*
>
> Includes data to be used by MT systems in disambiguating difficult content
>
> *Data model*
>
> * concept reference: points to a *concept in an ontology* that this
> fragment of text represents. May be a URI or an XPath pointer.
> * semantic selector: points to a *meaning in a semantic network*
> that this fragment of text represents. May be a URI or an XPath
> pointer.
> * terminology reference: points to *a term in a terminological
> resource* that this fragment of text represents. May be a URI or
> an XPath pointer.
> * equivalent translation: expressions of that concept in other
> languages, for example for training MT systems
>
> Also, I would keep *textAnalysisAnnotation*, since the purpose is
> quite different.
>
> Anyway, if we decide not to include “semantic selector” now, maybe
> it can be left for future versions or handled in liaison with other
> groups.
>
> I hope it helps,
>
> Pedro
>
>
> Pedro L. Díez Orzas
> Presidente Ejecutivo/CEO
> Linguaserve Internacionalización de Servicios, S.A.
> Tel.: +34 91 761 64 60
> Fax: +34 91 542 89 28
> E-mail: pedro.diez@linguaserve.com
> www.linguaserve.com
>
>
>
> --
> Felix Sasaki
>
> DFKI / W3C Fellow
>
Received on Thursday, 7 June 2012 12:43:22 UTC