- From: Tadej Štajner <tadej.stajner@ijs.si>
- Date: Thu, 07 Jun 2012 14:42:51 +0200
- To: public-multilingualweb-lt@w3.org
- Message-ID: <4FD0A1CB.2010205@ijs.si>
Hi,

I agree with Pedro on the questions. Automatic word sense disambiguation is still not perfect in practice, so semi-automatic user interfaces make a lot of sense.

Here is how I think this could look in a made-up example, answering Felix's 1) and 2):

1) HTML+ITS:

    <span its-disambiguation its-semantic-network-ref="http://www.sfs.uni-tuebingen.de/lsd/index.shtml" its-selector="#synset_loschen_3">löschen</span>

2) Markup in raw ITS:

    <its:disambiguation semanticNetworkRef="http://www.sfs.uni-tuebingen.de/lsd/index.shtml" selector="#synset_loschen_3">löschen</its:disambiguation>

-- Tadej

On 04. 06. 2012 13:53, Pedro L. Díez Orzas wrote:
> Dear Felix,
>
> Thank you very much. Tadej can probably prepare the use cases you
> mention, with the consolidated data category. About questions 3 and
> 4, I can tell you the following:
>
> 3) Would it be produced also by an automatic text annotation tool?
>
> For pointers to the three kinds of information referred to (concepts
> in an ontology, meanings in a lexical DB, and terms in terminological
> resources), I think semi-automatic annotation tools would be possible,
> that is, annotations proposed by the tool and confirmed by the user.
>
> Fully automatic text annotation would need a more sophisticated
> “semantic calculus”, and most of this is still under research, as far
> as I know. Maybe in these cases it should be combined with
> textAnalysisAnnotation, specifying in *Annotation agent* and
> *Confidence score* which system produced the annotation and with what
> reliability.
>
> 4) Would 1-2 be consumed by an MT tool, or by other tools?
>
> These can basically be consumed by language processing tools, like MT,
> and other linguistic technology that needs content or semantic
> information, for instance text analytics, semantic search, etc.
> In localization chains, this information can also be used by automatic
> or semi-automatic processes (like selection of dictionaries for
> translation, or selection of translators/revisers by subject area).
>
> It could also be used by humans for translation or post-editing in
> case of ambiguity or lack of context in the content, but mostly by
> automatic systems.
>
> I hope this helps.
>
> Pedro
>
> ------------------------------------------------------------------------
>
> From: Felix Sasaki [mailto:fsasaki@w3.org]
> Sent: Saturday, 2 June 2012 14:13
> To: Tadej Stajner; pedro.diez
> CC: public-multilingualweb-lt@w3.org
> Subject: Re: [ACTION-94]: go and find examples of concept ontology
> (semantic features of terms as opposed to domain type ontologies)
>
> Hi Tadej, Pedro, all,
>
> this looks like a great chain of producing and consuming metadata.
>
> Apologies if this was explained during last week's call or before, but
> can you clarify the following a bit:
>
> 1) What would the actual HTML markup produced in the original text
> annotation use case look like?
>
> 2) What would the markup in this use case look like?
>
> 3) Would it be produced also by an automatic text annotation tool?
>
> 4) Would 1-2 be consumed by an MT tool, or by other tools?
>
> Thanks again,
>
> Felix
>
> 2012/5/31 Tadej Stajner <tadej.stajner@ijs.si>
>
> Hi Pedro,
> thanks for the excellent explanation. If I understand you correctly, a
> sufficient example for this use case would be annotation of individual
> words with the synset URI of the appropriate wordnet? If so, then I
> believe this route can be practical - I think linking to the synset is
> a more practical idea than expressing semantic features of the word,
> given the available tools.
>
> Enrycher can do automatic all-word disambiguation into the English
> wordnet, whereas we don't have anything specific in place for
> semantic features (which I suspect also holds for other text analytics
> providers).
>
> I'm also in favor of prescribing wordnets for individual languages as
> valid selector domains, as you suggest in option 1). That would make
> validation easier since we have a known domain.
>
> @All: Can we come up with a second implementation for this use case,
> preferably a consumer?
>
> -- Tadej
>
> On 5/29/2012 2:00 PM, Pedro L. Díez Orzas wrote:
>
> Dear all,
>
> Sorry for the delay. I tried to contact some people who I think can
> contribute to this, but they are not available these weeks.
>
> Before providing an example so that everyone can consider whether it
> is worthwhile to maintain the “semantic selector” attribute in the
> consolidation of “Disambiguation”, I would like to make a couple of
> considerations:
>
> 1. We will probably not have any implementation in the short term, but
>    there are, for example, a few semantic networks available on the
>    web (see http://www.globalwordnet.org/gwa/wordnet_table.html) that
>    could be mapped using semantic selectors. See online, for example,
>    the famous http://wordnetweb.princeton.edu/perl/webwn.
> 2. The W3C working group on SKOS (Simple Knowledge Organization System
>    Reference) is maybe dealing with similar things.
>
> The “semantic selector” allows finer lexical distinctions (single
> words or multi-word expressions) than a “domain” or an ontology like
> NERD. Also, the denotation is different from the “concept reference”,
> most of all for parts of speech like verbs.
>
> Within the same domain, referring to very similar concepts, languages
> have semantic differences. Depending on the semantic theory used, each
> tries to capture these differences by means of different systems
> (semantic features, semantic primitives, semantic nodes (in semantic
> networks), other semantic representations). An example could be the
> German verb “löschen”, which in different contexts can take different
> meanings that one can try to capture using different selectors, with
> the different systems.
>
> löschen -> clear (some bits)
>         -> delete (files)
>         -> cancel (programs)
>         -> erase (a scratchpad)
>         -> extinguish (a fire)
>
> Other possible translations of the verb “löschen” are:
>
> English     German
> delete      löschen, streichen, tilgen, ausstreichen, herausstreichen
> clear       löschen, klären, klarmachen, leeren, räumen, säubern
> erase       löschen, auslöschen, tilgen, ausradieren, radieren, abwischen
> extinguish  löschen, auslöschen, zerstören
> quench      löschen, stillen, abschrecken, dämpfen
> put out     löschen, bringen, ausmachen, ausschalten, treiben, verstimmen
> unload      entladen, abladen, ausladen, löschen, abstoßen, abwälzen
> discharge   entladen, erfüllen, entlassen, entlasten, löschen, ausstoßen
> wipe out    auslöschen, löschen, ausrotten, tilgen, zunichte machen, auswischen
> slake       stillen, löschen
> close       schließen, verschließen, abschließen, sperren, zumachen, löschen
> blot        löschen, abtupfen, klecksen, beklecksen, sich unmöglich machen, sich verderben
> turn off    ausschalten, abbiegen, abstellen, abdrehen, einbiegen, löschen
> blow out    auspusten, löschen, aufblasen, aufblähen, aufbauschen, platzen
> zap         abknallen, düsen, umschalten, löschen, töten, kaputtmachen
> redeem      einlösen, erlösen, zurückkaufen, tilgen, retten, löschen
> pay off     auszahlen, bezahlen, tilgen, abzahlen, abbezahlen, löschen
> switch out  löschen
> unship      ausladen, entladen, abnehmen, löschen
> souse       eintauchen, durchtränken, löschen, nass machen
> rub off     abreiben, abgehen, abwetzen, ausradieren, abscheuern, löschen
> strike off  löschen
> land        landen, an Land gehen, kriegen, an Land ziehen, aufsetzen, löschen
>
> According to this, the consolidation of disambiguation/namedEntity/
> data categories under “Terminology”
> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#disambiguation
> could be the following. It is intended to cover operational URI or
> XPath pointers to the three currently most important kinds of semantic
> resources: conceptual (ontologies), semantic (semantic networks or
> lexical databases) and terminological (glossaries and terminological
> resources), where ontologies are used for both general lexicon and
> terminology, semantic networks represent general vocabulary (lexicon),
> and terminological resources represent specialized vocabulary.
>
> *disambiguation*
>
> Includes data to be used by MT systems in disambiguating difficult content
>
> *Data model*
>
> * concept reference: points to a *concept in an ontology* that this
>   fragment of text represents. May be a URI or an XPath pointer.
> * semantic selector: points to a *meaning in a semantic network*
>   that this fragment of text represents. May be a URI or an XPath
>   pointer.
> * terminology reference: points to *a term in a terminological
>   resource* that this fragment of text represents. May be a URI or
>   an XPath pointer.
> * equivalent translation: expressions of that concept in other
>   languages, for example for training MT systems
>
> Also, I would keep *textAnalysisAnnotation*, since its purpose is
> quite different.
>
> Anyway, if we decide not to include “semantic selector” now, maybe
> it can be kept for future versions or treated in liaison with other
> groups.
>
> I hope it helps,
>
> Pedro
>
> __________________________________
>
> Pedro L. Díez Orzas
> Executive President/CEO
> Linguaserve Internacionalización de Servicios, S.A.
> Tel.: +34 91 761 64 60
> Fax: +34 91 542 89 28
> E-mail: pedro.diez@linguaserve.com
> www.linguaserve.com
>
> "According to the provisions set forth in articles 21 and 22 of Law
> 34/2002 of July 11 regarding Information Society and eCommerce
> Services, we will store and use your personal data with the sole
> purpose of marketing the products and services offered by LINGUASERVE
> INTERNACIONALIZACIÓN DE SERVICIOS, S.A. If you do not wish your
> personal data to be stored and handled, or you do not wish to receive
> further information regarding products and services offered by our
> company, please e-mail us at clients@linguaserve.com. Your request
> will be processed immediately."
>
> __________________________________
>
> --
> Felix Sasaki
> DFKI / W3C Fellow
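
On the question in this thread about a consumer for the annotation: the made-up HTML+ITS example at the top of this message could be read by a tool roughly as sketched below. This is only a minimal illustration in Python, not an implementation of any published ITS specification. It assumes the attribute names its-disambiguation, its-semantic-network-ref and its-selector exactly as written in that example, and the names DisambiguationExtractor and extract_annotations are hypothetical.

```python
from html.parser import HTMLParser


class DisambiguationExtractor(HTMLParser):
    """Collect text fragments carrying the made-up its-* attributes.

    Assumption: annotated elements are not nested inside each other and
    do not contain child elements with the same tag name.
    """

    def __init__(self):
        super().__init__()
        self.annotations = []  # finished (text, network_ref, selector) tuples
        self._current = None   # annotation being collected, if any

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)  # valueless attributes map to None
        if self._current is None and "its-disambiguation" in attr_map:
            self._current = {
                "tag": tag,
                "network_ref": attr_map.get("its-semantic-network-ref"),
                "selector": attr_map.get("its-selector"),
                "text": [],
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"].append(data)

    def handle_endtag(self, tag):
        cur = self._current
        if cur is not None and tag == cur["tag"]:
            self.annotations.append(
                ("".join(cur["text"]), cur["network_ref"], cur["selector"])
            )
            self._current = None


def extract_annotations(html):
    """Return a list of (text, semantic_network_ref, selector) tuples."""
    parser = DisambiguationExtractor()
    parser.feed(html)
    parser.close()
    return parser.annotations


sample = (
    '<p>Datei <span its-disambiguation '
    'its-semantic-network-ref="http://www.sfs.uni-tuebingen.de/lsd/index.shtml" '
    'its-selector="#synset_loschen_3">löschen</span></p>'
)
print(extract_annotations(sample))
```

A consumer such as an MT system could then join the semantic network reference and the selector to look up the intended synset, although how the two are combined is left open in the thread.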
Received on Thursday, 7 June 2012 12:43:22 UTC