- From: Tadej Stajner <tadej.stajner@ijs.si>
- Date: Thu, 31 May 2012 15:52:33 +0200
- To: public-multilingualweb-lt@w3.org
- Message-ID: <4FC777A1.307@ijs.si>
Hi Pedro,

thanks for the excellent explanation. If I understand you correctly, a
sufficient example for this use case would be annotating individual words
with the synset URI of the appropriate wordnet?

If so, then I believe this route can be practical. Given the available
tools, I think linking to the synset is more practical than expressing
semantic features of the word. Enrycher can do automatic all-word
disambiguation into the English WordNet, whereas we don't have anything
specific in place for semantic features (which I suspect also holds for
other text analytics providers).

I'm also in favor of prescribing wordnets for individual languages as valid
selector domains, as you suggest in option 1). That would make validation
easier, since we have a known domain.

@All: Can we come up with a second implementation for this use case,
preferably a consumer?

-- Tadej

On 5/29/2012 2:00 PM, Pedro L. Díez Orzas wrote:
> Dear all,
>
> Sorry for the delay. I tried to contact some people who I think could
> contribute to this, but they are not available these weeks.
>
> Before providing an example so that we can all consider whether it is
> worthwhile to keep the "semantic selector" attribute in the
> consolidation of "Disambiguation", I would like to make a couple of
> observations:
>
> 1. We will probably not have any implementation in the short term, but
>    there are, for example, a few semantic networks available on the web
>    (see http://www.globalwordnet.org/gwa/wordnet_table.html) that could
>    be mapped using semantic selectors. See, for example, the well-known
>    online http://wordnetweb.princeton.edu/perl/webwn.
> 2. The W3C working group on SKOS (Simple Knowledge Organization System)
>    may be dealing with similar things.
>
> The "semantic selector" allows finer lexical distinctions (for simple
> words or multi-word expressions) than a "domain" or an ontology like NERD.
> Also, the denotation is different from the "concept reference",
> especially for parts of speech such as verbs.
>
> Within the same domain, referring to very similar concepts, languages
> have semantic differences. Depending on the semantic theory used, each
> tries to capture these differences by means of different systems
> (semantic features, semantic primitives, semantic nodes (in semantic
> networks), other semantic representations). An example could be the
> German verb "löschen", which in different contexts can take different
> meanings that one can try to capture using different selectors, with
> the different systems.
>
> löschen -> clear (some bits)
>         -> delete (files)
>         -> cancel (programs)
>         -> erase (a scratchpad)
>         -> extinguish (a fire)
>
> Other possible translations of the verb "löschen" are:
>
> delete      löschen, streichen, tilgen, ausstreichen, herausstreichen
> clear       löschen, klären, klarmachen, leeren, räumen, säubern
> erase       löschen, auslöschen, tilgen, ausradieren, radieren, abwischen
> extinguish  löschen, auslöschen, zerstören
> quench      löschen, stillen, abschrecken, dämpfen
> put out     löschen, bringen, ausmachen, ausschalten, treiben, verstimmen
> unload      entladen, abladen, ausladen, löschen, abstoßen, abwälzen
> discharge   entladen, erfüllen, entlassen, entlasten, löschen, ausstoßen
> wipe out    auslöschen, löschen, ausrotten, tilgen, zunichte machen,
>             auswischen
> slake       stillen, löschen
> close       schließen, verschließen, abschließen, sperren, zumachen,
>             löschen
> blot        löschen, abtupfen, klecksen, beklecksen, sich unmöglich
>             machen, sich verderben
> turn off    ausschalten, abbiegen, abstellen, abdrehen, einbiegen, löschen
> blow out    auspusten, löschen, aufblasen, aufblähen, aufbauschen, platzen
> zap         abknallen, düsen, umschalten, löschen, töten, kaputtmachen
> redeem      einlösen, erlösen, zurückkaufen, tilgen, retten, löschen
> pay off     auszahlen, bezahlen, tilgen, abzahlen, abbezahlen, löschen
> switch out  löschen
> unship      ausladen, entladen, abnehmen, löschen
> souse       eintauchen, durchtränken, löschen, nass machen
> rub off     abreiben, abgehen, abwetzen, ausradieren, abscheuern, löschen
> strike off  löschen
> land        landen, an Land gehen, kriegen, an Land ziehen, aufsetzen,
>             löschen
>
> According to this, the consolidation of disambiguation/namedEntity/
> data categories under "Terminology"
> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#disambiguation
> could be the following. It is intended to cover operational URI or
> XPath pointers to the three currently most important kinds of semantic
> resources: conceptual (ontologies), semantic (semantic networks or
> lexical databases) and terminological (glossaries and terminological
> resources). Ontologies are used for both general lexicon and
> terminology, semantic networks represent general vocabulary (lexicon),
> and terminological resources represent specialized vocabulary.
>
> *disambiguation*
>
> Includes data to be used by MT systems in disambiguating difficult
> content.
>
> *Data model*
>
> * concept reference: points to a *concept in an ontology* that this
>   fragment of text represents. May be a URI or an XPath pointer.
> * semantic selector: points to a *meaning in a semantic network*
>   that this fragment of text represents. May be a URI or an XPath
>   pointer.
> * terminology reference: points to *a term in a terminological
>   resource* that this fragment of text represents. May be a URI or
>   an XPath pointer.
> * equivalent translation: expressions of that concept in other
>   languages, for example for training MT systems.
>
> Also, I would keep *textAnalysisAnnotation*, since the purpose is
> quite different.
>
> Anyway, if we decide not to include "semantic selector" now, maybe it
> can be considered for future versions or handled in liaison with other
> groups.
>
> I hope it helps,
>
> Pedro
>
> Pedro L. Díez Orzas
> Executive President/CEO
> Linguaserve Internacionalización de Servicios, S.A.
> www.linguaserve.com
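
For illustration only, the data model quoted above and the "known domain" validation idea could be sketched in Python roughly as follows. The class name, field names, the host allow-list and the example URI are all assumptions made up for this sketch and are not part of the proposal. The check also covers only the URI case, not XPath pointers.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional
from urllib.parse import urlparse

# Hypothetical allow-list of wordnet hosts accepted as selector domains.
KNOWN_WORDNET_HOSTS = {"wordnet.example.org"}


@dataclass
class Disambiguation:
    """One annotated text fragment, loosely following the quoted data model."""
    text: str
    concept_ref: Optional[str] = None        # concept in an ontology
    semantic_selector: Optional[str] = None  # meaning in a semantic network
    terminology_ref: Optional[str] = None    # term in a terminological resource
    # language code -> expression of the same concept in that language
    equivalent_translations: Dict[str, str] = field(default_factory=dict)

    def selector_is_valid(self) -> bool:
        """True if the selector is an http(s) URI on a known wordnet host."""
        if self.semantic_selector is None:
            return False
        parsed = urlparse(self.semantic_selector)
        return (
            parsed.scheme in ("http", "https")
            and parsed.hostname in KNOWN_WORDNET_HOSTS
        )


# "löschen" in the sense "extinguish (a fire)"; the synset URI is invented.
annotation = Disambiguation(
    text="löschen",
    semantic_selector="http://wordnet.example.org/synset/extinguish-v-1",
    equivalent_translations={"en": "extinguish"},
)
```

Restricting selectors to a fixed set of hosts is what makes validation cheap here: a consumer only has to compare the URI host against the list rather than resolve the resource.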
Received on Thursday, 31 May 2012 13:53:11 UTC