- From: Felix Sasaki <fsasaki@w3.org>
- Date: Thu, 7 Jun 2012 16:19:02 +0200
- To: Tadej Stajner <tadej.stajner@ijs.si>
- Cc: public-multilingualweb-lt@w3.org
- Message-ID: <CAL58czp-LXNeoBAbCeR=E_=iV6iOqH28Sk_QtR7f6CudHmUFRg@mail.gmail.com>
2012/6/7 Tadej Stajner <tadej.stajner@ijs.si>

> Hi Felix,
> as far as I'm aware, URIs only exist for the English wordnet. Maybe
> prefixing it with a # was not the best stylistic choice here, but yes, what I
> meant to convey is that the value was a local identifier, valid within a
> particular semantic network.
>
> In the ideal scenario, these selectors would be dereferenceable and
> verifiable via URIs for arbitrary wordnets and terminology lexicons and
> their entries.
>

OK - the main point would be that they are dereferenceable and verifiable. In practice, you will not achieve that for arbitrary wordnets, but you can achieve that for a subset, if the related "players" agree.

In the "collation" example mentioned before, the identifier for the Unicode code point based collation
http://www.w3.org/2005/xpath-functions/collation/codepoint/
was the lowest common denominator; in addition to that, everybody is free to have other URIs for arbitrary collations.

I would hope that we could end up with such a list (hopefully longer than one) for the semantic networks too.

Felix

> Do we have any people involved in developing semantic networks or term
> lexicons on this list? The compromise is allowing some limited classes of
> non-URI local selectors, like synset IDs for wordnets, and term IDs for TBX
> lexicons.
>
> -- Tadej
>
>
> On 6/7/2012 3:44 PM, Felix Sasaki wrote:
>
> Thanks, Tadej.
>
> The value of the its-selector attribute looks like a document internal
> link. But it is probably an identifier of the synset in the given semantic
> network, no?
>
> About 1) and 2): is your made-up example then the output of the text
> annotation use case? I am asking since you say "2) markup in raw ITS", so
> I'm not sure.
>
> Also, it seems that an implementation needs to "know" about the
> resources that are identified via its-semantic-network-ref. This is really
> an identifier, like
> http://www.w3.org/2005/xpath-functions/collation/codepoint/
> is an identifier for a Unicode code point collation; it doesn't give you
> the collation data, but creating an implementation that "understands" the
> identifier probably means caching the collation data. The same would be
> true for the semantic network.
>
> This leads to the next question: can we engage the developers of the
> semantic network (or other disambiguation related) resources to come up
> with stable URIs for these? It would be great to list these URIs in our
> specification and say "this is how you identify the English wordnet etc.",
> for scenarios like the collation data mentioned above.
>
> Felix
>
> 2012/6/7 Tadej Štajner <tadej.stajner@ijs.si>
>
>> Hi,
>>
>> I agree with Pedro on the questions. Automatic word sense disambiguation
>> is in practice still not perfect, so some semi-automatic user interfaces
>> make a lot of sense. Here is how I think this could look in a made-up
>> example, answering Felix's 1) and 2):
>>
>> 1) HTML+ITS:
>> <span its-disambiguation
>>   its-semantic-network-ref="http://www.sfs.uni-tuebingen.de/lsd/index.shtml"
>>   its-selector="#synset_loschen_3">löschen</span>
>>
>> 2) Markup in raw ITS:
>> <its:disambiguation
>>   semanticNetworkRef="http://www.sfs.uni-tuebingen.de/lsd/index.shtml"
>>   selector="#synset_loschen_3">löschen</its:disambiguation>
>>
>> -- Tadej
>>
>>
>> On 04. 06. 2012 13:53, Pedro L. Díez Orzas wrote:
>>
>> Dear Felix,
>>
>> Thank you very much. Probably Tadej can prepare the use cases you
>> mention, with the consolidated data category. About questions 3 and 4, I
>> can tell you the following:
>>
>> 3) Would it be produced also by an automatic text annotation tool?
>>
>> For the pointers to the three kinds of information referred to (concepts
>> in an ontology, meanings in a lexical DB, and terms in terminological
>> resources), I think semiautomatic annotation tools would be possible,
>> that is, annotations proposed by the tool and confirmed by the user.
>>
>> Fully automatic text annotation would need a more sophisticated
>> “semantic calculus”, and most of these are under research, as far as I
>> know. Maybe, in these cases, it should be combined with
>> textAnalysisAnnotation, specifying in *Annotation agent* and
>> *Confidence score* which system produced it and with what reliability.
>>
>> 4) Would 1-2 be consumed by an MT tool, or by other tools?
>>
>> These can basically be consumed by language processing tools, like MT,
>> and other linguistic technology that needs content or semantic info, for
>> instance text analytics, semantic search, etc. In localization chains,
>> this information can also be used by automatic or semiautomatic processes
>> (like selection of dictionaries for translations, or selection of
>> translators/revisers by subject area).
>>
>> It could also be used by humans for translation or post-editing in case
>> of ambiguity or lack of context in the content, but mostly by automatic
>> systems.
>>
>> I hope this helps.
>>
>> Pedro
>>
>> ------------------------------
>>
>> *From:* Felix Sasaki [mailto:fsasaki@w3.org]
>> *Sent:* Saturday, 2 June 2012 14:13
>> *To:* Tadej Stajner; pedro.diez
>> *Cc:* public-multilingualweb-lt@w3.org
>> *Subject:* Re: [ACTION-94]: go and find examples of concept ontology
>> (semantic features of terms as opposed to domain type ontologies)
>>
>> Hi Tadej, Pedro, all,
>>
>> this looks like a great chain of producing and consuming metadata.
>>
>> Apologies if this was explained during last week's call or before, but can
>> you clarify a bit the following:
>>
>> 1) What would the actual HTML markup produced in the original text
>> annotation use case look like?
>>
>> 2) What would the markup in this use case look like?
>>
>> 3) Would it be produced also by an automatic text annotation tool?
>>
>> 4) Would 1-2 be consumed by an MT tool, or by other tools?
>>
>> Thanks again,
>>
>> Felix
>>
>> 2012/5/31 Tadej Stajner <tadej.stajner@ijs.si>
>>
>> Hi Pedro,
>> thanks for the excellent explanation. If I understand you correctly, a
>> sufficient example for this use case would be annotation of individual
>> words with the synset URI of the appropriate wordnet? If so, then I believe
>> this route can be practical - I think linking to the synset is a more
>> practical idea than expressing semantic features of the word, given the
>> available tools.
>>
>> Enrycher can do automatic all-word disambiguation into the English
>> wordnet, whereas we don't have anything specific in place for semantic
>> features (which I suspect also holds for other text analytics providers).
>>
>> I'm also in favor of prescribing wordnets for individual languages as
>> valid selector domains as you suggest in option 1). That would make
>> validation easier since we have a known domain.
>>
>> @All: Can we come up with a second implementation for this use case,
>> preferably a consumer?
>>
>> -- Tadej
>>
>>
>> On 5/29/2012 2:00 PM, Pedro L. Díez Orzas wrote:
>>
>> Dear all,
>>
>> Sorry for the delay. I tried to contact some people I think can
>> contribute to this, but they are not available these weeks.
>>
>> Before providing an example so that we can all consider whether it is
>> worthwhile to keep the “semantic selector” attribute in the consolidation
>> of “Disambiguation”, I would like to make a couple of observations:
>>
>> 1. We will probably not have any implementation in the short term, but
>> there are, for example, a few semantic networks available on the web (see
>> http://www.globalwordnet.org/gwa/wordnet_table.html) that could be
>> mapped using semantic selectors. See online, for example, the famous
>> http://wordnetweb.princeton.edu/perl/webwn.
>> 2. The W3C working group on SKOS (Simple Knowledge Organization System
>> Reference) may be dealing with similar things.
>>
>> The “semantic selector” allows finer lexical distinctions (for simple
>> words or multi-words) than a “domain” or an ontology like NERD. Also, the
>> denotation is different from the “concept reference”, above all for parts
>> of speech such as verbs.
>>
>> Within the same domain, referring to very similar concepts, languages
>> have semantic differences. Depending on the semantic theory used, each
>> tries to capture these differences by means of different systems
>> (semantic features, semantic primitives, semantic nodes (in semantic
>> networks), other semantic representations). An example could be the German
>> verb “löschen”, which in different contexts can take different meanings
>> that one can try to capture using different selectors in the different
>> systems.
>>
>> löschen -> clear (some bits)
>>         -> delete (files)
>>         -> cancel (programs)
>>         -> erase (a scratchpad)
>>         -> extinguish (a fire)
>>
>> Other possible translations of the verb “löschen” are:
>>
>> delete: löschen, streichen, tilgen, ausstreichen, herausstreichen
>> clear: löschen, klären, klarmachen, leeren, räumen, säubern
>> erase: löschen, auslöschen, tilgen, ausradieren, radieren, abwischen
>> extinguish: löschen, auslöschen, zerstören
>> quench: löschen, stillen, abschrecken, dämpfen
>> put out: löschen, bringen, ausmachen, ausschalten, treiben, verstimmen
>> unload: entladen, abladen, ausladen, löschen, abstoßen, abwälzen
>> discharge: entladen, erfüllen, entlassen, entlasten, löschen, ausstoßen
>> wipe out: auslöschen, löschen, ausrotten, tilgen, zunichte machen, auswischen
>> slake: stillen, löschen
>> close: schließen, verschließen, abschließen, sperren, zumachen, löschen
>> blot: löschen, abtupfen, klecksen, beklecksen, sich unmöglich machen, sich verderben
>> turn off: ausschalten, abbiegen, abstellen, abdrehen, einbiegen, löschen
>> blow out: auspusten, löschen, aufblasen, aufblähen, aufbauschen, platzen
>> zap: abknallen, düsen, umschalten, löschen, töten, kaputtmachen
>> redeem: einlösen, erlösen, zurückkaufen, tilgen, retten, löschen
>> pay off: auszahlen, bezahlen, tilgen, abzahlen, abbezahlen, löschen
>> switch out: löschen
>> unship: ausladen, entladen, abnehmen, löschen
>> souse: eintauchen, durchtränken, löschen, nass machen
>> rub off: abreiben, abgehen, abwetzen, ausradieren, abscheuern, löschen
>> strike off: löschen
>> land: landen, an Land gehen, kriegen, an Land ziehen, aufsetzen, löschen
>>
>> According to this, the consolidation of disambiguation/namedEntity/ data
>> categories under “Terminology”
>> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#disambiguation
>> could be the following. It is intended to cover operational URI or XPath
>> pointers to the three currently most important kinds of semantic resource:
>> conceptual (ontology), semantic (semantic networks or lexical databases) and
>> terminological (glossaries and terminological resources), where ontologies
>> are used for both general lexicon and terminology, semantic networks to
>> represent general vocabulary (lexicon), and terminological resources to
>> represent specialized vocabulary.
>>
>> *disambiguation*
>>
>> Includes data to be used by MT systems in disambiguating difficult content
>>
>> *Data model*
>>
>> - concept reference: points to a *concept in an ontology* that this
>>   fragment of text represents. May be a URI or an XPath pointer.
>> - semantic selector: points to a *meaning in a semantic network*
>>   that this fragment of text represents. May be a URI or an XPath pointer.
>> - terminology reference: points to *a term in a terminological
>>   resource* that this fragment of text represents. May be a URI or an
>>   XPath pointer.
>> - equivalent translation: expressions of that concept in other
>>   languages, for example for training MT systems
>>
>> Also, I would keep *textAnalysisAnnotation*, since the purpose is quite
>> different.
>>
>> Anyway, if we decide not to include “semantic selector” now, maybe it
>> can be kept for future versions or handled in liaison with other groups.
>>
>> I hope it helps,
>>
>> Pedro
>>
>> *__________________________________*
>>
>> *Pedro L. Díez Orzas*
>> *Executive President/CEO*
>> *Linguaserve Internacionalización de Servicios, S.A.*
>> *Tel.: +34 91 761 64 60  Fax: +34 91 542 89 28*
>> *E-mail: pedro.diez@linguaserve.com*
>> *www.linguaserve.com*
>>

--
Felix Sasaki
DFKI / W3C Fellow
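
A minimal sketch of the point made in this thread that an implementation has to "know" the resources named by its-semantic-network-ref, treating the URI as an identifier rather than something it fetches. It uses the attribute names from Tadej's made-up HTML+ITS example, which are not agreed ITS syntax. KNOWN_NETWORKS, DisambiguationExtractor and resolve are hypothetical names, and the gloss stored for synset_loschen_3 is an invented placeholder, not data from any real wordnet.

```python
from html.parser import HTMLParser

# Hypothetical registry of semantic-network identifiers this implementation
# "knows". The URI is used only as a key; nothing is fetched from it.
# The gloss is an invented placeholder, not real wordnet data.
KNOWN_NETWORKS = {
    "http://www.sfs.uni-tuebingen.de/lsd/index.shtml": {
        "synset_loschen_3": "placeholder gloss: delete (files)",
    },
}


class DisambiguationExtractor(HTMLParser):
    """Collects annotated text plus its network identifier and local selector.

    Simplification: nested annotated elements with the same tag name
    are not handled.
    """

    def __init__(self):
        super().__init__()
        self.annotations = []
        self._open = None  # annotation currently being read, if any

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if "its-disambiguation" in attr_map:
            self._open = {
                "tag": tag,
                "network": attr_map.get("its-semantic-network-ref"),
                "selector": attr_map.get("its-selector"),
                "text": "",
            }

    def handle_data(self, data):
        if self._open is not None:
            self._open["text"] += data

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open["tag"]:
            self.annotations.append(self._open)
            self._open = None


def resolve(network, selector):
    """Return the cached entry for a local selector, or None if unknown."""
    entries = KNOWN_NETWORKS.get(network)
    if entries is None or selector is None:
        return None
    return entries.get(selector.lstrip("#"))


markup = (
    '<p><span its-disambiguation '
    'its-semantic-network-ref="http://www.sfs.uni-tuebingen.de/lsd/index.shtml" '
    'its-selector="#synset_loschen_3">löschen</span></p>'
)
extractor = DisambiguationExtractor()
extractor.feed(markup)
for ann in extractor.annotations:
    print(ann["text"], "->", resolve(ann["network"], ann["selector"]))
```

A selector is only meaningful together with its network identifier, which is why resolve takes both; a network missing from the registry yields None instead of an attempt to dereference the URI.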
Received on Thursday, 7 June 2012 14:19:34 UTC