- From: Felix Sasaki <fsasaki@w3.org>
- Date: Sat, 9 Jun 2012 06:44:34 +0200
- To: Pedro L. Díez Orzas <pedro.diez@linguaserve.com>
- Cc: Dave Lewis <dave.lewis@cs.tcd.ie>, public-multilingualweb-lt@w3.org
- Message-ID: <CAL58czqVZqV1B=zHOFrc50A_mYTCtwr51-h=o2m36=AytkeC_Q@mail.gmail.com>
Dear Pedro,

thank you for this. For comments, see my mail to Dave at http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Jun/0030.html; one further comment follows.

I think we will have trouble agreeing on types of semantic resources, e.g. the values

onto-concept | sem-net-node | terminology-entry | equiv-translation

For example, if the resource is a terminology entry, what format is it in (TBX, OLIF, ...)? If it is an onto-concept, what ontological model is behind it?

Luckily, http://thedatahub.org/dataset/vu-wordnet contains the information we need. So having something like this

<span its-entity entityref="http://www.w3.org/2012/semantic-resources/" its-selector="enwg-synset_loschen_3">löschen</span>

with a link from http://www.w3.org/2012/semantic-resources/ to the CKAN page http://thedatahub.org/dataset/vu-wordnet would provide the same information.

What do you think?

Felix

2012/6/8 Pedro L. Díez Orzas <pedro.diez@linguaserve.com>

> Dear Tadej, Felix, Yves, Dave, all,
>
> I checked with some experts, who told me the following:
>
> It would be great if links to wordnet can be included in the annotations. The best thing to do would be to use the open linked data versions of wordnet:
>
> http://thedatahub.org/dataset/vu-wordnet
>
> It has URIs for synsets (actually sense meanings but I convinced them they need to shift to synset IDs, which they will do in the near future). English synsets are good for any language since the other languages link to English (still as an Inter Lingual Index). Eventually, other wordnets will also be published as linked open data.
>
> Another thing is domain tags. WordnetDomain tags are used here (Dewey system). Since it is linked to English Wordnet it is linked to any synset in any language linked to English.
> That will be a very useful semantic tag also for translation.
>
> I think this is the right way to reinforce the connection between MLW-LT and open linked data. I hope it helps.
>
> Best,
> Pedro
>
> ------------------------------
> From: Dave Lewis [mailto:dave.lewis@cs.tcd.ie]
> Sent: Thursday, 7 June 2012 23:58
> To: public-multilingualweb-lt@w3.org
> Subject: Re: [ACTION-94]: go and find examples of concept ontology (semantic features of terms as opposed to domain type ontologies)
>
> Hi Tadej,
> I spoke to some people from ISOCAT at LREC. They operate persistent URLs for their platform, so with an example perhaps we could add that to the list?
>
> cheers,
> Dave
>
> On 07/06/2012 15:19, Felix Sasaki wrote:
>
> 2012/6/7 Tadej Stajner <tadej.stajner@ijs.si>
>
> Hi Felix,
> as far as I'm aware, URIs only exist for the English wordnet. Maybe prefixing a # was not the best stylistic choice here, but yes, what I meant to convey is that the value was a local identifier, valid within a particular semantic network.
>
> In the ideal scenario, these selectors would be dereferenceable and verifiable via URIs for arbitrary wordnets and terminology lexicons and their entries.
>
> OK - the main point would be that they are dereferenceable and verifiable. In practice, you will not achieve that for arbitrary wordnets, but you can achieve that for a subset, if the related "players" agree. In the "collation" example mentioned before, the identifier for the Unicode code point based collation http://www.w3.org/2005/xpath-functions/collation/codepoint/ was the lowest common denominator; in addition to that everybody is free to have other URIs for arbitrary collations.
> I would hope that we could end up with such a list (hopefully longer than one) for the semantic networks too.
>
> Felix
>
> Do we have any people involved in developing semantic networks or term lexicons on this list? The compromise is allowing some limited classes of non-URI local selectors, like synset IDs for wordnets, and term IDs for TBX lexicons.
>
> -- Tadej
>
> On 6/7/2012 3:44 PM, Felix Sasaki wrote:
>
> Thanks, Tadej.
>
> The value of the its-selector attribute looks like a document internal link. But it is probably an identifier of the synset in the given semantic network, no?
>
> About 1) and 2): is your made-up example then the output of the text annotation use case? I am asking since you say "2) markup in raw ITS", so I'm not sure.
>
> Also, it seems that an implementation needs to "know" about the resources that are identified via its-semantic-network-ref. This is really an identifier, like
>
> http://www.w3.org/2005/xpath-functions/collation/codepoint/
>
> is an identifier for a Unicode code point collation; it doesn't give you the collation data, but creating an implementation that "understands" the identifier probably means caching the collation data. The same would be true for the semantic network.
>
> This leads to the next question: can we engage the developers of the semantic network (or other disambiguation related) resources to come up with stable URIs for these? It would be great to list these URIs in our specification and say "this is how you identify the English wordnet etc.", for scenarios like the collation data mentioned above.
>
> Felix
>
> 2012/6/7 Tadej Štajner <tadej.stajner@ijs.si>
>
> Hi,
>
> I agree with Pedro on the questions.
> Automatic word sense disambiguation is in practice still not perfect, so some semi-automatic user interfaces make a lot of sense. Here is how I think this could look in a made-up example, answering Felix's 1) and 2):
>
> 1) HTML+ITS:
> <span its-disambiguation its-semantic-network-ref="http://www.sfs.uni-tuebingen.de/lsd/index.shtml" its-selector="#synset_loschen_3">löschen</span>
>
> 2) Markup in raw ITS:
> <its:disambiguation semanticNetworkRef="http://www.sfs.uni-tuebingen.de/lsd/index.shtml" selector="#synset_loschen_3">löschen</its:disambiguation>
>
> -- Tadej
>
> On 04. 06. 2012 13:53, Pedro L. Díez Orzas wrote:
>
> Dear Felix,
>
> Thank you very much. Probably Tadej can prepare the use cases you mention, with the consolidated data category. About questions 3 and 4, I can tell you the following:
>
> 3) Would it be produced also by an automatic text annotation tool?
>
> For the pointers to the three kinds of information referred to (concepts in an ontology, meanings in a lexical DB, and terms in terminological resources), I think semiautomatic annotation tools would be possible, that is, annotations proposed by the tool and confirmed by the user.
>
> Fully automatic text annotation would need more sophisticated “semantic calculus”, and most of this is still under research, as far as I know. Maybe, in these cases, it should be combined with textAnalysisAnnotation, specifying in Annotation agent and Confidence score which system produced it and with what reliability.
>
> 4) Would 1-2 be consumed by an MT tool, or by other tools?
>
> These can basically be consumed by language processing tools, like MT, and other linguistic technology that needs content or semantic info, for instance text analytics, semantic search, etc.
> In localization chains, this information can also be used by automatic or semiautomatic processes (like selection of dictionaries for translations, or selection of translators/revisers by subject area).
>
> It could also be used by humans for translation or post-editing in case of ambiguity or lack of context in the content, but mostly by automatic systems.
>
> I hope this helps.
> Pedro
>
> ------------------------------
> From: Felix Sasaki [mailto:fsasaki@w3.org]
> Sent: Saturday, 2 June 2012 14:13
> To: Tadej Stajner; pedro.diez
> Cc: public-multilingualweb-lt@w3.org
> Subject: Re: [ACTION-94]: go and find examples of concept ontology (semantic features of terms as opposed to domain type ontologies)
>
> Hi Tadej, Pedro, all,
>
> this looks like a great chain of producing and consuming metadata.
>
> Apologies if this was explained during last week's call or before, but can you clarify the following a bit:
>
> 1) What would the actual HTML markup produced in the original text annotation use case look like?
> 2) What would the markup in this use case look like?
> 3) Would it be produced also by an automatic text annotation tool?
> 4) Would 1-2 be consumed by an MT tool, or by other tools?
>
> Thanks again,
>
> Felix
>
> 2012/5/31 Tadej Stajner <tadej.stajner@ijs.si>
>
> Hi Pedro,
> thanks for the excellent explanation. If I understand you correctly, a sufficient example for this use case would be annotation of individual words with the synset URI of the appropriate wordnet? If so, then I believe this route can be practical - I think linking to the synset is a more practical idea than expressing semantic features of the word given the available tools.
>
> Enrycher can do automatic all-word disambiguation into the English wordnet, whereas we don't have anything specific in place for semantic features (which I suspect also holds for other text analytics providers).
>
> I'm also in favor of prescribing wordnets for individual languages as valid selector domains as you suggest in option 1). That would make validation easier since we have a known domain.
>
> @All: Can we come up with a second implementation for this use case, preferably a consumer?
>
> -- Tadej
>
> On 5/29/2012 2:00 PM, Pedro L. Díez Orzas wrote:
>
> Dear all,
>
> Sorry for the delay. I tried to contact some people I think can contribute to this, but they are not available these weeks.
>
> Before providing an example so that all can consider whether it is worthwhile to maintain the “semantic selector” attribute in the consolidation of “Disambiguation”, I would like to make a couple of considerations:
>
> 1. Probably we will not have any implementation in the short term, but there are for example a few semantic networks available on the web (see http://www.globalwordnet.org/gwa/wordnet_table.html) that could be mapped using semantic selectors. See online, for example, the famous http://wordnetweb.princeton.edu/perl/webwn.
> 2. The W3C working group on SKOS (Simple Knowledge Organization System Reference) is maybe dealing with similar things.
>
> The “semantic selector” allows finer lexical distinctions (simple words or multi-words) than a “domain” or an ontology like NERD. Also, the denotation is different from the “concept reference”, most of all for parts of speech like verbs.
>
> Within the same domain, referring to very similar concepts, languages have semantic differences.
> Depending on the semantic theory used, each tries to capture these differences by means of different systems (semantic features, semantic primitives, semantic nodes (in semantic networks), other semantic representations). An example could be the German verb “löschen”, which in different contexts can take different meanings that one can try to capture using different selectors, with the different systems.
>
> löschen -> clear (some bits)
>         -> delete (files)
>         -> cancel (programs)
>         -> erase (a scratchpad)
>         -> extinguish (a fire)
>
> Other possible translations of the verb “löschen” are:
>
> delete: löschen, streichen, tilgen, ausstreichen, herausstreichen
> clear: löschen, klären, klarmachen, leeren, räumen, säubern
> erase: löschen, auslöschen, tilgen, ausradieren, radieren, abwischen
> extinguish: löschen, auslöschen, zerstören
> quench: löschen, stillen, abschrecken, dämpfen
> put out: löschen, bringen, ausmachen, ausschalten, treiben, verstimmen
> unload: entladen, abladen, ausladen, löschen, abstoßen, abwälzen
> discharge: entladen, erfüllen, entlassen, entlasten, löschen, ausstoßen
> wipe out: auslöschen, löschen, ausrotten, tilgen, zunichte machen, auswischen
> slake: stillen, löschen
> close: schließen, verschließen, abschließen, sperren, zumachen, löschen
> blot: löschen, abtupfen, klecksen, beklecksen, sich unmöglich machen, sich verderben
> turn off: ausschalten, abbiegen, abstellen, abdrehen, einbiegen, löschen
> blow out: auspusten, löschen, aufblasen, aufblähen, aufbauschen, platzen
> zap: abknallen, düsen, umschalten, löschen, töten, kaputtmachen
> redeem: einlösen, erlösen, zurückkaufen, tilgen, retten, löschen
> pay off: auszahlen, bezahlen, tilgen, abzahlen, abbezahlen, löschen
> switch out: löschen
> unship: ausladen, entladen, abnehmen, löschen
> souse: eintauchen, durchtränken, löschen, nass machen
> rub off: abreiben, abgehen, abwetzen, ausradieren, abscheuern, löschen
> strike off: löschen
> land: landen, an Land gehen, kriegen, an Land ziehen, aufsetzen, löschen
>
> According to this, the consolidation of the disambiguation/namedEntity data categories under “Terminology” (http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#disambiguation) could be the following. It is intended to cover operational URI or XPath pointers to the three currently most important kinds of semantic resources: conceptual (ontologies), semantic (semantic networks or lexical databases) and terminological (glossaries and terminological resources). Ontologies are used for both general lexicon and terminology, semantic networks represent general vocabulary (lexicon), and terminological resources represent specialized vocabulary.
>
> disambiguation
>
> Includes data to be used by MT systems in disambiguating difficult content
>
> Data model
>
> - concept reference: points to a concept in an ontology that this fragment of text represents. May be a URI or an XPath pointer.
> - semantic selector: points to a meaning in a semantic network that this fragment of text represents. May be a URI or an XPath pointer.
> - terminology reference: points to a term in a terminological resource that this fragment of text represents.
>   May be a URI or an XPath pointer.
> - equivalent translation: expressions of that concept in other languages, for example for training MT systems
>
> Also, I would keep textAnalysisAnnotation, since the purpose is quite different.
>
> Anyway, if we decide not to include “semantic selector” now, maybe it can be considered for future versions or treated in liaison with other groups.
>
> I hope it helps,
> Pedro
>
> Pedro L. Díez Orzas
> Presidente Ejecutivo/CEO
> Linguaserve Internacionalización de Servicios, S.A.
> Tel.: +34 91 761 64 60
> Fax: +34 91 542 89 28
> E-mail: pedro.diez@linguaserve.com
> www.linguaserve.com

--
Felix Sasaki
DFKI / W3C Fellow
Received on Saturday, 9 June 2012 04:45:03 UTC
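
A minimal sketch of how a consumer might read the annotation proposed in the message above, using Python's standard html.parser. The attribute names (its-entity, entityref, its-selector) are taken from the example markup in the message, which was a proposal under discussion and not a finalized ITS syntax. The registry KNOWN_RESOURCES and the helper names are hypothetical. The sketch illustrates the point made in the thread that entityref is an identifier the implementation has to "know" in advance, much like the collation URI, while the selector is a local ID within that resource.

```python
from html.parser import HTMLParser

# Hypothetical registry: identifier URI -> page describing the resource.
# An implementation would need such prior knowledge; the identifier itself
# does not deliver the semantic network data.
KNOWN_RESOURCES = {
    "http://www.w3.org/2012/semantic-resources/":
        "http://thedatahub.org/dataset/vu-wordnet",
}


class EntityAnnotationParser(HTMLParser):
    """Collects <span its-entity ...> annotations (nested spans not handled)."""

    def __init__(self):
        super().__init__()
        self.annotations = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # its-entity is a valueless attribute, so only its presence is checked.
        if tag == "span" and "its-entity" in attr_map:
            self._current = {
                "entityref": attr_map.get("entityref"),
                "selector": attr_map.get("its-selector"),
                "text": "",
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "span" and self._current is not None:
            self.annotations.append(self._current)
            self._current = None


def extract_annotations(html):
    """Return a list of dicts with entityref, selector and annotated text."""
    parser = EntityAnnotationParser()
    parser.feed(html)
    parser.close()
    return parser.annotations


def resolve(annotation):
    """Look up the resource page for a known identifier, else None."""
    return KNOWN_RESOURCES.get(annotation["entityref"])
```

A consumer such as an MT system could call extract_annotations on a page and skip any annotation for which resolve returns None, that is, any resource it does not recognize.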