Re: [ACTION-94]: go and find examples of concept ontology (semantic features of terms as opposed to domain type ontologies) from Felix Sasaki on 2012-06-07 (public-multilingualweb-lt@w3.org from June 2012)

From: Felix Sasaki <fsasaki@w3.org>
Date: Thu, 7 Jun 2012 16:19:02 +0200
To: Tadej Stajner <tadej.stajner@ijs.si>
Cc: public-multilingualweb-lt@w3.org
Message-ID: <CAL58czp-LXNeoBAbCeR=E_=iV6iOqH28Sk_QtR7f6CudHmUFRg@mail.gmail.com>
2012/6/7 Tadej Stajner <tadej.stajner@ijs.si>

>  Hi Felix,
> as far as I'm aware, URIs only exist for the English wordnet. Maybe
> prefixing it with a # was not the best stylistic choice here, but yes,
> what I meant to convey is that the value was a local identifier, valid
> within a particular semantic network.
>
> In the ideal scenario, these selectors would be dereferenceable and
> verifiable via URIs for arbitrary wordnets and terminology lexicons and
> their entries.
>


OK - the main point would be that they are dereferenceable and verifiable.
In practice, you will not achieve that for arbitrary wordnets, but you can
achieve that for a subset, if the related "players" agree. In the
"collation" example mentioned before, the identifier for the Unicode code
point based collation
http://www.w3.org/2005/xpath-functions/collation/codepoint/ was the lowest
common denominator; in addition to that, everybody is free to have other
URIs for arbitrary collations. I would hope that we could end up with such
a list (hopefully longer than one) for the semantic networks too.
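
A minimal sketch of what such a list could mean for an implementation,
assuming a hypothetical registry that maps agreed identifiers to locally
cached data. The network URI and the selector are the ones from Tadej's
made-up example; the registry name, the resolve() helper and the gloss are
invented for illustration only.

```python
# Hypothetical registry: agreed identifier URI -> locally cached synset data.
# The URI and selector come from the made-up example in this thread;
# the gloss text is an assumption, not real data from any semantic network.
KNOWN_NETWORKS = {
    "http://www.sfs.uni-tuebingen.de/lsd/index.shtml": {
        "synset_loschen_3": "extinguish (a fire)",
    },
}


def resolve(network_ref, selector):
    """Return cached data for a selector, or None if this implementation
    does not know the network or the selector."""
    network = KNOWN_NETWORKS.get(network_ref)
    if network is None:
        return None  # identifier is not on the agreed list
    # Accept both "#synset_x" and "synset_x" as the local identifier.
    return network.get(selector.lstrip("#"))


print(resolve("http://www.sfs.uni-tuebingen.de/lsd/index.shtml",
              "#synset_loschen_3"))
print(resolve("http://example.org/unknown-wordnet", "#synset_loschen_3"))
```

As with the collation identifier, the URI here only names the resource; the
implementation still has to ship or cache the data behind it.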

Felix



> Do we have any people involved in developing semantic networks or term
> lexicons on this list? The compromise is allowing some limited classes of
> non-URI local selectors, like synset IDs for wordnets, and term IDs for TBX
> lexicons.
>
> -- Tadej
>
>
> On 6/7/2012 3:44 PM, Felix Sasaki wrote:
>
> Thanks, Tadej.
>
>  The value of the its-selector attribute looks like a document internal
> link. But it is probably an identifier of the synset in the given semantic
> network, no?
>
>  About 1) and 2): is your made-up example then the output of the text
> annotation use case? I am asking since you say "2) markup in raw ITS", so
> I'm not sure.
>
>  Also, it seems that an implementation needs to "know" about the
> resources that are identified via its-semantic-network-ref. This is really
> an identifier, like
> http://www.w3.org/2005/xpath-functions/collation/codepoint/
> is an identifier for a Unicode code point collation; it doesn't give you
> the collation data, but creating an implementation that "understands" the
> identifier means probably caching the collation data. The same would be
> true for the semantic network.
>
>  This leads to the next question: can we engage the developers of the
> semantic network (or other disambiguation related) resources to come up
> with stable URIs for these? It would be great to list these URIs in our
> specification and say "this is how you identify the English wordnet etc.",
> for scenarios like the collation data mentioned above.
>
>  Felix
>
> 2012/6/7 Tadej Štajner <tadej.stajner@ijs.si>
>
>>  Hi,
>>
>> I agree with Pedro on the questions. Automatic word sense disambiguation
>> is in practice still not perfect, so some semi-automatic user interfaces
>> make a lot of sense. Here is how I think this could look in a made-up
>> example, answering Felix's 1) and 2):
>>
>> 1) HTML+ITS: <span its-disambiguation its-semantic-network-ref=
>> "http://www.sfs.uni-tuebingen.de/lsd/index.shtml" its-selector="#synset_loschen_3">löschen</span>
>>
>> 2) Markup in raw ITS
>>  <its:disambiguation
>>     semanticNetworkRef="http://www.sfs.uni-tuebingen.de/lsd/index.shtml"
>>     selector="#synset_loschen_3">löschen</its:disambiguation>
>>
>> -- Tadej
>>
>>
>>
>> On 04. 06. 2012 13:53, Pedro L. Díez Orzas wrote:
>>
>>  Dear Felix,
>>
>>
>>
>> Thank you very much. Probably Tadej can prepare the use cases you
>> mention, with the consolidated data category. About questions 3 and 4, I
>> can tell you the following:
>>
>>
>>
>> 3) Would it be produced also by an automatic text annotation tool?
>>
>>
>>
>> For pointers to the three kinds of information referred to (concepts in
>> an ontology, meanings in a lexical DB, and terms in terminological
>> resources), I think semi-automatic annotation tools would be possible,
>> that is, proposed by the tool and confirmed by the user.
>>
>>
>>
>> Fully automatic text annotation would need a more sophisticated
>> “semantic calculus”, and most of these are still under research, as far
>> as I know. Maybe, in these cases, it should be combined with
>> textAnalysisAnnotation, specifying in *Annotation agent* and
>> *Confidence score* which system produced it and with which reliability.
>>
>>
>>
>> 4) Would 1-2 be consumed by an MT tool, or by other tools?
>>
>>
>>
>> These can basically be consumed by language processing tools, like MT,
>> and other language technology that needs content or semantic info, for
>> instance text analytics, semantic search, etc. In localization chains,
>> this information can also be used by automatic or semi-automatic
>> processes (like selection of dictionaries for translations, or selection
>> of translators/revisers by subject area).
>>
>>
>>
>> It could also be used by humans for translation or post-editing in case
>> of ambiguity or lack of context in the content, but mostly by automatic
>> systems.
>>
>>
>>
>> I hope this helps.
>>
>> Pedro
>>
>>
>>  ------------------------------
>>
>> *From:* Felix Sasaki [mailto:fsasaki@w3.org]
>> *Sent:* Saturday, 2 June 2012 14:13
>> *To:* Tadej Stajner; pedro.diez
>> *CC:* public-multilingualweb-lt@w3.org
>> *Subject:* Re: [ACTION-94]: go and find examples of concept ontology
>> (semantic features of terms as opposed to domain type ontologies)
>>
>>
>>
>> Hi Tadej, Pedro, all,
>>
>>
>>
>> this looks like a great chain of producing and consuming metadata.
>>
>>
>>
>> Apologies if this was explained during last week's call or before, but
>> can you clarify the following a bit:
>>
>>
>>
>> 1) What would the actual HTML markup produced in the original text
>> annotation use case look like?
>>
>> 2) What would the markup in this use case look like?
>>
>> 3) Would it be produced also by an automatic text annotation tool?
>>
>> 4) Would 1-2 be consumed by an MT tool, or by other tools?
>>
>>
>>
>> Thanks again,
>>
>>
>>
>> Felix
>>
>> 2012/5/31 Tadej Stajner <tadej.stajner@ijs.si>
>>
>> Hi Pedro,
>> thanks for the excellent explanation. If I understand you correctly, a
>> sufficient example for this use case would be annotation of individual
>> words with synset URI of the appropriate wordnet? If so, then I believe
>> this route can be practical - I think linking to the synset is a more
>> practical idea than expressing semantic features of the word given the
>> available tools.
>>
>> Enrycher can do automatic all-word disambiguation into the English
>> wordnet, whereas we don't have anything specific in place for semantic
>> features (which I suspect also holds for other text analytics providers).
>>
>> I'm also in favor of prescribing wordnets for individual languages as
>> valid selector domains as you suggest in option 1). That would make
>> validation easier since we have a known domain.
>>
>> @All: Can we come up with a second implementation for this use case,
>> preferably a consumer?
>>
>> -- Tadej
>>
>>
>>
>>
>> On 5/29/2012 2:00 PM, Pedro L. Díez Orzas wrote:
>>
>> Dear all,
>>
>>
>>
>> Sorry for the delay. I tried to contact some people I think can
>> contribute to this, but they are not available these weeks.
>>
>>
>>
>> Before providing an example so that we can all consider whether it is
>> worthwhile to keep the “semantic selector” attribute in the
>> consolidation of “Disambiguation”, I would like to make a couple of
>> observations:
>>
>>
>>
>>    1. We will probably not have any implementation in the short term, but
>>    there are, for example, a few semantic networks available on the web
>>    (see http://www.globalwordnet.org/gwa/wordnet_table.html) that could be
>>    mapped using semantic selectors. See online, for example, the famous
>>    http://wordnetweb.princeton.edu/perl/webwn.
>>    2. The W3C working group on SKOS (Simple Knowledge Organization System
>>    Reference) may be dealing with similar things.
>>
>>
>>
>> The “semantic selector” allows finer lexical distinctions (for single
>> words or multi-word expressions) than a “domain” or an ontology like
>> NERD. Also, the denotation is different from the “concept reference”,
>> most of all for parts of speech like verbs.
>>
>>
>>
>> Within the same domain, referring to very similar concepts, languages
>> have semantic differences. Depending on the semantic theory used, each
>> tries to capture these differences by means of different systems
>> (semantic features, semantic primitives, semantic nodes (in semantic
>> networks), other semantic representations). An example could be the German
>> verb “löschen”, which in different contexts can take different meanings
>> that one can try to capture using different selectors, with the different
>> systems.
>>
>>
>>
>> –         löschen                  -> clear            (some bits)
>>                                    -> delete           (files)
>>                                    -> cancel           (programs)
>>                                    -> erase            (a scratchpad)
>>                                    -> extinguish       (a fire)
>>
>>
>>
>> Other possible translations of the verb “löschen” are:
>>
>> delete: löschen, streichen, tilgen, ausstreichen, herausstreichen
>> clear: löschen, klären, klarmachen, leeren, räumen, säubern
>> erase: löschen, auslöschen, tilgen, ausradieren, radieren, abwischen
>> extinguish: löschen, auslöschen, zerstören
>> quench: löschen, stillen, abschrecken, dämpfen
>> put out: löschen, bringen, ausmachen, ausschalten, treiben, verstimmen
>> unload: entladen, abladen, ausladen, löschen, abstoßen, abwälzen
>> discharge: entladen, erfüllen, entlassen, entlasten, löschen, ausstoßen
>> wipe out: auslöschen, löschen, ausrotten, tilgen, zunichte machen,
>>    auswischen
>> slake: stillen, löschen
>> close: schließen, verschließen, abschließen, sperren, zumachen, löschen
>> blot: löschen, abtupfen, klecksen, beklecksen, sich unmöglich machen,
>>    sich verderben
>> turn off: ausschalten, abbiegen, abstellen, abdrehen, einbiegen, löschen
>> blow out: auspusten, löschen, aufblasen, aufblähen, aufbauschen, platzen
>> zap: abknallen, düsen, umschalten, löschen, töten, kaputtmachen
>> redeem: einlösen, erlösen, zurückkaufen, tilgen, retten, löschen
>> pay off: auszahlen, bezahlen, tilgen, abzahlen, abbezahlen, löschen
>> switch out: löschen
>> unship: ausladen, entladen, abnehmen, löschen
>> souse: eintauchen, durchtränken, löschen, nass machen
>> rub off: abreiben, abgehen, abwetzen, ausradieren, abscheuern, löschen
>> strike off: löschen
>> land: landen, an Land gehen, kriegen, an Land ziehen, aufsetzen, löschen
>>
>>
>>
>>
>>
>>
>>
>> According to this, the consolidation of the disambiguation/namedEntity/
>> data categories under “Terminology”
>> (http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#disambiguation)
>> could be the following. It is intended to cover operational URI or XPath
>> pointers to the three currently most important kinds of semantic
>> resources: conceptual (ontologies), semantic (semantic networks or
>> lexical databases) and terminological (glossaries and terminological
>> resources), where ontologies are used for both general lexicon and
>> terminology, semantic networks represent general vocabulary (lexicon),
>> and terminological resources represent specialized vocabulary.
>>
>>
>>
>> *disambiguation*
>>
>> Includes data to be used by MT systems in disambiguating difficult content
>>
>>
>>
>> *Data model*
>>
>>    - concept reference: points to a *concept in an ontology* that this
>>    fragment of text represents. May be a URI or an XPath pointer.
>>    - semantic selector: points to a *meaning in a semantic network*
>>    that this fragment of text represents. May be a URI or an XPath
>>    pointer.
>>    - terminology reference: points to *a term in a terminological
>>    resource* that this fragment of text represents. May be a URI or an
>>    XPath pointer.
>>    - equivalent translation: expressions of that concept in other
>>    languages, for example for training MT systems
>>
>>
>>
>>
>> Also, I would keep *textAnalysisAnnotation*, since the purpose is quite
>> different.
>>
>>
>>
>> Anyway, if we decide not to include “semantic selector” now, maybe it
>> can be included in future versions or be treated in liaison with other
>> groups.
>>
>>
>>
>> I hope it helps,
>>
>> Pedro
>>
>>
>>
>> *__________________________________*
>>
>> * *
>>
>> *Pedro L. Díez Orzas*
>>
>> *Presidente Ejecutivo/CEO*
>>
>> *Linguaserve Internacionalización de Servicios, S.A.*
>>
>> *Tel.: +34 91 761 64 60
>> Fax: +34 91 542 89 28*
>>
>> *E-mail: **pedro.diez@linguaserve.com*
>>
>> *www.linguaserve.com*
>>
>> * *
>>
>>  *____________________________________*
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>> Felix Sasaki
>>
>> DFKI / W3C Fellow
>>
>>
>>
>>
>>
>
>
>  --
> Felix Sasaki
> DFKI / W3C Fellow
>
>
>


-- 
Felix Sasaki
DFKI / W3C Fellow
Received on Thursday, 7 June 2012 14:19:34 UTC