RE: [ACTION-94]: go and find examples of concept ontology (semantic features of terms as opposed to domain type ontologies) from Pedro L. Díez Orzas on 2012-06-08 (public-multilingualweb-lt@w3.org from June 2012)

From: Pedro L. Díez Orzas <pedro.diez@linguaserve.com>
Date: Fri, 8 Jun 2012 11:47:15 +0200
To: "'Dave Lewis'" <dave.lewis@cs.tcd.ie>, <public-multilingualweb-lt@w3.org>
Message-ID: <DB9A222D4EF348468EBDA55D243BAB04@newlas.local>
Dear Tadej, Felix, Yves, Dave, all,

I checked with some experts, and they told me the following:

It would be great if links to WordNet could be included in the annotations.
The best thing to do would be to use the linked open data versions of
WordNet:

http://thedatahub.org/dataset/vu-wordnet

It has URIs for synsets (actually for sense meanings, but I convinced them
that they need to shift to synset IDs, which they will do in the near
future). English synsets are good for any language, since the other
languages link to English (still serving as an Inter-Lingual Index).
Eventually, other wordnets will also be published as linked open data.
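To make the Inter-Lingual Index idea concrete, here is a minimal Python sketch of how a tool might resolve a synset in another language to the English synset URI it is linked to, and treat two synsets as sharing a meaning when they reach the same English synset. The base URI, the synset IDs and the link table are hypothetical placeholders, not the identifiers actually used by the VU WordNet data set.

```python
# Minimal sketch: resolve a language-specific synset to the English synset
# URI it links to; the English synset acts as the Inter-Lingual Index pivot.
# ASSUMPTION: base URI and all IDs below are hypothetical placeholders.

EN_SYNSET_BASE = "http://example.org/wordnet/en/synset/"  # hypothetical

# Hypothetical links from (language, local synset ID) to English synset IDs.
ILI_LINKS = {
    ("de", "de-syn-0001"): "en-syn-0042",
    ("es", "es-syn-0777"): "en-syn-0042",
    ("de", "de-syn-0002"): "en-syn-0099",
}


def english_synset_uri(lang: str, local_id: str) -> str:
    """Return the English synset URI that a local synset is linked to."""
    if lang == "en":
        return EN_SYNSET_BASE + local_id
    try:
        return EN_SYNSET_BASE + ILI_LINKS[(lang, local_id)]
    except KeyError:
        raise KeyError(f"no ILI link for {lang}:{local_id}") from None


def same_meaning(a: tuple, b: tuple) -> bool:
    """Two synsets share a meaning if they map to the same English synset."""
    return english_synset_uri(*a) == english_synset_uri(*b)
```

Using the English synset as the only pivot keeps the table to one link per non-English synset, which matches the "other languages link to English" arrangement described above.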

Another point is domain tags. WordNet Domains tags are used here (based on
the Dewey system). Since they are linked to the English WordNet, they are
linked to any synset in any language that is linked to English. That will
also be a very useful semantic tag for translation.
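A similar sketch for domain tags, assuming the tags are attached only to English synsets and reached from other languages through the same links. The synset IDs and domain labels are made up for illustration and are not actual WordNet Domains data.

```python
# ASSUMPTION: hypothetical data; domain tags live on English synsets only.
EN_DOMAINS = {
    "en-syn-0042": {"computer_science"},
    "en-syn-0099": {"fire_fighting"},
}

# Hypothetical links from other languages' synsets to English synsets.
ILI_LINKS = {
    ("de", "de-syn-0001"): "en-syn-0042",
    ("de", "de-syn-0002"): "en-syn-0099",
    ("es", "es-syn-0777"): "en-syn-0042",
}


def domains_for(lang: str, synset_id: str) -> set:
    """Return the domain tags a synset inherits through its English link."""
    en_id = synset_id if lang == "en" else ILI_LINKS.get((lang, synset_id))
    if en_id is None:
        return set()  # unlinked synsets inherit no domain tags
    return set(EN_DOMAINS.get(en_id, ()))
```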

I think this is the right way to reinforce the connection between MLW-LT and
linked open data. I hope it helps.

Best,

Pedro

  _____  

From: Dave Lewis [mailto:dave.lewis@cs.tcd.ie]
Sent: Thursday, 7 June 2012 23:58
To: public-multilingualweb-lt@w3.org
Subject: Re: [ACTION-94]: go and find examples of concept ontology (semantic
features of terms as opposed to domain type ontologies)

Hi Tadej,
I spoke to some people from ISOCAT at LREC. They operate persistent URLs for
their platform, so, given an example, perhaps we could add that to the list?

cheers,
Dave
On 07/06/2012 15:19, Felix Sasaki wrote: 

2012/6/7 Tadej Stajner <tadej.stajner@ijs.si>

Hi Felix,
as far as I'm aware, URIs only exist for the English WordNet. Maybe
prefixing it with a # was not the best stylistic choice here, but yes, what I
meant to convey is that the value was a local identifier, valid within a
particular semantic network.

In the ideal scenario, these selectors would be dereferenceable and
verifiable via URIs for arbitrary wordnets and terminology lexicons and
their entries.

OK, the main point would be that they are dereferenceable and verifiable. In
practice, you will not achieve that for arbitrary wordnets, but you can
achieve it for a subset, if the relevant "players" agree. In the
"collation" example mentioned before, the identifier for the Unicode code
point collation, http://www.w3.org/2005/xpath-functions/collation/codepoint/,
was the lowest common denominator; beyond that, everybody is free to define
other URIs for arbitrary collations. I would hope that we could end up with
such a list (hopefully longer than one entry) for the semantic networks too.

Felix

Do we have any people involved in developing semantic networks or term
lexicons on this list? The compromise would be to allow some limited classes
of non-URI local selectors, such as synset IDs for wordnets and term IDs for
TBX lexicons.
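The compromise could be checked mechanically. The sketch below accepts either an absolute http(s) URI or a local selector from a small, fixed set of selector classes. The regular expressions are assumptions for illustration only; neither WordNet synset IDs nor TBX term IDs are actually defined by these patterns.

```python
import re
from urllib.parse import urlparse

# Hypothetical patterns for the limited classes of non-URI local selectors.
LOCAL_SELECTOR_PATTERNS = {
    # wordnet synset IDs shaped like "#synset_loschen_3" in the example (assumption)
    "wordnet": re.compile(r"^#?synset_\w+$"),
    # TBX term IDs, assumed here to look like XML IDs (assumption)
    "tbx": re.compile(r"^#?[A-Za-z_][\w.-]*$"),
}


def classify_selector(selector: str, resource_kind: str) -> str:
    """Return "uri" or "local"; raise ValueError for anything else."""
    parsed = urlparse(selector)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "uri"
    pattern = LOCAL_SELECTOR_PATTERNS.get(resource_kind)
    if pattern is not None and pattern.match(selector):
        return "local"
    raise ValueError(
        f"{selector!r} is neither a URI nor a valid {resource_kind} selector"
    )
```

Keeping the set of local selector classes closed is what makes validation possible: an unknown resource kind is rejected rather than silently accepted.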

-- Tadej 



On 6/7/2012 3:44 PM, Felix Sasaki wrote: 

Thanks, Tadej. 

The value of the its-selector attribute looks like a document-internal link,
but it is probably an identifier of the synset in the given semantic
network, no?

About 1) and 2): is your made-up example then the output of the text
annotation use case? I am asking since you say "2) markup in raw ITS", so
I'm not sure.

Also, it seems that an implementation needs to "know" about the resources
that are identified via its-semantic-network-ref. This is really an
identifier, just as

http://www.w3.org/2005/xpath-functions/collation/codepoint/

is an identifier for the Unicode code point collation: it doesn't give you
the collation data, and creating an implementation that "understands" the
identifier probably means caching the collation data. The same would be true
for the semantic network.
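A minimal sketch of an implementation that "understands" only a fixed list of identifiers and caches the data behind each one on first use. Only the code point collation URI comes from the discussion; the semantic network URI and its contents are hypothetical stand-ins for a downloaded resource.

```python
from functools import lru_cache

CODEPOINT_COLLATION = "http://www.w3.org/2005/xpath-functions/collation/codepoint/"
TOY_NETWORK = "http://example.org/semantic-network/toy"  # hypothetical identifier


def _load_codepoint_collation():
    # Code point collation needs no external data: a sort key of code points.
    return lambda s: [ord(c) for c in s]


def _load_toy_network():
    # Hypothetical stand-in for fetching and parsing a real semantic network.
    return {"#loeschen_5": "extinguish (a fire)"}


# The agreed list of identifiers this implementation "understands".
KNOWN_IDENTIFIERS = {
    CODEPOINT_COLLATION: _load_codepoint_collation,
    TOY_NETWORK: _load_toy_network,
}


@lru_cache(maxsize=None)
def resolve(identifier: str):
    """Load the data behind a known identifier once, then serve it from the cache."""
    loader = KNOWN_IDENTIFIERS.get(identifier)
    if loader is None:
        raise LookupError(f"unknown identifier: {identifier}")
    return loader()
```

The identifier itself never carries the data; the implementation's registry decides what it means, which is why a list of agreed URIs in the specification would matter.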

This leads to the next question: can we engage the developers of the
semantic network (or other disambiguation-related) resources to come up with
stable URIs for them? It would be great to list these URIs in our
specification and say "this is how you identify the English WordNet etc.",
as in the collation scenario mentioned above.

Felix 

2012/6/7 Tadej Štajner <tadej.stajner@ijs.si>

Hi, 

I agree with Pedro on the questions. Automatic word sense disambiguation is
still not perfect in practice, so semi-automatic user interfaces make a lot
of sense. Here is how I think this could look in a made-up example,
answering Felix's 1) and 2):

1) HTML+ITS:
   <span its-disambiguation
         its-semantic-network-ref="http://www.sfs.uni-tuebingen.de/lsd/index.shtml"
         its-selector="#synset_loschen_3">löschen</span>

2) Markup in raw ITS:
   <its:disambiguation
       semanticNetworkRef="http://www.sfs.uni-tuebingen.de/lsd/index.shtml"
       selector="#synset_loschen_3">löschen</its:disambiguation>
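To check that the two made-up serializations carry the same information, a consumer could extract (network reference, selector, text) from each. This sketch assumes the its: prefix is bound to the ITS namespace http://www.w3.org/2005/11/its (the example above omits the declaration) and handles only the flat, non-nested case.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

ITS_NS = "http://www.w3.org/2005/11/its"

HTML_SNIPPET = (
    '<span its-disambiguation '
    'its-semantic-network-ref="http://www.sfs.uni-tuebingen.de/lsd/index.shtml" '
    'its-selector="#synset_loschen_3">löschen</span>'
)

# Namespace declaration added here (assumption); the example above omits it.
XML_SNIPPET = (
    '<its:disambiguation xmlns:its="http://www.w3.org/2005/11/its" '
    'semanticNetworkRef="http://www.sfs.uni-tuebingen.de/lsd/index.shtml" '
    'selector="#synset_loschen_3">löschen</its:disambiguation>'
)


class DisambiguationFinder(HTMLParser):
    """Collect (network ref, selector, text) from elements flagged its-disambiguation."""

    def __init__(self):
        super().__init__()
        self.found = []
        self._pending = None  # attributes of the element whose text we await

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "its-disambiguation" in a:  # boolean attribute, value is None
            self._pending = (a.get("its-semantic-network-ref"), a.get("its-selector"))

    def handle_data(self, data):
        if self._pending is not None:
            self.found.append((*self._pending, data))
            self._pending = None


def from_html(markup):
    finder = DisambiguationFinder()
    finder.feed(markup)
    finder.close()
    return finder.found


def from_xml(markup):
    root = ET.fromstring(markup)
    return [
        (el.get("semanticNetworkRef"), el.get("selector"), el.text)
        for el in root.iter(f"{{{ITS_NS}}}disambiguation")
    ]
```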

-- Tadej 

On 04.06.2012 13:53, Pedro L. Díez Orzas wrote:

Dear Felix,

Thank you very much. Probably Tadej can prepare the use cases you mention,
with the consolidated data category. Regarding questions 3 and 4, I can tell
you the following:

3) Would it be produced also by an automatic text annotation tool?

For pointers to the three kinds of information referred to (concepts in
ontologies, meanings in lexical databases, and terms in terminological
resources), I think semi-automatic annotation tools would be possible, that
is, annotations proposed by the tool and confirmed by the user.

Fully automatic text annotation would need a more sophisticated "semantic
calculus", and most such approaches are still under research, as far as I
know. In those cases, it could perhaps be combined with
textAnalysisAnnotation, using the annotation agent and the confidence score
to specify which system produced the annotation and with what reliability.
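The semi-automatic workflow can be sketched as follows: every proposal records the annotating system and its confidence (as textAnalysisAnnotation would), proposals at or above a threshold are accepted automatically, and the rest go to a user for confirmation. The threshold value, the field names and the selector IDs are assumptions, not part of any agreed data category.

```python
from dataclasses import dataclass
from typing import Callable, List

AUTO_ACCEPT_THRESHOLD = 0.9  # hypothetical policy value


@dataclass
class Proposal:
    text: str          # annotated fragment
    selector: str      # proposed meaning in the semantic network
    agent: str         # system that produced the proposal (annotation agent)
    confidence: float  # system's confidence score, 0.0 to 1.0


def review(proposals: List[Proposal],
           confirm: Callable[[Proposal], bool]) -> List[Proposal]:
    """Keep high-confidence proposals; ask the user (via `confirm`) about the rest."""
    accepted = []
    for p in proposals:
        # `or` short-circuits, so the user is only asked about low-confidence proposals
        if p.confidence >= AUTO_ACCEPT_THRESHOLD or confirm(p):
            accepted.append(p)
    return accepted
```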

4) Would 1-2 be consumed by an MT tool, or by other tools?

These can basically be consumed by language processing tools, such as MT,
and by other language technology that needs content or semantic
information, for instance text analytics, semantic search, etc. In
localization chains, this information can also be used by automatic or
semi-automatic processes (such as selecting dictionaries for translation, or
selecting translators/revisers by subject area).
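As an example of the localization use mentioned above, here is a sketch of selecting translators by subject area from the domain tags found in a document. The translator profiles and domain labels are hypothetical, and ranking by the number of matching tags is just one simple choice.

```python
# HYPOTHETICAL data: translator profiles and domain labels are invented.
TRANSLATORS = {
    "ana": {"computer_science", "law"},
    "ben": {"fire_fighting"},
    "eva": {"computer_science"},
}


def pick_translators(document_domains: set) -> list:
    """Rank translators by how many of the document's domain tags they cover."""
    scored = [(len(skills & document_domains), name)
              for name, skills in TRANSLATORS.items()]
    # highest overlap first, ties broken alphabetically; drop translators with no overlap
    ranked = sorted(scored, key=lambda t: (-t[0], t[1]))
    return [name for score, name in ranked if score > 0]
```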

It could also be used by humans for translation or post-editing in cases of
ambiguity or lack of context in the content, but mostly it would be used by
automatic systems.

I hope this helps.

Pedro



  _____  


From: Felix Sasaki [mailto:fsasaki@w3.org]
Sent: Saturday, 2 June 2012 14:13
To: Tadej Stajner; pedro.diez
CC: public-multilingualweb-lt@w3.org
Subject: Re: [ACTION-94]: go and find examples of concept ontology (semantic
features of terms as opposed to domain type ontologies)

Hi Tadej, Pedro, all,

this looks like a great chain of producing and consuming metadata.

Apologies if this was explained during last week's call or before, but can
you clarify the following a bit:

1) What would the actual HTML markup produced in the original text annotation
use case look like?

2) What would the markup in this use case look like?

3) Would it be produced also by an automatic text annotation tool?

4) Would 1-2 be consumed by an MT tool, or by other tools?

Thanks again,

Felix

2012/5/31 Tadej Stajner <tadej.stajner@ijs.si>

Hi Pedro, 
thanks for the excellent explanation. If I understand you correctly, a
sufficient example for this use case would be annotating individual words
with the synset URI of the appropriate wordnet? If so, then I believe this
route can be practical; given the available tools, I think linking to the
synset is a more practical idea than expressing semantic features of the word.

Enrycher can do automatic all-word disambiguation into the English WordNet,
whereas we don't have anything specific in place for semantic features
(which I suspect also holds for other text analytics providers).

I'm also in favor of prescribing wordnets for individual languages as valid
selector domains, as you suggest in option 1). That would make validation
easier, since we have a known domain.

@All: Can we come up with a second implementation for this use case,
preferably a consumer?

-- Tadej




On 5/29/2012 2:00 PM, Pedro L. Díez Orzas wrote: 

Dear all,

Sorry for the delay. I tried to contact some people who I think can
contribute to this, but they are not available these weeks.

Before providing an example, so that we can all consider whether it is
worthwhile to keep the "semantic selector" attribute in the consolidated
"Disambiguation" data category, I would like to make a couple of
considerations:

1.	We will probably not have any implementation in the short term, but
there are a few semantic networks available on the web (see
http://www.globalwordnet.org/gwa/wordnet_table.html) that could be mapped
using semantic selectors. See, for example, the well-known online WordNet at
http://wordnetweb.princeton.edu/perl/webwn.
2.	The W3C group working on SKOS (Simple Knowledge Organization System
Reference) may be dealing with similar issues.

The "semantic selector" allows finer lexical distinctions (for single words
or multi-word expressions) than a "domain" or an ontology like NERD. Also,
denotation differs from the "concept reference", especially for parts of
speech such as verbs.

Within the same domain, even when referring to very similar concepts,
languages have semantic differences. Depending on the semantic theory used,
each tries to capture these differences by means of different systems
(semantic features, semantic primitives, semantic nodes in semantic
networks, or other semantic representations). An example is the German verb
"löschen", which can take different meanings in different contexts; one can
try to capture these meanings using different selectors in the different
systems.

-  löschen  -> clear        (some bits)
            -> delete       (files)
            -> cancel       (programs)
            -> erase        (a scratchpad)
            -> extinguish   (a fire)
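How a consumer such as an MT system might use a semantic selector on "löschen" can be sketched as a lookup from selector to target verb, falling back to the full candidate list when no selector is present. The selector IDs are hypothetical; the verb/object pairs come from the list above.

```python
# Hypothetical selector IDs for the five senses of "löschen" listed above,
# each mapped to (English verb, typical object) as given in that list.
LOESCHEN_SENSES = {
    "#loeschen_1": ("clear", "some bits"),
    "#loeschen_2": ("delete", "files"),
    "#loeschen_3": ("cancel", "programs"),
    "#loeschen_4": ("erase", "a scratchpad"),
    "#loeschen_5": ("extinguish", "a fire"),
}


def translate_loeschen(selector=None):
    """Return the English verb(s) a consumer could use for one occurrence.

    With a selector the choice is unambiguous; without one, every sense
    remains a candidate and the consumer has to guess from context.
    """
    if selector is None:
        return [verb for verb, _obj in LOESCHEN_SENSES.values()]
    verb, _obj = LOESCHEN_SENSES[selector]
    return [verb]
```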

Other possible translations of the verb "löschen" are:

delete        löschen, streichen, tilgen, ausstreichen, herausstreichen
clear         löschen, klären, klarmachen, leeren, räumen, säubern
erase         löschen, auslöschen, tilgen, ausradieren, radieren, abwischen
extinguish    löschen, auslöschen, zerstören
quench        löschen, stillen, abschrecken, dämpfen
put out       löschen, bringen, ausmachen, ausschalten, treiben, verstimmen
unload        entladen, abladen, ausladen, löschen, abstoßen, abwälzen
discharge     entladen, erfüllen, entlassen, entlasten, löschen, ausstoßen
wipe out      auslöschen, löschen, ausrotten, tilgen, zunichte machen, auswischen
slake         stillen, löschen
close         schließen, verschließen, abschließen, sperren, zumachen, löschen
blot          löschen, abtupfen, klecksen, beklecksen, sich unmöglich machen, sich verderben
turn off      ausschalten, abbiegen, abstellen, abdrehen, einbiegen, löschen
blow out      auspusten, löschen, aufblasen, aufblähen, aufbauschen, platzen
zap           abknallen, düsen, umschalten, löschen, töten, kaputtmachen
redeem        einlösen, erlösen, zurückkaufen, tilgen, retten, löschen
pay off       auszahlen, bezahlen, tilgen, abzahlen, abbezahlen, löschen
switch out    löschen
unship        ausladen, entladen, abnehmen, löschen
souse         eintauchen, durchtränken, löschen, nass machen
rub off       abreiben, abgehen, abwetzen, ausradieren, abscheuern, löschen
strike off    löschen
land          landen, an Land gehen, kriegen, an Land ziehen, aufsetzen, löschen
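Inverting a few rows of the table shows the ambiguity from the German side: one source word maps to many English candidates, and related words such as "tilgen" share several of them. The rows are copied from the table above; the code is only an illustration.

```python
from collections import defaultdict

# A few rows copied from the table above:
# English equivalent -> German verbs listed for it.
TABLE = {
    "delete": ["löschen", "streichen", "tilgen", "ausstreichen", "herausstreichen"],
    "erase": ["löschen", "auslöschen", "tilgen", "ausradieren", "radieren", "abwischen"],
    "extinguish": ["löschen", "auslöschen", "zerstören"],
    "slake": ["stillen", "löschen"],
    "pay off": ["auszahlen", "bezahlen", "tilgen", "abzahlen", "abbezahlen", "löschen"],
}


def invert(table):
    """Map each German verb to the English equivalents whose rows list it."""
    inverted = defaultdict(list)
    for english, german_verbs in table.items():
        for verb in german_verbs:
            inverted[verb].append(english)
    return dict(inverted)


index = invert(TABLE)
print(index["löschen"])  # ['delete', 'erase', 'extinguish', 'slake', 'pay off']
```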

Accordingly, the consolidation of the disambiguation/namedEntity/ data
categories under "Terminology"
(http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#disambiguation)
could be the following. It is intended to cover operational URI or XPath
pointers to the three currently most important kinds of semantic resources:
conceptual (ontologies), semantic (semantic networks or lexical databases),
and terminological (glossaries and terminological resources). Ontologies are
used for both the general lexicon and terminology, semantic networks
represent general vocabulary (the lexicon), and terminological resources
represent specialized vocabulary.

disambiguation

Includes data to be used by MT systems in disambiguating difficult content.

Data model

*	concept reference: points to a concept in an ontology that this
fragment of text represents. May be a URI or an XPath pointer.
*	semantic selector: points to a meaning in a semantic network that
this fragment of text represents. May be a URI or an XPath pointer.
*	terminology reference: points to a term in a terminological resource
that this fragment of text represents. May be a URI or an XPath pointer.
*	equivalent translation: expressions of that concept in other
languages, for example for training MT systems.
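The proposed data model could be represented roughly as follows. This is a sketch, not the data category definition: the field names are Python renderings of the four items above, and the example values are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Disambiguation:
    """One annotated fragment; each pointer may be a URI or an XPath pointer."""
    text: str
    concept_reference: Optional[str] = None      # concept in an ontology
    semantic_selector: Optional[str] = None      # meaning in a semantic network
    terminology_reference: Optional[str] = None  # term in a terminological resource
    # language code -> expressions of the concept in that language
    equivalent_translations: Dict[str, List[str]] = field(default_factory=dict)

    def has_pointer(self) -> bool:
        """True if at least one of the three resource pointers is set."""
        return any((self.concept_reference,
                    self.semantic_selector,
                    self.terminology_reference))


# Illustrative values only; the selector is the one from Tadej's made-up example.
d = Disambiguation(
    text="löschen",
    semantic_selector="#synset_loschen_3",
    equivalent_translations={"en": ["extinguish"]},
)
```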

Also, I would keep textAnalysisAnnotation, since its purpose is quite
different.

Anyway, if we decide not to include "semantic selector" now, it could be
considered for future versions or handled in liaison with other groups.

I hope it helps,

Pedro

__________________________________

Pedro L. Díez Orzas
Executive President/CEO
Linguaserve Internacionalización de Servicios, S.A.

Tel.: +34 91 761 64 60
Fax: +34 91 542 89 28
E-mail: pedro.diez@linguaserve.com
www.linguaserve.com

"According to the provisions set forth in articles 21 and 22 of Law 34/2002
of July 11 regarding Information Society and eCommerce Services, we will
store and use your personal data with the sole purpose of marketing the
products and services offered by LINGUASERVE INTERNACIONALIZACIÓN DE
SERVICIOS, S.A. If you do not wish your personal data to be stored and
handled, or you do not wish to receive further information regarding
products and services offered by our company, please e-mail us at
clients@linguaserve.com. Your request will be processed immediately."

 ____________________________________

-- 
Felix Sasaki

DFKI / W3C Fellow

Received on Friday, 8 June 2012 09:48:14 UTC