Re: [ACTION-94]: go and find examples of concept ontology (semantic features of terms as opposed to domain type ontologies) from Felix Sasaki on 2012-06-07 (public-multilingualweb-lt@w3.org from June 2012)

From: Felix Sasaki <fsasaki@w3.org>
Date: Thu, 7 Jun 2012 15:44:36 +0200
To: Tadej Štajner <tadej.stajner@ijs.si>
Cc: public-multilingualweb-lt@w3.org
Message-ID: <CAL58czpmGjG5ugxNvxGmj=X1uObCXXavW5TuNv20321Ncz5KJw@mail.gmail.com>
Thanks, Tadej.

The value of the its-selector attribute looks like a document internal
link. But it is probably an identifier of the synset in the given semantic
network, no?
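
To make the question concrete, here is a minimal Python sketch of how a consumer might read the two attributes from Tadej's made-up HTML+ITS example quoted below, and tell a fragment-style selector from an absolute URI. The attribute names are taken from that example. The class and function names are hypothetical, nothing here is defined by ITS, and nested elements with the same tag name are not handled.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class DisambiguationParser(HTMLParser):
    """Collect network reference, selector and text of annotated elements."""

    def __init__(self):
        super().__init__()
        self.annotations = []
        self._current = None      # annotation being filled in, if any
        self._current_tag = None  # tag name that opened it

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)  # a valueless attribute maps to None
        if "its-disambiguation" in attr_map and self._current is None:
            self._current_tag = tag
            self._current = {
                "network_ref": attr_map.get("its-semantic-network-ref"),
                "selector": attr_map.get("its-selector"),
                "text": "",
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if self._current is not None and tag == self._current_tag:
            self.annotations.append(self._current)
            self._current = None
            self._current_tag = None


def selector_kind(selector):
    """Classify a selector as a bare fragment, an absolute URI, or relative."""
    if selector.startswith("#"):
        return "fragment"
    return "absolute" if urlparse(selector).scheme else "relative"


html = (
    '<span its-disambiguation '
    'its-semantic-network-ref="http://www.sfs.uni-tuebingen.de/lsd/index.shtml" '
    'its-selector="#synset_loschen_3">löschen</span>'
)
parser = DisambiguationParser()
parser.feed(html)
parser.close()
annotation = parser.annotations[0]
```

With this input the selector is classified as a fragment, which is why it reads like a document-internal link even if it is meant to name a synset inside the referenced network.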

About 1) and 2): is your made-up example then the output of the text
annotation use case? I am asking since you say "2) markup in raw ITS", so
I'm not sure.

Also, it seems that an implementation needs to "know" about the resources
that are identified via its-semantic-network-ref. This is really an
identifier, like
http://www.w3.org/2005/xpath-functions/collation/codepoint/
is an identifier for the Unicode code point collation; it doesn't give you
the collation data, and an implementation that "understands" the
identifier probably has to cache the collation data itself. The same would
be true for the semantic network.
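
A rough sketch of what "knowing" a resource could mean in practice, assuming the implementation ships or caches the sense data itself and treats the URI purely as a name. The registry class, its methods, and the gloss are hypothetical; only the URI and the selector come from the made-up example in this thread.

```python
class SemanticNetworkRegistry:
    """Map semantic-network identifiers to locally cached sense data."""

    def __init__(self):
        self._networks = {}

    def register(self, network_uri, senses):
        # The URI is treated as an opaque name; it is never dereferenced.
        self._networks[network_uri] = dict(senses)

    def lookup(self, network_uri, selector):
        """Return the cached gloss for a selector, or None if unknown."""
        senses = self._networks.get(network_uri)
        if senses is None:
            return None  # the implementation does not "know" this network
        return senses.get(selector.lstrip("#"))


GERMAN_NETWORK = "http://www.sfs.uni-tuebingen.de/lsd/index.shtml"

registry = SemanticNetworkRegistry()
# Placeholder data: the selector is from the made-up example, the gloss is invented.
registry.register(GERMAN_NETWORK, {"synset_loschen_3": "placeholder gloss"})

known = registry.lookup(GERMAN_NETWORK, "#synset_loschen_3")
unknown = registry.lookup("http://example.org/unregistered-network", "#synset_loschen_3")
```

The second lookup returns None because the identifier was never registered, which is the situation stable, published URIs would help avoid.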

This leads to the next question: can we engage the developers of the
semantic network (or other disambiguation related) resources to come up
with stable URIs for these? It would be great to list these URIs in our
specification and say "this is how you identify the English wordnet etc.",
for scenarios like the collation data mentioned above.

Felix

2012/6/7 Tadej Štajner <tadej.stajner@ijs.si>

>  Hi,
>
> I agree with Pedro on the questions. Automatic word sense disambiguation
> is in practice still not perfect, so semi-automatic user interfaces make
> a lot of sense. Here is how I think this could look in a made-up
> example, answering Felix's 1) and 2):
>
> 1) HTML+ITS:
>  <span its-disambiguation
>     its-semantic-network-ref="http://www.sfs.uni-tuebingen.de/lsd/index.shtml"
>     its-selector="#synset_loschen_3">löschen</span>
>
> 2) Markup in raw ITS
>  <its:disambiguation
>     semanticNetworkRef="http://www.sfs.uni-tuebingen.de/lsd/index.shtml"
>     selector="#synset_loschen_3">löschen</its:disambiguation>
>
> -- Tadej
>
>
>
> On 04. 06. 2012 13:53, Pedro L. Díez Orzas wrote:
>
> Dear Felix,
>
> Thank you very much. Probably Tadej can prepare the use cases you mention,
> with the consolidated data category. About questions 3 and 4, I can tell
> you the following:
>
> 3) Would it be produced also by an automatic text annotation tool?
>
> For pointers to the three kinds of information referred to (concepts in an
> ontology, meanings in a lexical DB, and terms in terminological resources),
> I think semi-automatic annotation tools would be possible, that is,
> annotations proposed by the tool and confirmed by the user.
>
> Fully automatic text annotation would need a more sophisticated
> "semantic calculus", and most of these are still under research, as far as
> I know. Maybe, in these cases, it should be combined with
> textAnalysisAnnotation, specifying in *Annotation agent* and *Confidence
> score* which system produced the annotation and with what reliability.
>
> 4) Would 1-2 be consumed by an MT tool, or by other tools?
>
> These can basically be consumed by language processing tools, like MT, and
> other language technology that needs content or semantic information, for
> instance text analytics, semantic search, etc. In localization chains,
> this information can also be used by automatic or semi-automatic processes
> (like selection of dictionaries for translation, or selection of
> translators/revisers by subject area).
>
> It could also be used by humans for translation or post-editing in case of
> ambiguity or lack of context in the content, but mostly by automatic
> systems.
>
> I hope this helps.
>
> Pedro
>  ------------------------------
>
> *From:* Felix Sasaki [mailto:fsasaki@w3.org]
> *Sent:* Saturday, 2 June 2012 14:13
> *To:* Tadej Stajner; pedro.diez
> *Cc:* public-multilingualweb-lt@w3.org
> *Subject:* Re: [ACTION-94]: go and find examples of concept ontology
> (semantic features of terms as opposed to domain type ontologies)
>
> Hi Tadej, Pedro, all,
>
> this looks like a great chain of producing and consuming metadata.
>
> Apologies if this was explained during last week's call or before, but can
> you clarify the following a bit:
>
> 1) What would the actual HTML markup produced in the original text
> annotation use case look like?
>
> 2) What would the markup in this use case look like?
>
> 3) Would it be produced also by an automatic text annotation tool?
>
> 4) Would 1-2 be consumed by an MT tool, or by other tools?
>
> Thanks again,
>
> Felix
>
> 2012/5/31 Tadej Stajner <tadej.stajner@ijs.si>
>
> Hi Pedro,
> thanks for the excellent explanation. If I understand you correctly, a
> sufficient example for this use case would be annotation of individual
> words with the synset URI of the appropriate wordnet? If so, then I believe
> this route can be practical - I think linking to the synset is a more
> practical idea than expressing semantic features of the word, given the
> available tools.
>
> Enrycher can do automatic all-word disambiguation into the English
> wordnet, whereas we don't have anything specific in place for semantic
> features (which I suspect also holds for other text analytics providers).
>
> I'm also in favor of prescribing wordnets for individual languages as
> valid selector domains, as you suggest in option 1). That would make
> validation easier since we have a known domain.
>
> @All: Can we come up with a second implementation for this use case,
> preferably a consumer?
>
> -- Tadej
>
>
>
>
> On 5/29/2012 2:00 PM, Pedro L. Díez Orzas wrote:
>
> Dear all,
>
> Sorry for the delay. I tried to contact some people I think can contribute
> to this, but they are not available these weeks.
>
> Before providing an example so that all can consider whether it is
> worthwhile to keep the "semantic selector" attribute in the consolidation
> of "Disambiguation", I would like to make a couple of observations:
>
>    1. We will probably not have any implementation in the short term, but
>    there are, for example, a few semantic networks available on the web (see
>    http://www.globalwordnet.org/gwa/wordnet_table.html) that could be
>    mapped using semantic selectors. See online, for example, the well-known
>    http://wordnetweb.princeton.edu/perl/webwn.
>    2. The W3C working group on SKOS (Simple Knowledge Organization System
>    Reference) is maybe dealing with similar things.
>
> The "semantic selector" allows finer lexical distinctions (for simple words
> or multi-word expressions) than a "domain" or an ontology like NERD. Also,
> the denotation is different from the "concept reference", above all for
> parts of speech like verbs.
>
> Within the same domain, referring to very similar concepts, languages have
> semantic differences. Depending on the semantic theory used, each tries to
> capture these differences by means of different systems (semantic
> features, semantic primitives, semantic nodes (in semantic networks), other
> semantic representations). An example could be the German verb "löschen",
> which in different contexts can take different meanings that one can try to
> capture using different selectors, with the different systems.
>
> löschen -> clear       (some bits)
>         -> delete      (files)
>         -> cancel      (programs)
>         -> erase       (a scratchpad)
>         -> extinguish  (a fire)
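
The mapping above can be read as a lookup keyed on the object of the verb. A toy Python sketch, assuming the object noun has already been identified by some earlier step; the dictionary keys are simplified from the parenthesised contexts above, and the function name is hypothetical. It is not a word sense disambiguation method, only an illustration of why a per-sense selector carries more information than the surface form.

```python
# English rendering of "löschen" keyed by the kind of object it takes,
# simplified from the contexts listed above.
LOESCHEN_BY_OBJECT = {
    "bits": "clear",
    "files": "delete",
    "programs": "cancel",
    "scratchpad": "erase",
    "fire": "extinguish",
}


def translate_loeschen(object_noun):
    """Return the English verb for a known object, or None when the context is unknown."""
    return LOESCHEN_BY_OBJECT.get(object_noun.lower())
```

An object outside the table yields None, which is the case where a human or a semi-automatic tool would have to confirm the sense.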
>
> Other possible translations of the verb "löschen" are:
>
> delete:     löschen, streichen, tilgen, ausstreichen, herausstreichen
> clear:      löschen, klären, klarmachen, leeren, räumen, säubern
> erase:      löschen, auslöschen, tilgen, ausradieren, radieren, abwischen
> extinguish: löschen, auslöschen, zerstören
> quench:     löschen, stillen, abschrecken, dämpfen
> put out:    löschen, bringen, ausmachen, ausschalten, treiben, verstimmen
> unload:     entladen, abladen, ausladen, löschen, abstoßen, abwälzen
> discharge:  entladen, erfüllen, entlassen, entlasten, löschen, ausstoßen
> wipe out:   auslöschen, löschen, ausrotten, tilgen, zunichte machen, auswischen
> slake:      stillen, löschen
> close:      schließen, verschließen, abschließen, sperren, zumachen, löschen
> blot:       löschen, abtupfen, klecksen, beklecksen, sich unmöglich machen,
>             sich verderben
> turn off:   ausschalten, abbiegen, abstellen, abdrehen, einbiegen, löschen
> blow out:   auspusten, löschen, aufblasen, aufblähen, aufbauschen, platzen
> zap:        abknallen, düsen, umschalten, löschen, töten, kaputtmachen
> redeem:     einlösen, erlösen, zurückkaufen, tilgen, retten, löschen
> pay off:    auszahlen, bezahlen, tilgen, abzahlen, abbezahlen, löschen
> switch out: löschen
> unship:     ausladen, entladen, abnehmen, löschen
> souse:      eintauchen, durchtränken, löschen, nass machen
> rub off:    abreiben, abgehen, abwetzen, ausradieren, abscheuern, löschen
> strike off: löschen
> land:       landen, an Land gehen, kriegen, an Land ziehen, aufsetzen, löschen
>
> According to this, the consolidation of the disambiguation/namedEntity data
> categories under "Terminology"
> (http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#disambiguation)
> could be the following. It is intended to cover operational URI or XPath
> pointers to the three currently most important semantic resources: conceptual
> (ontology), semantic (semantic networks or lexical databases) and
> terminological (glossaries and terminological resources), where ontologies
> are used for both general lexicon and terminology, semantic networks to
> represent general vocabulary (lexicon), and terminological resources to
> represent specialized vocabulary.
>
> *disambiguation*
>
> Includes data to be used by MT systems in disambiguating difficult content
>
> *Data model*
>
>    - concept reference: points to a *concept in an ontology* that this
>    fragment of text represents. May be a URI or an XPath pointer.
>    - semantic selector: points to a *meaning in a semantic network* that
>    this fragment of text represents. May be a URI or an XPath pointer.
>    - terminology reference: points to *a term in a terminological resource*
>    that this fragment of text represents. May be a URI or an XPath
>    pointer.
>    - equivalent translation: expressions of that concept in other
>    languages, for example for training MT systems
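
The data model above could be held in memory roughly as follows. This is only a sketch: the field names are hypothetical renderings of the four items, all pointers are kept as plain strings (URI or XPath), and the example values are invented, apart from the selector, which is reused from the made-up example elsewhere in this thread.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Disambiguation:
    """One annotation on a fragment of text, following the proposed data model."""

    concept_ref: Optional[str] = None        # concept in an ontology
    semantic_selector: Optional[str] = None  # meaning in a semantic network
    terminology_ref: Optional[str] = None    # term in a terminological resource
    # Expressions of the concept in other languages, keyed by language tag.
    equivalent_translations: Dict[str, str] = field(default_factory=dict)

    def pointers(self):
        """Return only the pointers that are actually set."""
        candidates = {
            "concept_ref": self.concept_ref,
            "semantic_selector": self.semantic_selector,
            "terminology_ref": self.terminology_ref,
        }
        return {name: value for name, value in candidates.items() if value is not None}


# Invented example: pairing this selector with "extinguish" is an assumption.
example = Disambiguation(
    semantic_selector="#synset_loschen_3",
    equivalent_translations={"en": "extinguish"},
)
```

Keeping the three pointers optional reflects that a given fragment may be annotated against only one of the three kinds of resource.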
>
> Also, I would keep *textAnalysisAnnotation*, since its purpose is quite
> different.
>
> Anyway, if we decide not to include "semantic selector" now, maybe it can
> be considered for future versions or treated in liaison with other groups.
>
> I hope it helps,
>
> Pedro
>
> __________________________________
>
> Pedro L. Díez Orzas
> Presidente Ejecutivo/CEO
> Linguaserve Internacionalización de Servicios, S.A.
> Tel.: +34 91 761 64 60
> Fax: +34 91 542 89 28
> E-mail: pedro.diez@linguaserve.com
> www.linguaserve.com
>
>
> "According to the provisions set forth in articles 21 and 22 of Law
> 34/2002 of July 11 regarding Information Society and eCommerce Services, we
> will store and use your personal data with the sole purpose of marketing
> the products and services offered by LINGUASERVE INTERNACIONALIZACIÓN DE
> SERVICIOS, S.A. If you do not wish your personal data to be stored and
> handled, or you do not wish to receive further information regarding
> products and services offered by our company, please e-mail us at
> clients@linguaserve.com. Your request will be processed immediately."
>
> ____________________________________
>
> --
> Felix Sasaki
> DFKI / W3C Fellow
>
>
>


-- 
Felix Sasaki
DFKI / W3C Fellow
Received on Thursday, 7 June 2012 13:45:14 UTC