Re: [ACTION-94]: go and find examples of concept ontology (semantic features of terms as opposed to domain type ontologies) from Felix Sasaki on 2012-06-09 (public-multilingualweb-lt@w3.org from June 2012)

From: Felix Sasaki <fsasaki@w3.org>
Date: Sat, 9 Jun 2012 06:44:34 +0200
To: Pedro L. Díez Orzas <pedro.diez@linguaserve.com>
Cc: Dave Lewis <dave.lewis@cs.tcd.ie>, public-multilingualweb-lt@w3.org
Message-ID: <CAL58czqVZqV1B=zHOFrc50A_mYTCtwr51-h=o2m36=AytkeC_Q@mail.gmail.com>
Dear Pedro,

thank you for this - for comments see my mail to Dave about this at
http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Jun/0030.html
but a further comment below.

I think we will have trouble agreeing on types of semantic resources, e.g.
the values of
onto-concept | sem-net-node | terminology-entry | eqiv-translation
For example, if the resource is a terminology entry, what format is it in
(TBX, OLIF, ...)? If it is an onto-concept, what ontological model is behind it?

Luckily,

http://thedatahub.org/dataset/vu-wordnet

contains the information we need. So having something like this


<span its-entity entityref="http://www.w3.org/2012/semantic-resources/"
      its-selector="enwg-synset_loschen_3">löschen</span>


 with a link from


http://www.w3.org/2012/semantic-resources/


to the CKAN page


http://thedatahub.org/dataset/vu-wordnet


would provide the same information.
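
A minimal sketch of how a consumer might read such an annotation, assuming the attribute names from the example above (its-entity, entityref, its-selector). The KNOWN_RESOURCES table, the EntityAnnotationParser class and the describe function are hypothetical names introduced only for illustration; they are not part of any ITS draft.

```python
from html.parser import HTMLParser

# Hypothetical lookup table: the identifier URI from the example mapped to the
# CKAN page it would link to. Both URLs come from this mail; the table itself
# is an assumption made for illustration.
KNOWN_RESOURCES = {
    "http://www.w3.org/2012/semantic-resources/":
        "http://thedatahub.org/dataset/vu-wordnet",
}


class EntityAnnotationParser(HTMLParser):
    """Collects entityref, selector and text for spans carrying its-entity."""

    def __init__(self):
        super().__init__()
        self.annotations = []
        self._current = None  # annotation for the span being read, if any

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "span" and "its-entity" in attr_map:
            self._current = {
                "entityref": attr_map.get("entityref"),
                "selector": attr_map.get("its-selector"),
                "text": "",
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        # Nested spans are not handled; this sketch assumes flat markup.
        if tag == "span" and self._current is not None:
            self.annotations.append(self._current)
            self._current = None


def describe(markup):
    """Return the annotations found in markup, with the linked resource page."""
    parser = EntityAnnotationParser()
    parser.feed(markup)
    parser.close()
    for annotation in parser.annotations:
        # None means the implementation does not "know" this identifier.
        annotation["resource_page"] = KNOWN_RESOURCES.get(annotation["entityref"])
    return parser.annotations
```

The point of the sketch is only that the consumer resolves one stable identifier to a known resource and treats the selector as a local ID inside that resource.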


What do you think?


Felix



2012/6/8 Pedro L. Díez Orzas <pedro.diez@linguaserve.com>

> Dear Tadej, Felix, Yves, Dave, all,
>
> I checked with some experts, and they told me the following:
>
> *It would be great if links to wordnet can be included in the
> annotations. The best thing to do would be to use the open linked data
> versions of wordnet:*
>
> *http://thedatahub.org/dataset/vu-wordnet*
>
> *It has URIs for synsets (actually sense meanings but I convinced them
> they need to shift to synset IDs, which they will do in the near future).
> English synsets are good for any language since the other languages link to
> English (still as an Inter Lingual Index). Eventually, other wordnets will
> also be published as linked open data.*
>
> *Another thing is domain tags. WordnetDomain tags are used here (Dewey
> system). Since it is linked to English Wordnet it is linked to any synset
> in any language linked to English. That will be a very useful semantic tag
> also for translation.*
>
> I think this is the right way to reinforce the connection between MLS-LT and
> open linked data. I hope it helps.
>
> Best,
>
> Pedro
>
>  ------------------------------
>
> *From:* Dave Lewis [mailto:dave.lewis@cs.tcd.ie]
> *Sent:* Thursday, 7 June 2012 23:58
> *To:* public-multilingualweb-lt@w3.org
> *Subject:* Re: [ACTION-94]: go and find examples of concept ontology
> (semantic features of terms as opposed to domain type ontologies)
>
> Hi Tadej,
> I spoke to some people from ISOCAT at LREC. They operate persistent URLs
> for their platform, so with an example perhaps we could add that to the
> list?
>
> cheers,
> Dave
>
> On 07/06/2012 15:19, Felix Sasaki wrote:
>
> 2012/6/7 Tadej Stajner <tadej.stajner@ijs.si>
>
> Hi Felix,
> as far as I'm aware, URIs only exist for the English wordnet. Maybe
> prefixing it with a # was not the best stylistic choice here, but yes, what I
> meant to convey is that that value was a local identifier, valid within a
> particular semantic network.
>
> In the ideal scenario, these selectors would be dereferenceable and
> verifiable via URIs for arbitrary wordnets and terminology lexicons and
> their entries.
>
> OK - the main point would be that they are dereferenceable and verifiable.
> In practice, you will not achieve that for arbitrary wordnets, but you can
> achieve that for a subset, if the related "players" agree. In the
> "collation" example mentioned before, the identifier for the Unicode code
> point based collation
> http://www.w3.org/2005/xpath-functions/collation/codepoint/ was the
> lowest common denominator; in addition to that everybody is free to have
> other URIs for arbitrary collations. I would hope that we could end up with
> such a list (hopefully longer than one) for the semantic networks too.
>
> Felix
>
> Do we have any people involved in developing semantic networks or term
> lexicons on this list? The compromise is allowing some limited classes of
> non-URI local selectors, like synset IDs for wordnets, and term IDs for TBX
> lexicons.
>
> -- Tadej
>
> On 6/7/2012 3:44 PM, Felix Sasaki wrote:
>
> Thanks, Tadej.
>
> The value of the its-selector attribute looks like a document-internal
> link. But it is probably an identifier of the synset in the given semantic
> network, no?
>
> About 1) and 2): is your made-up example then the output of the text
> annotation use case? I am asking since you say "2) markup in raw ITS", so
> I'm not sure.
>
> Also, it seems that an implementation needs to "know" about the resources
> that are identified via its-semantic-network-ref. This is really an
> identifier, like
>
> http://www.w3.org/2005/xpath-functions/collation/codepoint/
>
> is an identifier for a Unicode code point collation; it doesn't give you
> the collation data, but creating an implementation that "understands" the
> identifier probably means caching the collation data. The same would be
> true for the semantic network.
>
> This leads to the next question: can we engage the developers of the
> semantic network (or other disambiguation-related) resources to come up
> with stable URIs for these? It would be great to list these URIs in our
> specification and say "this is how you identify the English wordnet etc.",
> for scenarios like the collation data mentioned above.
>
> Felix
>
> 2012/6/7 Tadej Štajner <tadej.stajner@ijs.si>
>
> Hi,
>
> I agree with Pedro on the questions. Automatic word sense disambiguation
> is in practice still not perfect, so some semi-automatic user interfaces
> make a lot of sense. Here is how I think this could look in a made-up
> example, answering Felix's 1) and 2):
>
> 1) HTML+ITS: <span its-disambiguation its-semantic-network-ref=
> "http://www.sfs.uni-tuebingen.de/lsd/index.shtml" its-selector="#synset_loschen_3">löschen</span>
>
> 2) Markup in raw ITS
>  <its:disambiguation
>     semanticNetworkRef="http://www.sfs.uni-tuebingen.de/lsd/index.shtml"
>     selector="#synset_loschen_3">löschen</its:disambiguation>
>
> -- Tadej
>
>
>
>
> On 04. 06. 2012 13:53, Pedro L. Díez Orzas wrote:
>
> Dear Felix,
>
> Thank you very much. Probably Tadej can prepare the use cases you mention,
> with the consolidated data category. About questions 3 and 4, I can tell
> you the following:
>
> 3) Would it be produced also by an automatic text annotation tool?
>
> For the pointers to the three kinds of information referred to (concepts in
> an ontology, meanings in a lexical DB, and terms in terminological
> resources), I think semiautomatic annotation tools would be possible, that
> is, proposed by the tool and confirmed by the user.
>
> Fully automatic text annotation would need a more sophisticated
> “semantic calculus”, and most of this is under research, as far as I
> know. Maybe, in these cases, it should be combined with
> textAnalysisAnnotation, specifying in *Annotation agent* (and *Confidence
> score*) which system produced it and with which reliability.
>
> 4) Would 1-2 be consumed by an MT tool, or by other tools?
>
> These can basically be consumed by language processing tools, like MT, and
> other language technology that needs content or semantic info, for
> instance text analytics, semantic search, etc. In localization chains,
> this information can also be used by automatic or semiautomatic processes
> (like selection of dictionaries for translation, or selection of
> translators/revisers by subject area).
>
> It could also be used by humans for translation or post-editing in case of
> ambiguity or lack of context in the content, but mostly by automatic
> systems.
>
> I hope this helps.
>
> Pedro
>
>  ------------------------------
>
> *From:* Felix Sasaki [mailto:fsasaki@w3.org]
> *Sent:* Saturday, 2 June 2012 14:13
> *To:* Tadej Stajner; pedro.diez
> *CC:* public-multilingualweb-lt@w3.org
> *Subject:* Re: [ACTION-94]: go and find examples of concept ontology
> (semantic features of terms as opposed to domain type ontologies)
>
> Hi Tadej, Pedro, all,
>
> this looks like a great chain of producing and consuming metadata.
>
> Apologies if this was explained during last week's call or before, but can
> you clarify the following a bit:
>
> 1) What would the actual HTML markup produced in the original text
> annotation use case look like?
>
> 2) What would the markup in this use case look like?
>
> 3) Would it be produced also by an automatic text annotation tool?
>
> 4) Would 1-2 be consumed by an MT tool, or by other tools?
>
> Thanks again,
>
> Felix
>
> 2012/5/31 Tadej Stajner <tadej.stajner@ijs.si>
>
> Hi Pedro,
> thanks for the excellent explanation. If I understand you correctly, a
> sufficient example for this use case would be annotation of individual
> words with the synset URI of the appropriate wordnet? If so, then I believe
> this route can be practical - I think linking to the synset is a more
> practical idea than expressing semantic features of the word given the
> available tools.
>
> Enrycher can do automatic all-word disambiguation into the English
> wordnet, whereas we don't have anything specific in place for semantic
> features (which I suspect also holds for other text analytics providers).
>
> I'm also in favor of prescribing wordnets for individual languages as
> valid selector domains as you suggest in option 1). That would make
> validation easier since we have a known domain.
>
> @All: Can we come up with a second implementation for this use case,
> preferably a consumer?
>
> -- Tadej
>
> On 5/29/2012 2:00 PM, Pedro L. Díez Orzas wrote:
>
> Dear all,
>
> Sorry for the delay. I tried to contact some people I think can contribute
> to this, but they are not available these weeks.
>
> Before providing an example so that all can consider whether it is
> worthwhile to maintain the “semantic selector” attribute in the
> consolidation of “Disambiguation”, I would like to make a couple of
> considerations:
>
>    1. Probably we will not have any implementation in the short term, but
>    there are, for example, a few semantic networks available on the web (see
>    http://www.globalwordnet.org/gwa/wordnet_table.html) that could be
>    mapped using semantic selectors. See online, for example, the famous
>    http://wordnetweb.princeton.edu/perl/webwn.
>    2. The W3C SKOS (Simple Knowledge Organization System Reference)
>    working group is maybe dealing with similar things.
>
> The “semantic selector” allows further lexical (simple words or multi-word
> expressions) distinctions than a “domain” or an ontology like NERD. Also, the
> denotation is different from the “concept reference”, most of all in parts
> of speech like verbs.
>
> Within the same domain, referring to very similar concepts, languages have
> semantic differences. Depending on the semantic theory used, each tries to
> capture these differences by means of different systems (semantic
> features, semantic primitives, semantic nodes (in semantic networks), other
> semantic representations). An example could be the German verb “löschen”,
> which in different contexts can take different meanings that one can try to
> capture using different selectors, with the different systems.
>
> löschen  -> clear       (some bits)
>          -> delete      (files)
>          -> cancel      (programs)
>          -> erase       (a scratchpad)
>          -> extinguish  (a fire)
>
> Other possible translations of the verb “löschen” are:
>
> delete: löschen, streichen, tilgen, ausstreichen, herausstreichen
> clear: löschen, klären, klarmachen, leeren, räumen, säubern
> erase: löschen, auslöschen, tilgen, ausradieren, radieren, abwischen
> extinguish: löschen, auslöschen, zerstören
> quench: löschen, stillen, abschrecken, dämpfen
> put out: löschen, bringen, ausmachen, ausschalten, treiben, verstimmen
> unload: entladen, abladen, ausladen, löschen, abstoßen, abwälzen
> discharge: entladen, erfüllen, entlassen, entlasten, löschen, ausstoßen
> wipe out: auslöschen, löschen, ausrotten, tilgen, zunichte machen, auswischen
> slake: stillen, löschen
> close: schließen, verschließen, abschließen, sperren, zumachen, löschen
> blot: löschen, abtupfen, klecksen, beklecksen, sich unmöglich machen, sich verderben
> turn off: ausschalten, abbiegen, abstellen, abdrehen, einbiegen, löschen
> blow out: auspusten, löschen, aufblasen, aufblähen, aufbauschen, platzen
> zap: abknallen, düsen, umschalten, löschen, töten, kaputtmachen
> redeem: einlösen, erlösen, zurückkaufen, tilgen, retten, löschen
> pay off: auszahlen, bezahlen, tilgen, abzahlen, abbezahlen, löschen
> switch out: löschen
> unship: ausladen, entladen, abnehmen, löschen
> souse: eintauchen, durchtränken, löschen, nass machen
> rub off: abreiben, abgehen, abwetzen, ausradieren, abscheuern, löschen
> strike off: löschen
> land: landen, an Land gehen, kriegen, an Land ziehen, aufsetzen, löschen
>
> According to this, the consolidation of the disambiguation/namedEntity data
> categories under “Terminology”
> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#disambiguation
> could be the following. It is meant to cover operational URI or XPath
> pointers to the current three most important semantic resources: conceptual
> (ontology), semantic (semantic networks or lexical databases) and
> terminological (glossaries and terminological resources), where ontologies
> are used for both general lexicon and terminology, semantic networks to
> represent general vocabulary (lexicon), and terminological resources to
> represent specialized vocabulary.
>
> *disambiguation*
>
> Includes data to be used by MT systems in disambiguating difficult content
>
> *Data model*
>
>    - concept reference: points to a *concept in an ontology* that this
>    fragment of text represents. May be a URI or an XPath pointer.
>    - semantic selector: points to a *meaning in a semantic network* that
>    this fragment of text represents. May be a URI or an XPath pointer.
>    - terminology reference: points to *a term in a terminological resource*
>    that this fragment of text represents. May be a URI or an XPath
>    pointer.
>    - equivalent translation: expressions of that concept in other
>    languages, for example for training MT systems
>
> Also, I would keep *textAnalysisAnnotation*, since the purpose is quite
> different.
>
> Anyway, if we decide not to include “semantic selector” now, maybe it
> can be kept for future versions or treated in liaison with other groups.
>
> I hope it helps,
>
> Pedro
>
> __________________________________
>
> Pedro L. Díez Orzas
> Presidente Ejecutivo/CEO
> Linguaserve Internacionalización de Servicios, S.A.
> Tel.: +34 91 761 64 60
> Fax: +34 91 542 89 28
> E-mail: pedro.diez@linguaserve.com
> www.linguaserve.com
>
>
> "According to the provisions set forth in articles 21 and 22 of Law
> 34/2002 of July 11 regarding Information Society and eCommerce Services, we
> will store and use your personal data with the sole purpose of marketing
> the products and services offered by LINGUASERVE INTERNACIONALIZACIÓN DE
> SERVICIOS, S.A. If you do not wish your personal data to be stored and
> handled, or you do not wish to receive further information regarding
> products and services offered by our company, please e-mail us at
> clients@linguaserve.com. Your request will be processed immediately."
>
> ____________________________________
>
> --
> Felix Sasaki
> DFKI / W3C Fellow
>



-- 
Felix Sasaki
DFKI / W3C Fellow
Received on Saturday, 9 June 2012 04:45:03 UTC