Re: Lexicographic Data in Ontolex (Draft) for the telco of Friday, 10th of October, 15:00

From: Thierry Declerck <declerck@dfki.de>
Date: Sun, 12 Oct 2014 12:33:49 +0200
Message-ID: <543A590D.3010106@dfki.de>
To: public-ontolex@w3.org

On 10.10.2014 15:00, Philipp Cimiano wrote:
> Hi Thierry,
>
>  thanks for sharing this with us. A few comments:
Thanks for this, Philipp. I answer your comments below, sometimes with
further questions.
>
> 1) I wonder why you do not use the original IDs of lexical entries in
> the URIs of the lexical entries you create. That would make it easier
> to identify the original source entry.

I was looking for a way to get truly unique URLs. Having two dialectal
dictionaries in the same file could lead to a situation where the same
string represents two different entries. There are also dictionaries that
simply repeat an entry when it has different senses, etc. In my case I
just added the first three letters of the dictionary's name as a prefix,
so that the link to the original source entry is more or less preserved.
And in general, the source entry can contain characters that are not
allowed in a URL.
If you are referring here to the "numID" feature of one of the original
dictionaries, the authors told us that they will delete this feature.
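A minimal sketch of this URI minting scheme, assuming a hypothetical local namespace `http://example.org/dialex/` (not the namespace used in the actual conversion) and a hypothetical helper name `mint_entry_uri`. It applies the three-letter dictionary prefix, percent-encodes characters that are not allowed in a URI, and appends a running counter so that a headword repeated within one dictionary (e.g. once per sense) still gets a distinct URI.

```python
from urllib.parse import quote

# Hypothetical local namespace; not the one used in the actual conversion.
BASE = "http://example.org/dialex/"


def mint_entry_uri(dictionary_name: str, headword: str, seen: dict) -> str:
    """Mint a unique lexical-entry URI from a dictionary name and a headword.

    The first three letters of the dictionary name (lower-cased) form a
    prefix, so identical headwords from two dictionaries do not collide.
    `seen` counts how often each prefix+headword key has been minted, so a
    headword repeated within one dictionary gets _001, _002, ...
    """
    prefix = dictionary_name[:3].lower()
    # Spaces become underscores; everything outside the unreserved URI
    # characters (letters, digits, "_.-~") is percent-encoded as UTF-8.
    local = quote(headword.strip().replace(" ", "_"), safe="")
    key = f"{prefix}_{local}"
    seen[key] = seen.get(key, 0) + 1
    return f"{BASE}{key}_{seen[key]:03d}"


seen = {}
print(mint_entry_uri("Tunis", "bāb", seen))     # first "bāb" in this dictionary
print(mint_entry_uri("Tunis", "bāb", seen))     # repeated entry, new counter
print(mint_entry_uri("Damascus", "bāb", seen))  # same string, other dictionary
```

Keeping the `seen` counter outside the function lets one run of the converter share it across all dictionaries in a file, which is exactly the case where collisions arise.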
>
> 2) I wonder why you use ontolex:denotes instead of ontolex:sense. What
> you describe as range of ontolex:denotes seem to be senses rather than
> concepts with concrete examples of usage.

Right! I was not sure about this one. I avoided it only because
ontolex:sense has "LexicalSense" as its range, and my understanding is
that instances of LexicalSense should have an ontolex:reference. But if
this is not "obligatory", I will certainly use ontolex:sense.
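If ontolex:reference is indeed optional, the converter output could look like the Turtle produced by the sketch below. It assumes the namespace of the published OntoLex-Lemon vocabulary, a hypothetical local namespace `http://example.org/dialex/`, a hypothetical helper `entry_with_senses`, and language-tagged `rdfs:label` values for the glosses (an assumption; the draft data used a separate language property instead). Each gloss becomes its own `ontolex:LexicalSense`, linked from the entry via `ontolex:sense` rather than `ontolex:denotes`, and no `ontolex:reference` is emitted.

```python
# Namespace of the published OntoLex-Lemon vocabulary.
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
# Hypothetical local namespace for the converted dictionary data.
LOCAL = "http://example.org/dialex/"


def turtle_string(text: str) -> str:
    """Quote a Python string as a Turtle string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def entry_with_senses(entry_id: str, glosses: list[tuple[str, str]]) -> str:
    """Serialize one lexical entry and its senses as Turtle.

    Each (language, gloss) pair becomes one ontolex:LexicalSense linked
    from the entry via ontolex:sense. No ontolex:reference is emitted:
    the sense is described only by its (assumed) language-tagged label.
    """
    if not glosses:
        raise ValueError("an entry needs at least one sense")
    sense_ids = [f"{entry_id}_sense{i}" for i in range(1, len(glosses) + 1)]
    lines = [
        f"@prefix ontolex: <{ONTOLEX}> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        f"@prefix : <{LOCAL}> .",
        "",
        f":{entry_id} a ontolex:LexicalEntry ;",
        "    ontolex:sense " + " , ".join(f":{s}" for s in sense_ids) + " .",
    ]
    for sense_id, (lang, gloss) in zip(sense_ids, glosses):
        lines += [
            "",
            f":{sense_id} a ontolex:LexicalSense ;",
            f"    rdfs:label {turtle_string(gloss)}@{lang} .",
        ]
    return "\n".join(lines)


print(entry_with_senses("apc_ahziye_001", [("es", "Zapatos"), ("en", "shoes")]))
```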

On the topic of "LexicalConcept": in lexicography/dialectology there is
an approach called "concept-based" dictionary generation. People take a
list of "concepts" to a certain region and ask the inhabitants of that
region how they "serialize" each concept.
>
> 3) It seems that you have not modelled the translations relations in
> the data, so far, is that right or did I miss them?

True. My feeling was that the authors of the dictionary were not 
encoding a "real" translation between two entries, but just encoding 
senses introduced by words in different languages. I will check the work 
by Jorge on this, as you suggested.
>
> 4) You use the ontolex namespace for your own URIs. That is wrong as
> they are not in the ontolex namespace. Please use your own (local)
> namespace there

Sure! We didn't have a namespace from the dictionary owner. I will use
our own instead.
>
> 5)
>
> ontolex:sense_Sense_1_apc_eng_ahziye_001
>     rdf:type ontolex:SenseLexicon ;
>     rdfs:label "Zapatos"^^xsd:string ;
>     ontolex:language "es"^^xsd:string ;
> .
>
> This should be an ontolex:LexicalSense not an ontolex:SenseLexicon.
> The range of ontolex:sense is always an ontolex:LexicalSense

Oops, I put the wrong type here. Thanks for pointing this out.
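Since the conversion is scripted, this kind of typo can also be caught mechanically on the generated data. A minimal sketch, assuming the triples are available as plain (subject, predicate, object) string tuples (no particular RDF parser is assumed) and using the OntoLex-Lemon namespace IRI; the function name `mistyped_senses` is hypothetical. It reports every object of `ontolex:sense` that lacks an explicit `rdf:type ontolex:LexicalSense`.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"


def mistyped_senses(triples):
    """Return, sorted, every object of ontolex:sense that is not explicitly
    typed as ontolex:LexicalSense in the given (s, p, o) triples."""
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    senses = {o for _, p, o in triples if p == ONTOLEX + "sense"}
    return sorted(
        node for node in senses
        if ONTOLEX + "LexicalSense" not in types.get(node, set())
    )


triples = [
    ("ex:entry", ONTOLEX + "sense", "ex:s1"),
    ("ex:s1", RDF_TYPE, ONTOLEX + "SenseLexicon"),   # the typo
    ("ex:entry", ONTOLEX + "sense", "ex:s2"),
    ("ex:s2", RDF_TYPE, ONTOLEX + "LexicalSense"),
]
print(mistyped_senses(triples))  # ['ex:s1']
```

Under RDFS semantics a range declaration would simply infer the LexicalSense type rather than report an error, so this is a closed-world sanity check on the converter's output, not a substitute for a reasoner.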

> In general, I think it would be good to follow the guidelines that
> Jorge has used in translating the Apertium lexica to lemon/ontolex
> RDF. It seems to be the same type of data so that similar principles
> should apply.

OK!
>
> See here:
> http://bpmlod.github.io/report/bilingual-dictionaries/index.html
>
> Regards,
>
> Philipp.
>
> On 09.10.14 12:26, Thierry Declerck wrote:
>>
>> On 09.10.2014 09:30, Philipp Cimiano wrote:
>>> Dear all,
>>>
>>>   this is a gentle reminder that we will have our regular ontolex
>>> teleconference tomorrow 10th of October at 15:00. I will send out
>>> access details tonight.
>>>
>>> As main agenda point I have the following:
>>>
>>> 1) Variation and translation module: discussion on definitions, etc.
>>> 2) Lexicographic work of Thierry (@Thierry: can you please paste the
>>> documents to the list as not everyone has access to the LIDER Google
>>> docs, thanks!)
>> Dear Philipp, All,
>>
>> Please find attached the files Philipp mentioned. The file
>> tei2ontolex is the result of my "playing" with the conversion of two
>> TEI-encoded dictionaries/lexicons of Arabic dialects (files
>> attached), which were transmitted to us by the Austrian Academy
>> of Sciences (Karl-Heinz Mörth, at ICLTT). In the context of the COST
>> Action ENeL (http://www.elexicography.eu/), in which the Austrian
>> Academy is centrally involved, we are getting more lexicographic data
>> in various formats and languages, and with different coverage.
>> Interest in Ontolex is quite active there (we gave two talks in this
>> COST Action), and if we can arrive at some guidelines/proposals for
>> encoding such data, I think this would give considerable impact to
>> our work (the Ontolex CG, but also the LIDER project).
>>
>> All I have done so far is follow my intuition on how some of
>> the TEI-encoded data can be represented with Ontolex and Lexinfo. I
>> also added some features, like properties for time/date and location
>> (for other data I am working on, this is relevant, for example for
>> marking the locations in which a dialect is in use, or for adding
>> temporal information to etymological information).
>> Well, this is just a first attempt on my part, applied to relatively
>> consistent data (you will see that the way people encode senses on
>> the input side is not very consistent).
>>
>> For this attempt, I first looked at the input data and made some
>> manual encodings using TopBraid.
>> Then I wrote a Perl script for transforming the whole data set. The
>> resulting ttl file can be edited in TopBraid. I didn't manage to view
>> the result in Protégé.
>> I can adapt my script to the group's suggestions at any time.
>>
>> I cannot attend the telco tomorrow, but I think the data I am
>> sending in the attachment are clear :-)
>>
>> Best
>>
>> Thierry
>>
>>
>

-- 

Thierry Declerck,
Senior Consultant at DFKI GmbH, Language Technology Lab
Stuhlsatzenhausweg, 3
D-66123 Saarbruecken
Phone: +49 681 / 857 75-53 58
Fax: +49 681 / 857 75-53 38
email: declerck@dfki.de

Received on Sunday, 12 October 2014 10:34:12 UTC