Lexicographic Data in Ontolex (Draft) for the telco of Friday, 10th of October, 15:00 from Thierry Declerck on 2014-10-09 (public-ontolex@w3.org from October 2014)

From: Thierry Declerck <declerck@dfki.de>
Date: Thu, 09 Oct 2014 12:26:06 +0200
To: public-ontolex@w3.org, Thierry Declerck <Thierry.Declerck@dfki.de>
Message-ID: <543662BE.80701@dfki.de>

On 09.10.2014 09:30, Philipp Cimiano wrote:
> Dear all,
>
>   this is a gentle reminder that we will have our regular ontolex 
> teleconference tomorrow 10th of October at 15:00. I will send out 
> access details tonight.
>
> As main agenda point I have the following:
>
> 1) Variation and translation module: discussion on definitions, etc.
> 2) Lexicographic work of Thierry (@Thierry: can you please paste the 
> documents to the list as not everyone has access to the LIDER Google 
> docs, thanks!)
Dear Philipp, All,

Please find attached the files Philipp is mentioning. The file 
tei2ontolex is the result of my "playing" with the conversion of two TEI 
encoded dictionaries/lexicons of Arabic dialects (files attached), which 
have been transmitted to us by the Austrian Academy of Sciences 
(Karl-Heinz Mörth, at ICLTT). In the context of the COST Action ENeL 
(http://www.elexicography.eu/), in which the Austrian Academy is 
centrally involved, we get some more lexicographic data in various 
formats and languages (and coverage). Interest in Ontolex is quite 
active there (we gave two talks in this COST Action), and if we can 
arrive at some guidelines/proposals for encoding such data, I think this 
would give considerable impact to our work (the Ontolex CG, but also the 
LIDER project).

All I have been doing so far is to follow my intuition on how some of 
the TEI encoded data can be represented with Ontolex and Lexinfo. I also 
added some features, like properties for time/date and location (this is 
relevant for other data I am working on, for example for marking the 
locations in which a dialect is in use, or for attaching temporal 
information to etymological information).
Well, this is just a first attempt on my side, applied to relatively 
consistent data (though you will see that the way people encode senses 
on the input side is not very consistent).

For this attempt, I first had a look at the input data and made some 
manual encodings using TopBraid.
Then I wrote a Perl script for transforming the whole data set. The 
resulting ttl file can be edited in TopBraid. I didn't manage to view the 
result in Protégé.
I can adapt my script at any time to the suggestions made by the group.
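
As an illustration only (this is not the Perl script from the attachment), a conversion of this kind could be sketched in Python roughly as follows. The sample TEI entry, the base namespace `http://example.org/lexicon#`, the language tag `apc`, and the use of `skos:definition` for the English gloss are all assumptions made for the sketch; the actual mapping choices are the ones in the attached tei2ontolex file.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Prefix declarations for the generated Turtle (the ':' base is hypothetical).
PREFIXES = """@prefix : <http://example.org/lexicon#> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
"""

# Hypothetical, simplified TEI entry; not copied from the attached dictionaries.
SAMPLE_TEI = """<entry xmlns="http://www.tei-c.org/ns/1.0" xml:id="bayt_001">
  <form type="lemma"><orth>bayt</orth></form>
  <gramGrp><gram type="pos">noun</gram></gramGrp>
  <sense><cit type="translation" xml:lang="en"><quote>house</quote></cit></sense>
</entry>"""


def literal(text, lang):
    """Return a language-tagged Turtle literal with quotes/backslashes escaped."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def entry_to_turtle(entry, lang="apc"):
    """Map one TEI <entry> element to Ontolex/Lexinfo triples in Turtle."""
    eid = entry.get(XML_ID)
    lemma = entry.findtext("tei:form[@type='lemma']/tei:orth",
                           default="", namespaces=TEI_NS)
    pos = entry.findtext("tei:gramGrp/tei:gram[@type='pos']",
                         namespaces=TEI_NS)
    # Assumption: each sense carries one English translation as its gloss.
    glosses = [q.text for q in entry.findall(
        "tei:sense/tei:cit[@type='translation']/tei:quote", TEI_NS) if q.text]

    props = [f"ontolex:canonicalForm :{eid}_form"]
    if pos:
        props.append(f"lexinfo:partOfSpeech lexinfo:{pos}")
    props += [f"ontolex:sense :{eid}_sense{i}"
              for i in range(1, len(glosses) + 1)]

    blocks = [f":{eid} a ontolex:LexicalEntry ;\n    "
              + " ;\n    ".join(props) + " ."]
    blocks.append(f":{eid}_form a ontolex:Form ;\n"
                  f"    ontolex:writtenRep {literal(lemma, lang)} .")
    for i, gloss in enumerate(glosses, 1):
        blocks.append(f":{eid}_sense{i} a ontolex:LexicalSense ;\n"
                      f"    skos:definition {literal(gloss, 'en')} .")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(PREFIXES)
    print(entry_to_turtle(ET.fromstring(SAMPLE_TEI)))
```

Keeping the mapping in one small function per entry makes it easy to change individual property choices when the group suggests a different encoding.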

Tomorrow I cannot attend the telco, but I think that the data I am 
sending in the attachment are clear :-)

Best

Thierry


-- 

Thierry Declerck,
Senior Consultant at DFKI GmbH, Language Technology Lab
Stuhlsatzenhausweg, 3
D-66123 Saarbruecken
Phone: +49 681 / 857 75-53 58
Fax: +49 681 / 857 75-53 38
email: declerck@dfki.de

-------------------------------------------------------------
Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
Firmensitz: Trippstadter Strasse 122, D-67663 Kaiserslautern

Geschaeftsfuehrung:
Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Vorsitzender)
Dr. Walter Olthoff

Vorsitzender des Aufsichtsrats:
Prof. Dr. h.c. Hans A. Aukes

Amtsgericht Kaiserslautern, HRB 2313
-------------------------------------------------------------

Attachments

application/x-zip-compressed attachment: apc_eng_002__corpus3_aac_ac_at_2014_03_12_a.zip
application/x-zip-compressed attachment: arz_eng_006__v001_2014_10_02_a.zip
application/x-zip-compressed attachment: tei2ontolex.zip

Received on Thursday, 9 October 2014 10:26:54 UTC