
Re: Call on the 10th of October, 14:00 CEST

From: Fahad Khan <anasfkhan81@gmail.com>
Date: Mon, 9 Oct 2017 14:42:44 +0200
Message-ID: <CAK+N+9goEJEe1Paf1F4HPB=q8yQb4ji0-zOAhugxST7bqHB4LQ@mail.gmail.com>
To: Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>
Cc: public-ontolex <public-ontolex@w3.org>
Hi Everyone
I won't be able to make it to the Skype call tomorrow, but I just wanted to
make a few points regarding the discussion last week.

One of the main problems we had when we were thinking about converting the
intermediate Liddell-Scott Greek-English lexicon (work we presented at the
ontolex workshop in the summer) from the Perseus TEI encoding into linked
data was deciding what information to encode into RDF and what to leave
out. The idea was that the ontolex encoding would include all the
semantic information, and that the rest is essentially just an artifact of
the original print format that can be left as TEI. It turns out
that the distinction is not as easy to make as it first seems.

For instance, iLiddell Scott, in common with many other scholarly
dictionaries of the era, structures its entries in layers: the
definitions and information become more detailed and specific the further
down you go, and some of the higher-level senses seem to be there just to
group together other sub-senses. Of course you can take the senses listed
under each entry and represent them as what is essentially a simple list
enumeration (essentially the lemon-ontolex default), but then you lose the
hierarchical information. The more we studied the dictionary, the more
we became convinced that this wasn't a wise thing to do, although to
be sure it would be necessary to consult a specialist in the
lexicography of Victorian dictionaries. To us it seemed clear that
removing some of this contextual information, the sense scaffolding, would
actually render the senses less clear and less usable. In short, we weren't
sure whether a simple, blunt encoding of the senses without
hierarchical information would just produce a (flawed)
interpretation of the information in the resource rather than
what we wanted to advertise it as: a linked data version of the
iLiddell Scott. In the end we tried to keep as much of this hierarchical
information as possible by adding new properties and classes.
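
To make the trade-off concrete, here is a minimal sketch in Python of the
difference between a flat enumeration and an encoding that keeps the nesting.
Everything in it is an assumption made for illustration: the glosses are
placeholders rather than text from the lexicon, the `ex:` names are invented,
and `ex:subSense` is a hypothetical property, not necessarily one of the
properties and classes we actually added.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Sense:
    label: str   # position of the sense in the printed entry, e.g. "A_I"
    gloss: str   # definition text (placeholder strings in this sketch)
    children: List["Sense"] = field(default_factory=list)


def flatten(sense: Sense) -> List[str]:
    """Flat enumeration: glosses in document order, nesting discarded."""
    out = [sense.gloss]
    for child in sense.children:
        out.extend(flatten(child))
    return out


def triples(sense: Sense, entry: str,
            parent: Optional[Sense] = None) -> Iterator[Triple]:
    """Emit triples (as plain strings) that keep the parent/child links.

    ex:subSense is a hypothetical property used only for illustration.
    """
    node = f"ex:{entry}_sense_{sense.label}"
    if parent is None:
        # Assumption: only top-level senses are attached to the entry.
        yield (f"ex:{entry}", "ontolex:sense", node)
    else:
        yield (f"ex:{entry}_sense_{parent.label}", "ex:subSense", node)
    yield (node, "skos:definition", sense.gloss)
    for child in sense.children:
        yield from triples(child, entry, sense)


# Invented example entry: one grouping sense with two sub-senses.
example = Sense("A", "placeholder gloss for grouping sense A", [
    Sense("A_I", "placeholder gloss A.I"),
    Sense("A_II", "placeholder gloss A.II"),
])
```

`flatten(example)` returns the three glosses as a plain list, with nothing to
say that the first one only groups the other two. `triples(example, "lemma")`
keeps that relation as explicit `ex:subSense` links.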

I'm bringing up sense hierarchies here not because I want to suggest that
they should go in the model (although I do think they should). The point is
that in general it's not always clear what the thinking behind the ranking
and arrangement of senses (even a flat one) in legacy lexical works is.
It could be frequency, the order of importance according to the
lexicographer, temporal precedence, or some mixture of these. The same
applies to many other aspects of the formatting of lexicographic
resources, so it probably pays to be as agnostic as possible. I
also think we need more case studies in encoding legacy lexical resources
(and I thank Francesca for hers), because these constitute an
essential use case for any prospective dictionary model.

Essentially, in encoding these resources into LOD we want to make the
information in them accessible in ways that are not possible
with just TEI. To take something I'm really interested in at
the moment: how do you liberate the etymological information encoded in
legacy dictionaries in a way that allows you to write powerful queries about
the origins of words, or to isolate the lexicon used by a
particular author or in a given text? (For instance, the attestation
information in the full Liddell Scott is extremely extensive, and you can
write some really useful queries even with the intermediate version that
we've encoded.) It turns out that this kind of information is very often
ambiguous and underspecified in dictionaries (old and new). So how do you
handle this? In many cases, short of conducting a seance, you will never be
able to ascertain why a given decision was made or what the lexicographer
really meant, and so the modeler's uncertainty has to become a part of the
model that's being developed, without the model becoming so unwieldy that
it is essentially unusable.
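
One lightweight way to carry that uncertainty, sketched here in Python for the
sense-ordering case, is to record the position of a sense as printed together
with the set of ordering principles that remain plausible, instead of
committing to one. The class and member names below are hypothetical and are
not part of lemon-ontolex or of our encoding. The three principles are simply
the ones listed earlier in this message.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class OrderingPrinciple(Enum):
    FREQUENCY = "frequency"
    IMPORTANCE = "importance"
    TEMPORAL = "temporal precedence"


@dataclass(frozen=True)
class SenseRank:
    position: int  # rank of the sense exactly as printed in the source
    # Principles that could explain the rank; all of them by default,
    # i.e. the modeler stays agnostic until there is evidence.
    plausible: FrozenSet[OrderingPrinciple] = frozenset(OrderingPrinciple)

    def rule_out(self, principle: OrderingPrinciple) -> "SenseRank":
        """Return a copy with one interpretation excluded."""
        return SenseRank(self.position, self.plausible - {principle})

    @property
    def is_determined(self) -> bool:
        """True only when a single interpretation is left."""
        return len(self.plausible) == 1


# Usage: start agnostic, then narrow down only as evidence allows.
rank = SenseRank(position=1)
narrowed = rank.rule_out(OrderingPrinciple.FREQUENCY)
```

The printed rank is always preserved. What changes is only how much the
encoding claims to know about why the sense sits there.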

Cheers,
Fahad


On 9 October 2017 at 11:47, Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de> wrote:

> Hi Francesca,
>
> I think this is a great case study that we should definitely consider. Can
> you present this briefly during the telco tomorrow?
>
> I will send a reminder later today.
>
> Philipp.
>
> On 03.10.17 at 13:03, Francesca Frontini wrote:
>
> Dear all,
> I'm fine with Philipp's proposal, both as to the time and to the topic. I
> would like to add a case study to the discussion.
>
> As some of you may know, here in Montpellier we have a project that aims to
> publish various old editions of the Petit Larousse Illustré,
> first of all as TEI dict, but the idea is to make some of the information
> available as LOD as well.
>
> In this document, you can find an example of a lexical entry and some
> questions.
> https://docs.google.com/document/d/1TogPjrLyJS0OK5pzww28751MX7179-NzCIsDdzae65o/edit
>
> I'd be really grateful to have your opinion, in particular that of Julia and
> Jorge, who are already collecting examples, as Philipp said.
>
> Best,
> Francesca
>
>
>
> 2017-10-02 16:28 GMT+02:00 Philipp Cimiano <cimiano@cit-ec.uni-bielefeld.de>:
>
>> Dear all,
>>
>>  I propose we have another ontolex telco on the 10th of October, 14:00
>> CEST.
>>
>> I propose we continue discussing the concrete examples that Julia and
>> Jorge have been preparing.
>>
>> I think the conclusion we had is that we wanted to continue working
>> bottom-up from examples of current lexica and then try to get an
>> abstract model that is able to accommodate future dictionaries that are
>> native LLD dictionaries.
>>
>> Let's try!
>>
>> We will again hold the meeting via Skype. It worked quite well last time.
>>
>> Greetings,
>>
>> Philipp.
>>
>>
>> --
>> --
>> Prof. Dr. Philipp Cimiano
>> AG Semantic Computing
>> Exzellenzcluster für Cognitive Interaction Technology (CITEC)
>> Universität Bielefeld
>>
>> Tel: +49 521 106 12249
>> Fax: +49 521 106 6560
>> Mail: cimiano@cit-ec.uni-bielefeld.de
>>
>> Office CITEC-2.307
>> Universitätsstr. 21-25
>> 33615 Bielefeld, NRW
>> Germany
>>
>>
>>
>>
Received on Monday, 9 October 2017 12:43:07 UTC
