Re: Call on the 10th of October, 14:00 CEST

I have some observations on the current discussion of the open issues.

I1: The German ‘Leiter’ can be feminine (‘die Leiter’) and translate to 
en:ladder, a noun, or it can be masculine (‘der Leiter’) and translate to 
en:leader, either a noun or a role, if you support that. The two senses 
are unrelated, or at best very loosely related: one could perhaps argue 
that if you move up the corporate ladder you end up being a leader. Apart 
from the article they share the same base form, but they should not be 
the same lexical entry.

A bit more interesting is the German ‘Bank’. ‘Die Bank’, the bench, has 
the plural ‘Bänke’, while ‘die Bank’, the place you bring your money to, 
has the plural ‘Banken’. Different plural morphology but the same gender. 
Definitely different lexical entries.

Off the top of my head, these are two of the many examples I encountered 
while processing Europarl and the German Wikipedia dump. We had to modify 
some code to map the morphology properly to the corresponding lexemes.
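
To make the point concrete, here is a minimal sketch of homographs kept as 
separate lexical entries. The class, field and function names are 
hypothetical and are not taken from the code mentioned above or from the 
ontolex model; the plural forms for ‘Leiter’ (‘Leitern’ for the ladder, 
‘Leiter’ for the leader) are standard German and are added for 
illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LexicalEntry:
    """One lexical entry; homographs get one entry each (hypothetical model)."""
    lemma: str
    gender: str   # "f" = feminine, "m" = masculine
    plural: str
    gloss: str    # English translation


ENTRIES = [
    LexicalEntry("Leiter", "f", "Leitern", "ladder"),
    LexicalEntry("Leiter", "m", "Leiter", "leader"),
    LexicalEntry("Bank", "f", "Bänke", "bench"),
    LexicalEntry("Bank", "f", "Banken", "bank (financial institution)"),
]


def lookup(form: str, gender: Optional[str] = None) -> List[LexicalEntry]:
    """Return every entry whose lemma or plural equals the surface form.

    If a gender is given, it is used to narrow down the candidates.
    """
    return [
        entry
        for entry in ENTRIES
        if form in (entry.lemma, entry.plural)
        and (gender is None or entry.gender == gender)
    ]
```

With this layout, lookup("Leiter") yields two candidates that only the 
gender can separate, while lookup("Bänke") and lookup("Banken") each 
yield a single entry even though the gender is the same.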


I also have an opinion about the usage examples and the ordering of 
dictionary entries.

Usage examples are useful, but they should come from the semantically 
linked data that is behind the dictionary entries. I think we need to 
take a hard look at how the systems are going to be used in the future. 
With a printed dictionary, or even a translation app, the publisher does 
not know what the user intends to look up, so the ‘printed’ dictionary 
needs a representative set of examples and an ordering that is based on 
some acceptable scientific reasoning. In the future, where an electronic 
personal assistant holds a conversation with the user, the context is 
known and should re-prioritize word choices and samples, and even 
withhold dictionary entries that are clearly out of context. Computers 
can do that, so I would not spend much time on the sequence in which the 
entries are displayed or stored. If anything, I would focus on the 
ability of the computer to select based on context.
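
As a rough illustration of selecting by context, the sketch below ranks 
the senses of a word by how many of their cue words occur in the 
conversation, and withholds senses with no overlap once at least one 
sense matches. The function name and the cue-word sets are hypothetical 
stand-ins; in a real system the cues would come from the linked data 
behind the entries, and simple word overlap is only a placeholder for 
whatever context model is actually used.

```python
from typing import Dict, Iterable, List, Set


def rank_senses(
    senses: Dict[str, Set[str]],
    context_words: Iterable[str],
    withhold_unrelated: bool = True,
) -> List[str]:
    """Order senses by overlap between their cue words and the context.

    senses maps a gloss to a set of lower-case cue words (hypothetical data).
    """
    context = {word.lower() for word in context_words}
    scored = [(len(context & cues), gloss) for gloss, cues in senses.items()]
    # sorted() is stable, so senses with equal scores keep their stored order.
    scored = sorted(scored, key=lambda item: -item[0])
    # Drop senses without any overlap, but only if some sense did match.
    if withhold_unrelated and scored and scored[0][0] > 0:
        scored = [item for item in scored if item[0] > 0]
    return [gloss for _, gloss in scored]


# Hypothetical cue words for the two entries of the German 'Bank'.
BANK_SENSES = {
    "bench": {"park", "sit", "garden"},
    "bank (financial institution)": {"money", "account", "loan"},
}
```

Calling rank_senses(BANK_SENSES, "I need to deposit money".split()) 
returns only the financial sense, whereas an empty context leaves both 
senses in their stored order, which is the situation the ‘printed’ 
dictionary is always in.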

What I think is important is that the sources of the dictionary entries 
are tracked, so that we can trace back where entries came from, at least 
until they are confirmed by several reputable sources.
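
A minimal sketch of such source tracking could look like the following. 
The class, the source names used in the usage note and the threshold of 
two reputable sources are all assumptions made for illustration; 
‘several’ is not defined any more precisely above.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class TrackedEntry:
    """A dictionary entry that remembers where it came from (hypothetical)."""
    lemma: str
    gloss: str
    sources: Set[str] = field(default_factory=set)

    def add_source(self, name: str) -> None:
        """Record one more source in which this entry was found."""
        self.sources.add(name)

    def is_confirmed(self, reputable: Set[str], minimum: int = 2) -> bool:
        """True once at least `minimum` reputable sources back the entry.

        The default of 2 is an assumed reading of 'several'.
        """
        return len(self.sources & reputable) >= minimum
```

For example, an entry seen only in a hypothetical "corpus-a" stays 
unconfirmed against the reputable set {"corpus-a", "dictionary-b"} and 
becomes confirmed once "dictionary-b" is added as a second source.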

I will try to catch the call tomorrow, if I get out of bed in time. The 
call is at 5am PST for me. If not, I will read the transcript and answer 
through the list.


regards

Tom


On 10/02/2017 07:28 AM, Philipp Cimiano wrote:
> Dear all,
>
>   I propose we have another ontolex telco on the 10th of October, 14:00 CEST.
>
> I propose we continue discussing the concrete examples that Julia and
> Jorge have been preparing.
>
> I think the conclusion we had is that we wanted to continue working
> bottom-up from examples of current lexica and then try to get an
> abstract model that is able to accommodate future dictionaries that are
> native LLD dictionaries.
>
> Let's try!
>
> We will again hold the meeting via Skype. It worked quite well last time.
>
> Greetings,
>
> Philipp.
>
>

-- 
Tom Knorr
Independent Consultant currently working on The NeuroCollective
www.NeuroCollective.com

Blog: http://www.neurocollective.com/blog/

LinkedIn: https://www.linkedin.com/in/tom-knorr-4406406/

Received on Monday, 9 October 2017 23:19:46 UTC