Re: morph module telco today from Fahad Khan on 2019-07-01 (public-ontolex@w3.org from July 2019)

From: Fahad Khan <anasfkhan81@gmail.com>
Date: Mon, 1 Jul 2019 15:51:47 +0200
To: Christian Chiarcos <christian.chiarcos@web.de>
Cc: klimek@informatik.uni-leipzig.de, public-ontolex <public-ontolex@w3.org>
Message-ID: <CAK+N+9iuhm_anXNmVJ0epzLCO1MAuYmZh2KX=FWUR5B3g-6fbw@mail.gmail.com>
Dear Christian,

I'm not proposing to parse strings using SWRL (as Wilcock tried to do; and
I understand why there wasn't much uptake for his idea at the time), only
to describe morphological patterns in a way that allows the generation of
morphological variants (word forms) of an entry (a far simpler task). The
rules for doing this in languages like Italian or French are fairly
straightforward and efficient enough to work on a reasonably sized lexicon
(as our work with the SIMPLE lexicon has shown). The kinds of rules you get
(in this particular context) aren't really much harder to understand than
regular expressions, and in fact might even be easier to understand once
you have a basic grasp of how Horn clauses are written in rule languages
(not that hard to come by). At the same time, using SWRL ensures that we
produce human-readable rules while remaining within the Semantic Web stack
and using pre-existing technologies (SWRL functionality, including a
pre-installed rule engine, is now bundled as a feature with Protégé).
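
To illustrate the kind of form-generation rule meant here, a minimal
sketch in Python follows. It is not SWRL and not the actual rule set used
for the SIMPLE lexicon; the class names (noun_o_i etc.) and the function
generate_forms are hypothetical, and the three Italian noun classes are
deliberately simplified. Each rule is a regular-expression replacement on
the lemma, roughly what a string-replace builtin would do inside a rule
body.

```python
import re

# Hypothetical, simplified inflection classes for Italian nouns.
# Each class maps a feature label to a (regex, replacement) pair
# that is applied to the lemma.
PATTERNS = {
    "noun_o_i": {"singular": (r"o$", "o"), "plural": (r"o$", "i")},
    "noun_a_e": {"singular": (r"a$", "a"), "plural": (r"a$", "e")},
    "noun_e_i": {"singular": (r"e$", "e"), "plural": (r"e$", "i")},
}


def generate_forms(lemma, pattern_name):
    """Generate the word forms of `lemma` under the named pattern."""
    forms = {}
    for feature, (regex, replacement) in PATTERNS[pattern_name].items():
        # Refuse to apply a pattern whose condition the lemma does not meet.
        if not re.search(regex, lemma):
            raise ValueError(f"{lemma!r} does not match pattern {pattern_name!r}")
        forms[feature] = re.sub(regex, replacement, lemma)
    return forms


if __name__ == "__main__":
    print(generate_forms("libro", "noun_o_i"))
    print(generate_forms("casa", "noun_a_e"))
```

Generation only needs the lemma and the name of its pattern, which is why
it is so much simpler than parsing.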

However, I have the strong suspicion that the modelling of morphological
patterns via SWRL rules (in order to generate forms) will not be viable for
all languages (Semitic languages, for instance, though I haven't actually
tried to model this myself, since I'm not really competent enough to do so),
so I am not putting it forward as a general-purpose method for representing
intensional morphological descriptions. In fact I don't think there is one
solution, one silver bullet, here (in the sense of being both descriptive
and machine actionable while allowing us to remain within the whole
semantic web ecosystem). However, I think that whatever we come up with
should be as compatible as possible with approaches like the SWRL one
(which does work with a lot of languages but maybe not all) while at the
same time leaving open the possibility of using other approaches such as
finite state transducers in a more expressive logic programming language.
The LMF way of doing this was interesting: they started by making a
distinction between extensional and intensional morphological descriptions,
and then came up with their own formalism to represent such patterns
(represented as strings), which could later be translated into other
machine-actionable formats.
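
A rough Python sketch of that distinction, under my own assumptions: an
entry either lists its forms (extensional) or points to a pattern stored as
a string (intensional), and the pattern is tagged with a formalism so that
different back ends can interpret it. The names Pattern, Entry, BACKENDS
and the toy "suffix-swap" notation are all hypothetical; this is not the
LMF formalism itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Pattern:
    formalism: str  # which back end interprets the body (hypothetical tag)
    body: str       # the pattern itself, kept as an opaque string


@dataclass
class Entry:
    lemma: str
    forms: Dict[str, str] = field(default_factory=dict)  # extensional
    pattern: Optional[Pattern] = None                     # intensional


def suffix_swap(lemma: str, body: str) -> Dict[str, str]:
    """Interpret a toy notation such as 'singular:o>o;plural:o>i'."""
    forms = {}
    for rule in body.split(";"):
        feature, change = rule.split(":")
        old, new = change.split(">")
        if not lemma.endswith(old):
            raise ValueError(f"{lemma!r} does not end in {old!r}")
        forms[feature] = lemma[: len(lemma) - len(old)] + new
    return forms


# Other formalisms (rule engines, transducers) could be registered here.
BACKENDS: Dict[str, Callable[[str, str], Dict[str, str]]] = {
    "suffix-swap": suffix_swap,
}


def word_forms(entry: Entry) -> Dict[str, str]:
    """Return listed forms if present, otherwise expand the pattern."""
    if entry.forms:
        return dict(entry.forms)
    if entry.pattern is None:
        return {}
    return BACKENDS[entry.pattern.formalism](entry.lemma, entry.pattern.body)
```

The point of the formalism tag is that the lexicon only stores the pattern
string; which engine turns it into forms is decided separately.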

Cheers,
Fahad

On Mon, 1 Jul 2019 at 15:07, Christian Chiarcos <christian.chiarcos@web.de>
wrote:

> Dear Fahad,
>
> thanks a lot for this update. In fact, it ties in quite neatly with other
> approaches to parsing with SWRL/RIF, e.g., Graham Wilcock's HPSG parser. On
> the other hand, we should keep in mind that Wilcock basically failed (not
> in terms of expressivity or performance, but in terms of adoption by the
> community) and he himself thus abandoned the idea. So, while we *should*
> mention that rules can be implemented in this way (in terms of SW
> technology, this is the "right" way of implementing rules), I don't think
> we should prescribe SWRL or RIF.
>
> This is for two reasons:
>
> On a technological level, RIF is a high-level technology, operating on top
> of OWL, so its proper handling requires a lot of expertise by the user and
> is technically demanding. I'm not sure about the popularity of either RIF
> or OWL beyond the core Semantic Web community anymore, whereas plain RDF is
> relatively widely used.
>
> On a conceptual level, the dominant paradigm in morphology generation
> is finite state transducers, and these can be reduced to regular
> expressions. As we have native support for regex in SPARQL Update, SWRL
> and most programming languages, this would be more generic and come with a
> lower entry barrier. But then, regular expressions must not be the only
> way to populate a paradigm (or a particular inflection type) either, as
> many lexicographers and linguists will find this too technical and prefer
> to provide representative examples rather than concrete rules -- and our
> modelling should cover both uses.
>
> Just my 2ct,
> Christian
>
> PS: I see drawbacks of the regex idea, too, in particular in that it is
> string-based rather than concept-based.
>
> PPS: A compromise could be to use swrlb:replace to write
> transformation rules with regular expressions. However, the SWRL
> serialization in Turtle is close to a nightmare (because its bindings are
> internally represented by lists), and we should probably use TTL for
> illustrative examples. I doubt we could convincingly sell this to anyone.
>
> On .07.2019, at 12:13, Fahad Khan <anasfkhan81@gmail.com> wrote:
>
> Hi Bettina, All,
> Here is the poster I presented at Euralex last year which I mentioned in
> the last telco and which describes the approach we took to modelling
> Italian morphology using SWRL:
>
> https://docs.google.com/presentation/d/1pHt8IG0ni5x9AkoPCsCCccRPEFIeObW7eR-PxY1JN7A/edit?usp=sharing
> Cheers,
> Fahad
>
> On Tue, 25 Jun 2019 at 12:12, Bettina Klimek <
> klimek@informatik.uni-leipzig.de> wrote:
>
>> Hi all,
>>
>> this is the link to the telco today at 1pm CEST:
>>
>> https://hangouts.google.com/call/UNgLuAFv3BfDfX7P5x8EAEEI
>>
>> We will continue to discuss the modelling of morphological patterns and
>> paradigms.
>>
>> Regards,
>>
>> Bettina
>>
>> --
>> Bettina Klimek
>> PhD Student
>> Department of Computer Science, University of Leipzig
>> Institute for Applied Informatics (InfAI)
>> Goerdelerring 9
>> 04109 Leipzig
>>
>> Research Group: http://aksw.org/Groups/KILT
>> Homepage: http://aksw.org/BettinaKlimek
>> Projects: http://mmoon.org, http://linguistics.okfn.org
>> Events:  12 -17 May 2019 "3rd Summer Datathon on Linguistic Linked Open
>> Data (SD-LLOD 2019)"
>>           https://datathon2019.linguistic-lod.org/
>>           20-22 May 2019 "LDK 2019 – 2nd Conference on Language, Data and
>> Knowledge"
>>           http://2019.ldk-conf.org/
>>
>>
>>
>
>
>
Received on Monday, 1 July 2019 13:52:28 UTC