Re: representing morphological rules [was: Re: morph module telco today] from Fahad Khan on 2019-07-01 (public-ontolex@w3.org from July 2019)

From: Fahad Khan <anasfkhan81@gmail.com>
Date: Mon, 1 Jul 2019 19:10:04 +0200
To: Christian Chiarcos <chiarcos@informatik.uni-frankfurt.de>
Cc: klimek@informatik.uni-leipzig.de, public-ontolex <public-ontolex@w3.org>
Message-ID: <CAK+N+9hadMus8M0rVPBZ6Y8AhHQb=O6x9eZZXrXHpqFwJQeb=g@mail.gmail.com>
Dear Christian,
Yes, it is a pain working with SWRL rules directly in Turtle or XML, but
there are ways around this, for instance the interface in Protege
that allows you to enter, view and edit the rules in the simpler,
Horn-clause-like format.
Fahad

On Mon, 1 Jul 2019 at 17:05, Christian Chiarcos <chiarcos@informatik.uni-frankfurt.de> wrote:

> On 01.07.2019 at 15:51, Fahad Khan <anasfkhan81@gmail.com> wrote:
>
> Dear Christian,
>
> I'm not proposing to parse strings using SWRL (as Wilcock tried to do; and
> I understand why there wasn't much uptake for his idea at the time), only
> to describe morphological patterns in a way that allows the generation of
> morphological variants (word forms) of an entry (a far simpler task). The
> rules for doing this in languages like Italian or French are fairly
> straightforward and efficient enough to work on a reasonably sized lexicon
> (as our work with the SIMPLE lexicon has shown), and the kinds of rules you
> get (in this particular context) aren't really much harder to understand
> than regular expressions; in fact, they might even be easier to understand
> once you have a basic grasp of how Horn clauses are written in rule
> languages (not that hard to come by). At the same time, using SWRL ensures
> that we produce human-readable rules while remaining within the Semantic
> Web stack and using pre-existing technologies (SWRL functionality,
> including a pre-installed rule engine, now comes bundled as a feature of
> Protege).
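To make "generating word forms from a described pattern" concrete, here is a minimal Python sketch. It is not the SWRL encoding used for SIMPLE: the rule table `ARE_PRESENT` and the function `generate_forms` are hypothetical names, and the rules only cover the present indicative of regular Italian first-conjugation (-are) verbs. Each entry plays the role of one Horn-clause-style rule that rewrites the lemma ending for one feature bundle.

```python
import re

# Hypothetical rule table: (person, number) -> (pattern, replacement),
# applied to the lemma with re.sub. Covers only regular -are verbs,
# present indicative.
ARE_PRESENT = {
    ("1", "sg"): (r"are$", "o"),
    ("2", "sg"): (r"are$", "i"),
    ("3", "sg"): (r"are$", "a"),
    ("1", "pl"): (r"are$", "iamo"),
    ("2", "pl"): (r"are$", "ate"),
    ("3", "pl"): (r"are$", "ano"),
}


def generate_forms(lemma: str, rules: dict) -> dict:
    """Apply every rule whose pattern matches the lemma; skip the rest."""
    forms = {}
    for features, (pattern, replacement) in rules.items():
        if re.search(pattern, lemma):
            forms[features] = re.sub(pattern, replacement, lemma)
    return forms


if __name__ == "__main__":
    for feats, form in generate_forms("parlare", ARE_PRESENT).items():
        print(feats, form)  # parlo, parli, parla, parliamo, parlate, parlano
```

Verbs such as cercare (2sg cerchi) or mangiare (2sg mangi) would need additional, more specific rules; that is where a real rule set grows beyond a handful of regexes.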
>
> However, I strongly suspect that modelling morphological
> patterns via SWRL rules (in order to generate forms) will not be viable for
> all languages (Semitic languages, for instance, though I haven't actually
> tried to model these myself, since I'm not really competent enough to do
> so), so I am not putting it forward as a general-purpose method for
> representing intensional morphological descriptions. In fact, I don't think
> there is one solution, one silver bullet, here (in the sense of being both
> descriptive and machine-actionable while allowing us to remain within the
> whole Semantic Web ecosystem). However, I think that whatever we come up
> with should be as compatible as possible with approaches like the SWRL one
> (which does work with a lot of languages, but maybe not all) while at the
> same time leaving open the possibility of using other approaches, such as
> finite-state transducers in a more expressive logic programming language.
> The LMF way of doing this was interesting: they started by making a
> distinction between extensional and intensional morphological descriptions,
> and then came up with their own formalism to represent such patterns
> (represented as strings), which could later be translated into other
> machine-actionable formats.
>
>
> Sure. My personal feeling is that regex with capturing groups might be a
> means to achieve that to a large extent (for morphology). And if there were
> some existing vocabulary to represent left-hand sides and right-hand sides
> of regex-based transformations, it would be ideal for our purposes. SWRL
> actually does that, but then look at the Turtle rendering of a simple
> replacement rule:
>
> SWRL:
> myFeat(?x, ?y) , replace(?y, "ak$", "a", ?z) -> myFeat(?x, ?z)
> # this is nice, indeed
>
> TTL (as produced by Protege):
>
> [ rdf:type swrl:Imp ;
>   swrl:head [ rdf:type swrl:AtomList ;
>       rdf:rest rdf:nil ;
>       rdf:first [ rdf:type swrl:DatavaluedPropertyAtom ;
>           swrl:propertyPredicate :myFeat ;
>           swrl:argument1 :x ;
>           swrl:argument2 :z
>         ]
>     ] ;
>   swrl:body [ rdf:type swrl:AtomList ;
>       rdf:rest [ rdf:type swrl:AtomList ;
>           rdf:rest rdf:nil ;
>           rdf:first [ rdf:type swrl:BuiltinAtom ;
>               swrl:builtin swrlb:replace ;
>               swrl:arguments [ rdf:type rdf:List ;
>                   rdf:first :y ;
>                   rdf:rest [ rdf:type rdf:List ;
>                       rdf:first "ak$" ;
>                       rdf:rest [ rdf:type rdf:List ;
>                           rdf:first "a" ;
>                           rdf:rest ( :z )
>                         ]
>                     ]
>                 ]
>             ]
>         ] ;
>       rdf:first [ rdf:type swrl:DatavaluedPropertyAtom ;
>           swrl:propertyPredicate :myFeat ;
>           swrl:argument1 :x ;
>           swrl:argument2 :y
>         ]
>     ]
> ] .
>
> We can get it a little bit more compact if we omit RDFS-inferrable triples
> and rdf:nils:
>
> [ rdf:type swrl:Imp ;
>   swrl:head [ rdf:type swrl:AtomList ;
>       rdf:first [ rdf:type swrl:DatavaluedPropertyAtom ;
>           swrl:propertyPredicate :myFeat ;
>           swrl:argument1 :x ;
>           swrl:argument2 :z ] ] ;
>   swrl:body [ rdf:type swrl:AtomList ;
>       rdf:rest [ rdf:type swrl:AtomList ;
>           rdf:first [ rdf:type swrl:BuiltinAtom ;
>               swrl:builtin swrlb:replace ;
>               swrl:arguments [ rdf:first :y ;
>                   rdf:rest [ rdf:first "ak$" ;
>                       rdf:rest [ rdf:first "a" ;
>                           rdf:rest ( :z ) ] ] ] ] ] ;
>       rdf:first [ rdf:type swrl:DatavaluedPropertyAtom ;
>           swrl:propertyPredicate :myFeat ;
>           swrl:argument1 :x ;
>           swrl:argument2 :y ] ] ] .
>
> But I see no way to reduce it further.
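Most of the bulk above comes from the builtin's arguments being an RDF collection, i.e. a chain of rdf:first/rdf:rest cells terminated by rdf:nil. The following Python sketch illustrates only that point: it models the cells as nested dicts (a hypothetical in-memory layout, not a Turtle parser, and the function names are illustrative) and shows that walking the chain recovers the compact argument list of the SWRL notation.

```python
from typing import Any, List, Union

# Hypothetical stand-in for an RDF collection: each cell is a dict with
# "first" and "rest" keys, and the string "rdf:nil" ends the chain.
NIL = "rdf:nil"
Cell = Union[str, dict]


def make_rdf_list(items: List[Any]) -> Cell:
    """Build nested rdf:first/rdf:rest cells from a Python list."""
    cell: Cell = NIL
    for item in reversed(items):
        cell = {"first": item, "rest": cell}
    return cell


def flatten_rdf_list(cell: Cell) -> List[Any]:
    """Follow rdf:rest links until rdf:nil, collecting rdf:first values."""
    items = []
    while cell != NIL:
        items.append(cell["first"])
        cell = cell["rest"]
    return items


def render_builtin(name: str, arguments: Cell) -> str:
    """Render a builtin atom in the compact, human-readable SWRL notation."""
    return f"{name}({', '.join(flatten_rdf_list(arguments))})"


if __name__ == "__main__":
    # Four nested cells, mirroring swrl:arguments in the Turtle above.
    args = make_rdf_list(["?y", '"ak$"', '"a"', "?z"])
    print(render_builtin("swrlb:replace", args))
```

A tool that renders rules for readers would do essentially this traversal over the real graph, which is why the readable form and the Turtle form carry the same information.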
>
> I'm not too deep into SWRL; maybe there is a way to provide a more
> readable rendering, but better not let this particular fragment get
> anywhere near your users. So far, all OntoLex examples have
> been Turtle-based, and shifting between different levels of representation
> (i.e., mixing SWRL and TTL) in the description will leave people highly
> confused. I think the best we can aim for is a vocabulary that
> does approximately the following:
>
> [ a :ReplacementRule ;
>   :onProperty :myFeat ;
>   :lhs "ak$" ;
>   :rhs "a" ] .
>
> This is much more restricted than SWRL, of course, but such a
> mini-language can be processed with SPARQL Update, e.g., to generate
> proper SWRL (or anything else).
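A minimal sketch of what processing such a mini-language could look like. Python stands in for SPARQL Update here, and `ReplacementRule` with its `apply` and `to_swrl` methods is a hypothetical mirror of the :ReplacementRule resource above, not an existing vocabulary or API. `to_swrl` copies the argument order and comma notation of the SWRL example in this thread; that order should be checked against the SWRL built-ins specification before the output is fed to a reasoner, and no escaping of quotes inside the patterns is attempted.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplacementRule:
    """Hypothetical Python mirror of a :ReplacementRule resource."""
    on_property: str  # e.g. ":myFeat"
    lhs: str          # regular expression to match, e.g. "ak$"
    rhs: str          # replacement string (Python re syntax), e.g. "a"

    def apply(self, value: str) -> str:
        """Rewrite a property value the way the generated rule would."""
        return re.sub(self.lhs, self.rhs, value)

    def to_swrl(self) -> str:
        """Emit a rule in the human-readable SWRL notation used in this thread."""
        p = self.on_property
        body = f'{p}(?x, ?y), swrlb:replace(?y, "{self.lhs}", "{self.rhs}", ?z)'
        return f"{body} -> {p}(?x, ?z)"


if __name__ == "__main__":
    rule = ReplacementRule(":myFeat", "ak$", "a")
    print(rule.to_swrl())
    print(rule.apply("stemak"))  # stema
```

Capturing groups work as well (lhs "^(.+)ak$", rhs "\1a"), with one caveat: Python's re writes group references as \1, whereas XPath's fn:replace uses $1, so the rhs may need translating depending on the target engine.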
>
> Best,
> Christian
>
>
> Cheers,
> Fahad
>
> On Mon, 1 Jul 2019 at 15:07, Christian Chiarcos <christian.chiarcos@web.de>
> wrote:
>
>> Dear Fahad,
>>
>> thanks a lot for this update. In fact, it ties in quite neatly with other
>> approaches to parsing with SWRL/RIF, e.g., Graham Wilcock's HPSG parser. On
>> the other hand, we should keep in mind that Wilcock basically failed (not
>> in terms of expressivity or performance, but in terms of adoption by the
>> community) and thus abandoned the idea himself. So, while we *should*
>> mention that rules can be implemented in this way (in terms of SW
>> technology, this is the "right" way of implementing rules), I don't think
>> we should prescribe either SWRL or RIF.
>>
>> This is for two reasons:
>>
>> On a technological level, RIF is a high-level technology, operating on
>> top of OWL, so its proper handling requires a lot of expertise from the
>> user and is technically demanding. I'm no longer sure how popular either
>> RIF or OWL is beyond the core Semantic Web community, whereas plain
>> RDF is relatively widely used.
>>
>> On a conceptual level, the dominant paradigm in morphology generation
>> is finite-state transducers, and these can be reduced to regular
>> expressions; as we have native support for regex in SPARQL Update, SWRL
>> and most programming languages, this would be more generic and come with a
>> lower entry barrier. But then, regular expressions should not be the only
>> way to populate a paradigm (or a particular inflection type) either, as
>> many lexicographers and linguists will find this too technical and prefer
>> to provide representative examples rather than concrete rules. Our
>> modelling should cover both uses.
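Covering both uses could mean deriving rules from representative examples. The Python sketch below shows one naive way to do that; `rule_from_example` and `apply_rule` are hypothetical, illustrative names, not part of any proposal in this thread. It strips the longest common prefix of a lemma and an example form and turns the differing endings into a suffix-replacement regex that can then be applied to other lemmas.

```python
import re
from typing import Optional, Tuple

Rule = Tuple[str, str]  # (regex pattern, replacement ending)


def common_prefix_length(a: str, b: str) -> int:
    """Number of leading characters shared by a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def rule_from_example(lemma: str, form: str) -> Rule:
    """Turn one (lemma, form) example into an ending-replacement rule."""
    n = common_prefix_length(lemma, form)
    old_ending, new_ending = lemma[n:], form[n:]
    # Escape the old ending so it is matched literally, anchored at the end.
    return re.escape(old_ending) + "$", new_ending


def apply_rule(rule: Rule, lemma: str) -> Optional[str]:
    """Apply the rule to another lemma; None if the lemma does not match."""
    pattern, replacement = rule
    if not re.search(pattern, lemma):
        return None
    # A function as replacement keeps the ending from being read as a template.
    return re.sub(pattern, lambda _match: replacement, lemma, count=1)


if __name__ == "__main__":
    rule = rule_from_example("parlare", "parliamo")
    print(rule)                         # ('are$', 'iamo')
    print(apply_rule(rule, "cantare"))  # cantiamo
```

A single example only yields a narrow rule (the pair parlare/parla gives re$ -> "" rather than are$ -> a), so in practice several examples per paradigm cell, or a lexicographer's review, would be needed.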
>>
>> Just my 2ct,
>> Christian
>>
>> PS: I see drawbacks of the regex idea, too, in particular in that it is
>> string-based rather than concept-based.
>>
>> PPS: A compromise could be to use swrlb:replace to write
>> transformation rules with regular expressions. However, the SWRL
>> serialization in Turtle is close to a nightmare (because its bindings are
>> internally represented as lists), and we should probably use TTL for
>> illustrative examples. I doubt we could convincingly sell this to anyone.
>>
>> On .07.2019 at 12:13, Fahad Khan <anasfkhan81@gmail.com> wrote:
>>
>> Hi Bettina, All,
>> Here is the poster I presented at Euralex last year, which I mentioned in
>> the last telco and which describes the approach we took to modelling
>> Italian morphology using SWRL:
>>
>> https://docs.google.com/presentation/d/1pHt8IG0ni5x9AkoPCsCCccRPEFIeObW7eR-PxY1JN7A/edit?usp=sharing
>> Cheers,
>> Fahad
>>
>> On Tue, 25 Jun 2019 at 12:12, Bettina Klimek <klimek@informatik.uni-leipzig.de> wrote:
>>
>>> Hi all,
>>>
>>> this is the link to the telco today at 1pm CEST:
>>>
>>> https://hangouts.google.com/call/UNgLuAFv3BfDfX7P5x8EAEEI
>>>
>>> We will continue to discuss the modelling of morphological patterns and
>>> paradigms.
>>>
>>> Regards,
>>>
>>> Bettina
>>>
>>> --
>>> Bettina Klimek
>>> PhD Student
>>> Department of Computer Science, University of Leipzig
>>> Institute for Applied Informatics (InfAI)
>>> Goerdelerring 9
>>> 04109 Leipzig
>>>
>>> Research Group: http://aksw.org/Groups/KILT
>>> Homepage: http://aksw.org/BettinaKlimek
>>> Projects: http://mmoon.org, http://linguistics.okfn.org
>>> Events: 12-17 May 2019 "3rd Summer Datathon on Linguistic Linked Open Data (SD-LLOD 2019)"
>>>         https://datathon2019.linguistic-lod.org/
>>>         20-22 May 2019 "LDK 2019 – 2nd Conference on Language, Data and Knowledge"
>>>         http://2019.ldk-conf.org/
>
>
> --
> Prof. Dr. Christian Chiarcos
> Applied Computational Linguistics
> Johann Wolfgang Goethe Universität Frankfurt a. M.
> 60054 Frankfurt am Main, Germany
>
> office: Robert-Mayer-Str. 11-15, #107
> mail: chiarcos@informatik.uni-frankfurt.de
> web: http://acoli.cs.uni-frankfurt.de
> tel: +49-(0)69-798-22463
> fax: +49-(0)69-798-28334
>
Received on Monday, 1 July 2019 17:10:41 UTC