representing morphological rules [was: Re: morph module telco today] from Christian Chiarcos on 2019-07-01 (public-ontolex@w3.org from July 2019)

From: Christian Chiarcos <chiarcos@informatik.uni-frankfurt.de>
Date: Mon, 01 Jul 2019 17:05:20 +0200
To: "Fahad Khan" <anasfkhan81@gmail.com>
Cc: klimek@informatik.uni-leipzig.de, public-ontolex <public-ontolex@w3.org>
Message-ID: <op.z38766rg89jat0@kitaba>
On .07.2019, at 15:51, Fahad Khan <anasfkhan81@gmail.com> wrote:

> Dear Christian,
> I'm not proposing to parse strings using SWRL (as Wilcock tried to do;
> and I understand why there wasn't much uptake for his idea at the time),
> only to describe morphological patterns in a way that allows the
> generation of morphological variants (word forms) of an entry (a far
> simpler task). The rules for doing this in languages like Italian or
> French are fairly straightforward, efficient enough to work on a
> reasonably sized lexicon (as our work with the SIMPLE lexicon has
> shown), and the kinds of rules you get (in this particular context)
> aren't really much harder to understand than regular expressions, and
> in fact might even be easier to understand once you have a basic grasp
> of how Horn clauses are written in rule languages (not that hard to come
> by). At the same time, using SWRL ensures that we produce human-readable
> rules while remaining within the Semantic Web stack and using
> pre-existing technologies (SWRL functionality, including a pre-installed
> rule engine, is now bundled as a feature with Protege).
> However, I have the strong suspicion that the modelling of morphological
> patterns via SWRL rules (in order to generate forms) will not be viable
> for all languages (Semitic languages, for instance, though I haven't
> actually tried to model this myself, since I'm not really competent
> enough to do so), so I am not putting it forward as a general-purpose
> method for representing intensional morphological descriptions. In fact,
> I don't think there is one solution, one silver bullet, here (in the
> sense of being both descriptive and machine-actionable while allowing us
> to remain within the whole Semantic Web ecosystem). However, I think
> that whatever we come up with should be as compatible as possible with
> approaches like the SWRL one (which does work with a lot of languages
> but maybe not all) while at the same time leaving open the possibility
> of using other approaches, such as finite-state transducers in a more
> expressive logic programming language. The LMF way of doing this was
> interesting: they started by making a distinction between extensional
> and intensional morphological descriptions, and then came up with their
> own formalism to represent such patterns (represented as strings), which
> could later be translated into other machine-actionable formats.

Sure. My personal feeling is that regex with capturing groups might be a  
means to achieve that to a large extent (for morphology). And if there was  
some existing vocabulary to represent left-hand-sides and right-hand-sides  
of regex-based transformations, it would be ideal for our purposes. SWRL  
actually does that, but then look at the Turtle rendering of a simple  
replacement rule:

SWRL:
myFeat(?x, ?y) , replace(?y, "ak$", "a", ?z) -> myFeat(?x, ?z)
# this is nice, indeed

TTL (as produced by Protege):

[ rdf:type swrl:Imp ;
  swrl:head [ rdf:type swrl:AtomList ;
      rdf:rest rdf:nil ;
      rdf:first [ rdf:type swrl:DatavaluedPropertyAtom ;
          swrl:propertyPredicate :myFeat ;
          swrl:argument1 :x ;
          swrl:argument2 :z
      ]
  ] ;
  swrl:body [ rdf:type swrl:AtomList ;
      rdf:rest [ rdf:type swrl:AtomList ;
          rdf:rest rdf:nil ;
          rdf:first [ rdf:type swrl:BuiltinAtom ;
              swrl:builtin swrlb:replace ;
              swrl:arguments [ rdf:type rdf:List ;
                  rdf:first :y ;
                  rdf:rest [ rdf:type rdf:List ;
                      rdf:first "ak$" ;
                      rdf:rest [ rdf:type rdf:List ;
                          rdf:first "a" ;
                          rdf:rest ( :z )
                      ]
                  ]
              ]
          ]
      ] ;
      rdf:first [ rdf:type swrl:DatavaluedPropertyAtom ;
          swrl:propertyPredicate :myFeat ;
          swrl:argument1 :x ;
          swrl:argument2 :y
      ]
  ]
] .

We can get it a little bit more compact if we omit RDFS-inferrable triples  
and rdf:nils:

[ rdf:type swrl:Imp ;
  swrl:head [ rdf:type swrl:AtomList ;
      rdf:first [ rdf:type swrl:DatavaluedPropertyAtom ;
          swrl:propertyPredicate :myFeat ;
          swrl:argument1 :x ;
          swrl:argument2 :z ] ] ;
  swrl:body [ rdf:type swrl:AtomList ;
      rdf:rest [ rdf:type swrl:AtomList ;
          rdf:first [ rdf:type swrl:BuiltinAtom ;
              swrl:builtin swrlb:replace ;
              swrl:arguments [ rdf:first :y ;
                  rdf:rest [ rdf:first "ak$" ;
                      rdf:rest [ rdf:first "a" ;
                          rdf:rest ( :z ) ] ] ] ] ] ;
      rdf:first [ rdf:type swrl:DatavaluedPropertyAtom ;
          swrl:propertyPredicate :myFeat ;
          swrl:argument1 :x ;
          swrl:argument2 :y ] ] ] .

But I see no way for further reduction.

I'm not too deep into SWRL, so maybe there is a way to provide a more
readable rendering, but it is better not to let this particular fragment
get anywhere near your users. For the time being, all OntoLex examples
have been Turtle-based, and shifting between different levels of
representation (i.e., mixing SWRL and TTL) in the description will leave
people highly confused. I think the best we can aim for is a vocabulary
that approximately does the following:

[ a :ReplacementRule;
   :onProperty :myFeat;
   :lhs "ak$";
   :rhs "a" ]

This is much more restricted than SWRL, of course, but such a
mini-language can be processed with SPARQL Update, e.g., to generate
proper SWRL (or anything else).
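
To make the idea concrete, here is a minimal sketch in Python rather than SPARQL Update. It mirrors the :ReplacementRule fragment above as a small class, applies the rule to a string, and renders it back into the human-readable SWRL syntax shown earlier. The class name, its method names, and the test string "kitabak" are made up for illustration. Python's re module only stands in for swrlb:replace here, and the two regex dialects differ in details such as how capturing groups are referenced in the replacement.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplacementRule:
    """Hypothetical in-memory mirror of the :ReplacementRule fragment."""

    on_property: str  # corresponds to :onProperty, e.g. "myFeat"
    lhs: str          # corresponds to :lhs, a regex such as "ak$"
    rhs: str          # corresponds to :rhs, the replacement such as "a"

    def apply(self, value: str) -> str:
        # re.sub is an assumption standing in for swrlb:replace.
        return re.sub(self.lhs, self.rhs, value)

    def to_swrl(self) -> str:
        # Render the rule in the human-readable SWRL syntax.
        p = self.on_property
        return (
            f'{p}(?x, ?y) , replace(?y, "{self.lhs}", "{self.rhs}", ?z)'
            f" -> {p}(?x, ?z)"
        )


# The rule from the example: replace a final "ak" by "a".
rule = ReplacementRule(on_property="myFeat", lhs="ak$", rhs="a")
print(rule.apply("kitabak"))
print(rule.to_swrl())

# The same rule written with a capturing group (Python backreference syntax).
grouped = ReplacementRule(on_property="myFeat", lhs=r"^(.*)ak$", rhs=r"\1a")
print(grouped.apply("kitabak"))
```

Both variants should produce the same output for the made-up input. The capturing-group form only becomes necessary when the replacement has to reuse or reorder material from the matched string.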

Best,
Christian

>
> Cheers, Fahad
> On Mon, 1 Jul 2019 at 15:07, Christian Chiarcos  
> <christian.chiarcos@web.de> wrote:
>> Dear Fahad,
>>
>> thanks a lot for this update. In fact, it ties in quite neatly with
>> other approaches to parsing with SWRL/RIF, e.g., Graham Wilcock's HPSG
>> parser. On the other hand, we should keep in mind that Wilcock
>> basically failed (not in terms of expressivity or performance, but in
>> terms of adoption by the community) and he himself thus abandoned
>> the idea. So, while we *should* mention that rules can be implemented
>> in this way (in terms of SW technology, this is the "right" way of
>> implementing rules), I don't think we should prescribe either SWRL or
>> RIF.
>> This is for two reasons:
>> On a technological level, RIF is a high-level technology, operating on
>> top of OWL, so its proper handling requires a lot of expertise from the
>> user and is technically demanding. I'm not sure about the popularity
>> of either RIF or OWL beyond the core Semantic Web community anymore,
>> whereas plain RDF is relatively widely used.
>>
>> On a conceptual level, the dominant paradigm in morphology generation
>> is finite-state transducers, and these can be reduced to regular
>> expressions, and as we have native support for regex in SPARQL
>> Update, SWRL and most programming languages, this would be more generic
>> and come with a lower entry barrier. But then, regular expressions
>> must not be the only way to populate a paradigm (or a particular
>> inflection type) either, as many lexicographers and linguists
>> will find this too technical and prefer to provide representative
>> examples rather than concrete rules -- and our modelling should cover
>> both uses.
>>
>> Just my 2ct,
>> Christian
>>
>> PS: I see drawbacks of the regex idea, too, in particular in that it is  
>> string-based rather than concept-based.
>>
>> PPS: A compromise could be to use swrlb:replace to write
>> transformation rules with regular expressions. However, the SWRL
>> serialization in Turtle is close to a nightmare (because its bindings
>> are internally represented by lists), and we should probably use TTL
>> for illustrative examples. I doubt we could convincingly sell this to
>> anyone.
>>
>> On .07.2019, at 12:13, Fahad Khan <anasfkhan81@gmail.com> wrote:
>>
>>> Hi Bettina, All,
>>> Here is the poster I presented at Euralex last year, which I
>>> mentioned in the last telco and which describes the approach we took
>>> to modelling Italian morphology using SWRL:
>>> https://docs.google.com/presentation/d/1pHt8IG0ni5x9AkoPCsCCccRPEFIeObW7eR-PxY1JN7A/edit?usp=sharing
>>> Cheers, Fahad
>>>
>>> On Tue, 25 Jun 2019 at 12:12, Bettina Klimek  
>>> <klimek@informatik.uni-leipzig.de> wrote:
>>>> Hi all,
>>>>
>>>> this is the link to the telco today at 1pm CEST:
>>>>
>>>> https://hangouts.google.com/call/UNgLuAFv3BfDfX7P5x8EAEEI
>>>>
>>>> We will continue to discuss the modelling of morphological patterns
>>>> and paradigms.
>>>>
>>>> Regards,
>>>>
>>>> Bettina
>>>>
>>>> --
>>>> Bettina Klimek
>>>> PhD Student
>>>> Department of Computer Science, University of Leipzig
>>>> Institute for Applied Informatics (InfAI)
>>>> Goerdelerring 9
>>>> 04109 Leipzig
>>>>
>>>> Research Group: http://aksw.org/Groups/KILT
>>>> Homepage: http://aksw.org/BettinaKlimek
>>>> Projects: http://mmoon.org, http://linguistics.okfn.org
>>>> Events:  12 -17 May 2019 "3rd Summer Datathon on Linguistic Linked  
>>>> Open Data (SD-LLOD 2019)"
>>>>          https://datathon2019.linguistic-lod.org/
>>>>          20-22 May 2019 "LDK 2019 – 2nd Conference on Language, Data  
>>>> and Knowledge"
>>>>          http://2019.ldk-conf.org/
>>>>
>>>>
>>
>>
>>



-- 
Prof. Dr. Christian Chiarcos
Applied Computational Linguistics
Johann Wolfgang Goethe Universität Frankfurt a. M.
60054 Frankfurt am Main, Germany

office: Robert-Mayer-Str. 11-15, #107
mail: chiarcos@informatik.uni-frankfurt.de
web: http://acoli.cs.uni-frankfurt.de
tel: +49-(0)69-798-22463
fax: +49-(0)69-798-28334
Received on Monday, 1 July 2019 15:05:53 UTC