Re: representing morphological rules [was: Re: morph module telco today] from John P. McCrae on 2019-07-02 (public-ontolex@w3.org from July 2019)

From: John P. McCrae <john.mccrae@insight-centre.org>
Date: Tue, 2 Jul 2019 09:33:23 +0100
To: Fahad Khan <anasfkhan81@gmail.com>
Cc: Christian Chiarcos <chiarcos@informatik.uni-frankfurt.de>, Bettina Klimek <klimek@informatik.uni-leipzig.de>, public-ontolex <public-ontolex@w3.org>
Message-ID: <CAHLDFnp__Tx2P1UpotkYSs_kq79igsuAEkqA1L5Jfv3PKpmxmA@mail.gmail.com>
Hi Fahad,

My concern is that the SWRL representation may be quite verbose, whereas a
simple representation of the regex could take only a few triples. As long as
the implementation is simple, I think the simpler representation would be
preferable.
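
A minimal sketch of how small such an implementation could be, assuming a
rule is just three values (a property name, a regex and a replacement). The
class name ReplacementRule, the field names, the dict-based entry and the toy
string "xyzak" are hypothetical and only for illustration. Python's re module
is used here, whose regex dialect is not identical to the XPath one behind
swrlb:replace.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReplacementRule:
    """Hypothetical in-memory form of a regex rule (three values only)."""
    on_property: str  # property whose value is rewritten, e.g. "myFeat"
    lhs: str          # regular expression to match
    rhs: str          # replacement string


def apply_rule(rule: ReplacementRule, entry: dict) -> Optional[str]:
    """Return the rewritten value, or None if the rule does not apply."""
    value = entry.get(rule.on_property)
    if value is None:
        return None
    new_value, count = re.subn(rule.lhs, rule.rhs, value)
    return new_value if count else None


rule = ReplacementRule(on_property="myFeat", lhs="ak$", rhs="a")
generated = apply_rule(rule, {"myFeat": "xyzak"})  # toy input, not real data
```

Returning None when nothing matches keeps "rule not applicable" distinct from
"rule applied", which matters if several rules compete for the same entry.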

Regards,
John

PS. Apologies, I can't make the call today.

On Mon, 1 Jul 2019 at 18:11, Fahad Khan <anasfkhan81@gmail.com> wrote:

> Dear Christian,
> Yes, it is a pain working with SWRL rules directly in Turtle or XML, but
> there are ways of getting round this, for instance the interface in Protege
> that allows you to enter/view/edit the rules in the simpler Horn-clause-like
> format.
> Fahad
>
> On Mon, 1 Jul 2019 at 17:05, Christian Chiarcos <
> chiarcos@informatik.uni-frankfurt.de> wrote:
>
>> On .07.2019, at 15:51, Fahad Khan <anasfkhan81@gmail.com> wrote:
>>
>> Dear Christian,
>>
>> I'm not proposing to parse strings using SWRL (as Wilcock tried to do;
>> and I understand why there wasn't much uptake for his idea at the time),
>> only to describe morphological patterns in a way that allows the generation
>> of morphological variants (word forms) of an entry (a far simpler task).
>> The rules for doing this in languages like Italian or French are fairly
>> straightforward, efficient enough to work on a reasonably sized lexicon (as
>> our work with the SIMPLE lexicon has shown), and the kinds of rules you get
>> (in this particular context) aren't really much harder to understand than
>> regular expressions, and in fact might even be easier to understand once
>> you have a basic grasp of how Horn clauses are written in rule languages
>> (not that hard to come by). At the same time, using SWRL ensures that we
>> produce human-readable rules while remaining within the Semantic Web stack
>> and using pre-existing technologies (SWRL functionality, including a
>> pre-installed rule engine, is now bundled in as a feature with Protege).
>>
>> However, I have the strong suspicion that the modelling of morphological
>> patterns via SWRL rules (in order to generate forms) will not be viable for
>> all languages (Semitic languages for instance, though I haven't actually
>> tried to model this myself, since I'm not really competent enough to do so),
>> so I am not putting it forward as a general-purpose method for representing
>> intensional morphological descriptions. In fact I don't think there is one
>> solution, one silver bullet, here (in the sense of being both descriptive
>> and machine actionable while allowing us to remain within the whole
>> semantic web ecosystem). However, I think that whatever we come up with
>> should be as compatible as possible with approaches like the SWRL one
>> (which does work with a lot of languages but maybe not all) while at the
>> same time leaving open the possibility of using other approaches such as
>> finite state transducers in a more expressive logic programming language.
>> The LMF way of doing this was interesting: they started by making a
>> distinction between extensional and intensional morphological descriptions,
>> and then came up with their own formalism to represent such patterns
>> (represented as strings); these could then later be translated into other
>> machine-actionable formats.
>>
>>
>> Sure. My personal feeling is that regex with capturing groups might be a
>> means to achieve that to a large extent (for morphology). And if there was
>> some existing vocabulary to represent left-hand-sides and right-hand-sides
>> of regex-based transformations, it would be ideal for our purposes. SWRL
>> actually does that, but then look at the Turtle rendering of a simple
>> replacement rule:
>>
>> SWRL:
>> myFeat(?x, ?y) , replace(?y, "ak$", "a", ?z) -> myFeat(?x, ?z)
>> # this is nice, indeed
>>
>> TTL (as produced by Protege):
>>
>> [ rdf:type swrl:Imp ;
>>   swrl:head [ rdf:type swrl:AtomList ;
>>        rdf:rest rdf:nil ;
>>        rdf:first [ rdf:type swrl:DatavaluedPropertyAtom ;
>>              swrl:propertyPredicate :myFeat ;
>>              swrl:argument1 :x ;
>>              swrl:argument2 :z
>>             ]
>>       ] ;
>>   swrl:body [ rdf:type swrl:AtomList ;
>>        rdf:rest [ rdf:type swrl:AtomList ;
>>              rdf:rest rdf:nil ;
>>              rdf:first [ rdf:type swrl:BuiltinAtom ;
>>                    swrl:builtin swrlb:replace ;
>>                    swrl:arguments [ rdf:type rdf:List ;
>>                            rdf:first :y ;
>>                            rdf:rest [ rdf:type rdf:List ;
>>                                  rdf:first "ak$" ;
>>                                  rdf:rest [ rdf:type rdf:List ;
>>                                       rdf:first "a" ;
>>                                       rdf:rest ( :z
>>                                            )
>>                                      ]
>>                                 ]
>>                           ]
>>                   ]
>>             ] ;
>>        rdf:first [ rdf:type swrl:DatavaluedPropertyAtom ;
>>              swrl:propertyPredicate :myFeat ;
>>              swrl:argument1 :x ;
>>              swrl:argument2 :y
>>             ]
>>       ]
>> ] .
>>
>> We can get it a little bit more compact if we omit RDFS-inferrable
>> triples and rdf:nils:
>>
>> [ rdf:type swrl:Imp ;
>>   swrl:head [ rdf:type swrl:AtomList ;
>>        rdf:first [ rdf:type swrl:DatavaluedPropertyAtom ;
>>              swrl:propertyPredicate :myFeat ;
>>              swrl:argument1 :x ;
>>              swrl:argument2 :z ] ] ;
>>   swrl:body [ rdf:type swrl:AtomList ;
>>        rdf:rest [ rdf:type swrl:AtomList ;
>>              rdf:first [ rdf:type swrl:BuiltinAtom ;
>>                    swrl:builtin swrlb:replace ;
>>                    swrl:arguments [ rdf:first :y ;
>>                            rdf:rest [ rdf:first "ak$" ;
>>                                  rdf:rest [ rdf:first "a" ;
>>                                       rdf:rest ( :z ) ] ] ] ] ] ;
>>        rdf:first [ rdf:type swrl:DatavaluedPropertyAtom ;
>>              swrl:propertyPredicate :myFeat ;
>>              swrl:argument1 :x ;
>>              swrl:argument2 :y ] ] ] .
>>
>> But I see no way for further reduction.
>>
>> I'm not too deep into SWRL; maybe there is a way to provide a more
>> readable rendering, but it is better not to let this particular fragment get
>> anywhere near your users. For the time being, all OntoLex examples have
>> been Turtle-based, and shifting between different levels of representation
>> (i.e., mixing SWRL and TTL) in the description will leave people highly
>> confused. I think the best we can aim for is a vocabulary that
>> approximately does the following:
>>
>> [ a :ReplacementRule;
>>   :onProperty :myFeat;
>>   :lhs "ak$";
>>   :rhs "a" ]
>>
>> This is much more restricted than SWRL, of course, but such a
>> mini-language can be processed with SPARQL Update, e.g., to generate
>> proper SWRL (or anything else).
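
A rough sketch of the translation step described above, written in Python
rather than SPARQL Update purely for illustration. It renders the three values
of such a rule (:onProperty, :lhs, :rhs) into the human-readable SWRL syntax
quoted earlier in this thread. The function name to_swrl is hypothetical, and
the sketch assumes the regex and replacement contain no double quotes (no
escaping is attempted).

```python
def to_swrl(on_property: str, lhs: str, rhs: str) -> str:
    """Render a regex replacement rule in human-readable SWRL syntax.

    Assumption: lhs and rhs contain no double quotes; nothing is escaped.
    """
    body = f'{on_property}(?x, ?y) , replace(?y, "{lhs}", "{rhs}", ?z)'
    head = f"{on_property}(?x, ?z)"
    return f"{body} -> {head}"


swrl_rule = to_swrl("myFeat", "ak$", "a")
print(swrl_rule)
# prints: myFeat(?x, ?y) , replace(?y, "ak$", "a", ?z) -> myFeat(?x, ?z)
```

The point of the sketch is only that the mapping from the compact form to the
SWRL form is mechanical, so the compact form can serve as the authored
representation while SWRL (or something else) is generated from it.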
>>
>> Best,
>> Christian
>>
>>
>> Cheers,
>> Fahad
>>
>> On Mon, 1 Jul 2019 at 15:07, Christian Chiarcos <
>> christian.chiarcos@web.de> wrote:
>>
>>> Dear Fahad,
>>>
>>> thanks a lot for this update. In fact, it ties in quite neatly with
>>> other approaches to parsing with SWRL/RIF, e.g., Graham Wilcock's HPSG
>>> parser. On the other hand, we should keep in mind that Wilcock basically
>>> failed (not in terms of expressivity or performance, but in terms of
>>> adoption by the community) and he himself thus abandoned the idea. So,
>>> while we *should* mention that rules can be implemented in this way (in
>>> terms of SW technology, this is the "right" way of implementing rules), I
>>> don't think we should prescribe either SWRL or RIF.
>>>
>>> This is for two reasons:
>>>
>>> On a technological level, RIF is a high-level technology, operating on
>>> top of OWL, so its proper handling requires a lot of expertise by the user
>>> and is technically demanding. I'm not sure about the popularity of either
>>> RIF or OWL beyond the core Semantic Web community anymore, whereas plain
>>> RDF is relatively widely used.
>>>
>>> On a conceptual level, the dominant paradigm in morphology generation
>>> is finite-state transducers, and these can be reduced to regular
>>> expressions, and as we have native support for regex in SPARQL Update, SWRL
>>> and most programming languages, this would be more generic and come with a
>>> lower entry barrier. But then, regular expressions must not be the
>>> only way to populate a paradigm (or a particular inflection type), as
>>> many lexicographers and linguists will find this too technical and prefer
>>> to provide representative examples rather than concrete rules -- and our
>>> modelling should cover both uses.
>>>
>>> Just my 2ct,
>>> Christian
>>>
>>> PS: I see drawbacks of the regex idea, too, in particular in that it is
>>> string-based rather than concept-based.
>>>
>>> PPS: A compromise could be to use swrlb:replace to write
>>> transformation rules with regular expressions. However, the SWRL
>>> serialization in Turtle is close to a nightmare (because its bindings are
>>> internally represented by lists), and we should probably use TTL for
>>> illustrative examples. I doubt we could convincingly sell this to anyone.
>>>
>>> On .07.2019, at 12:13, Fahad Khan <anasfkhan81@gmail.com> wrote:
>>>
>>> Hi Bettina, All,
>>> Here is the poster I presented at Euralex last year which I mentioned in
>>> the last telco and which describes the approach we took to modelling
>>> Italian morphology using SWRL:
>>>
>>> https://docs.google.com/presentation/d/1pHt8IG0ni5x9AkoPCsCCccRPEFIeObW7eR-PxY1JN7A/edit?usp=sharing
>>> Cheers,
>>> Fahad
>>>
>>> On Tue, 25 Jun 2019 at 12:12, Bettina Klimek <
>>> klimek@informatik.uni-leipzig.de> wrote:
>>>
>>>> Hi all,
>>>>
>>>> this is the link to the telco today at 1pm CEST:
>>>>
>>>> https://hangouts.google.com/call/UNgLuAFv3BfDfX7P5x8EAEEI
>>>>
>>>> We will continue to discuss the modelling of morphological patterns and
>>>> paradigms.
>>>>
>>>> Regards,
>>>>
>>>> Bettina
>>>>
>>>> --
>>>> Bettina Klimek
>>>> PhD Student
>>>> Department of Computer Science, University of Leipzig
>>>> Institute for Applied Informatics (InfAI)
>>>> Goerdelerring 9
>>>> 04109 Leipzig
>>>>
>>>> Research Group: http://aksw.org/Groups/KILT
>>>> Homepage: http://aksw.org/BettinaKlimek
>>>> Projects: http://mmoon.org, http://linguistics.okfn.org
>>>> Events:  12 -17 May 2019 "3rd Summer Datathon on Linguistic Linked Open
>>>> Data (SD-LLOD 2019)"
>>>>           https://datathon2019.linguistic-lod.org/
>>>>           20-22 May 2019 "LDK 2019 – 2nd Conference on Language, Data
>>>> and Knowledge"
>>>>           http://2019.ldk-conf.org/
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>> --
>> Prof. Dr. Christian Chiarcos
>> Applied Computational Linguistics
>> Johann Wolfgang Goethe Universität Frankfurt a. M.
>> 60054 Frankfurt am Main, Germany
>>
>> office: Robert-Mayer-Str. 11-15, #107
>> mail: chiarcos@informatik.uni-frankfurt.de
>> web: http://acoli.cs.uni-frankfurt.de
>> tel: +49-(0)69-798-22463
>> fax: +49-(0)69-798-28334
>>
>
Received on Tuesday, 2 July 2019 08:33:59 UTC