- From: John P. McCrae <john.mccrae@insight-centre.org>
- Date: Tue, 2 Jul 2019 09:33:23 +0100
- To: Fahad Khan <anasfkhan81@gmail.com>
- Cc: Christian Chiarcos <chiarcos@informatik.uni-frankfurt.de>, Bettina Klimek <klimek@informatik.uni-leipzig.de>, public-ontolex <public-ontolex@w3.org>
- Message-ID: <CAHLDFnp__Tx2P1UpotkYSs_kq79igsuAEkqA1L5Jfv3PKpmxmA@mail.gmail.com>
Hi Fahad,

My concern is that the SWRL representation may be quite verbose, whereas a simple representation of the regex could be only a few triples. I think that would be preferable: as long as the implementation is simple, a simple representation is the better choice.

Regards,
John

PS. Apologies, I can't make the call today.

On Mon, 1 Jul 2019 at 18:11, Fahad Khan <anasfkhan81@gmail.com> wrote:

> Dear Christian,
> Yes, it is a pain working with SWRL rules directly in Turtle or XML, but
> there are ways of getting round this, for instance the interface in Protege
> that allows you to enter/view/edit the rules in the simpler
> Horn-clause-like format.
> Fahad
>
> On Mon, 1 Jul 2019 at 17:05, Christian Chiarcos <
> chiarcos@informatik.uni-frankfurt.de> wrote:
>
>> On .07.2019 at 15:51, Fahad Khan <anasfkhan81@gmail.com> wrote:
>>
>> Dear Christian,
>>
>> I'm not proposing to parse strings using SWRL (as Wilcock tried to do;
>> and I understand why there wasn't much uptake for his idea at the time),
>> only to describe morphological patterns in a way that allows the generation
>> of morphological variants (word forms) of an entry (a far simpler task).
>> The rules for doing this in languages like Italian or French are fairly
>> straightforward, efficient enough to work on a reasonably sized lexicon (as
>> our work with the SIMPLE lexicon has shown), and the kinds of rules you get
>> (in this particular context) aren't really much harder to understand than
>> regular expressions, and in fact might even be easier to understand once
>> you have a basic grasp of how Horn clauses are written in rule languages
>> (not that hard to come by). At the same time, using SWRL ensures that we
>> produce human-readable rules while remaining within the Semantic Web stack
>> and using pre-existing technologies (SWRL functionality, including a
>> pre-installed rule engine, is now bundled in as a feature with Protege).
>>
>> However, I have the strong suspicion that the modelling of morphological
>> patterns via SWRL rules (in order to generate forms) will not be viable for
>> all languages (Semitic languages for instance, though I haven't actually
>> tried to model this myself, since I'm not really competent enough to do so),
>> so I am not putting it forward as a general-purpose method for representing
>> intensional morphological descriptions. In fact I don't think there is one
>> solution, one silver bullet, here (in the sense of being both descriptive
>> and machine-actionable while allowing us to remain within the whole
>> Semantic Web ecosystem). However, I think that whatever we come up with
>> should be as compatible as possible with approaches like the SWRL one
>> (which does work with a lot of languages but maybe not all) while at the
>> same time leaving open the possibility of using other approaches such as
>> finite state transducers in a more expressive logic programming language.
>> The LMF way of doing this was interesting: they started by making a
>> distinction between extensional and intensional morphological descriptions,
>> and then came up with their own formalism to represent such patterns
>> (represented as strings); these could then later be translated into other
>> machine-actionable formats.
>>
>>
>> Sure. My personal feeling is that regex with capturing groups might be a
>> means to achieve that to a large extent (for morphology). And if there was
>> some existing vocabulary to represent left-hand-sides and right-hand-sides
>> of regex-based transformations, it would be ideal for our purposes.
>> SWRL actually does that, but then look at the Turtle rendering of a simple
>> replacement rule:
>>
>> SWRL:
>> myFeat(?x, ?y) , replace(?y, "ak$", "a", ?z) -> myFeat(?x, ?z)
>> # this is nice, indeed
>>
>> TTL (as produced by Protege):
>>
>> [ rdf:type swrl:Imp ;
>>   swrl:head [ rdf:type swrl:AtomList ;
>>     rdf:rest rdf:nil ;
>>     rdf:first [ rdf:type swrl:DatavaluedPropertyAtom ;
>>       swrl:propertyPredicate :myFeat ;
>>       swrl:argument1 :x ;
>>       swrl:argument2 :z
>>     ]
>>   ] ;
>>   swrl:body [ rdf:type swrl:AtomList ;
>>     rdf:rest [ rdf:type swrl:AtomList ;
>>       rdf:rest rdf:nil ;
>>       rdf:first [ rdf:type swrl:BuiltinAtom ;
>>         swrl:builtin swrlb:replace ;
>>         swrl:arguments [ rdf:type rdf:List ;
>>           rdf:first :y ;
>>           rdf:rest [ rdf:type rdf:List ;
>>             rdf:first "ak$" ;
>>             rdf:rest [ rdf:type rdf:List ;
>>               rdf:first "a" ;
>>               rdf:rest ( :z
>>               )
>>             ]
>>           ]
>>         ]
>>       ]
>>     ] ;
>>     rdf:first [ rdf:type swrl:DatavaluedPropertyAtom ;
>>       swrl:propertyPredicate :myFeat ;
>>       swrl:argument1 :x ;
>>       swrl:argument2 :y
>>     ]
>>   ]
>> ] .
>>
>> We can get it a little bit more compact if we omit RDFS-inferrable
>> triples and rdf:nils:
>>
>> [ rdf:type swrl:Imp ;
>>   swrl:head [ rdf:type swrl:AtomList ;
>>     rdf:first [ rdf:type swrl:DatavaluedPropertyAtom ;
>>       swrl:propertyPredicate :myFeat ;
>>       swrl:argument1 :x ;
>>       swrl:argument2 :z ] ] ;
>>   swrl:body [ rdf:type swrl:AtomList ;
>>     rdf:rest [ rdf:type swrl:AtomList ;
>>       rdf:first [ rdf:type swrl:BuiltinAtom ;
>>         swrl:builtin swrlb:replace ;
>>         swrl:arguments [ rdf:first :y ;
>>           rdf:rest [ rdf:first "ak$" ;
>>             rdf:rest [ rdf:first "a" ;
>>               rdf:rest ( :z ) ] ] ] ] ] ;
>>     rdf:first [ rdf:type swrl:DatavaluedPropertyAtom ;
>>       swrl:propertyPredicate :myFeat ;
>>       swrl:argument1 :x ;
>>       swrl:argument2 :y ] ] ] .
>>
>> But I see no way for further reduction.
>>
>> I'm not too deep into SWRL; maybe there is a way to provide a more
>> readable rendering, but it is better not to let this particular fragment
>> get anywhere near your users. For the time being, all OntoLex examples have
>> been Turtle-based, and shifting between different levels of representation
>> (i.e., mixing SWRL and TTL) in the description will leave people highly
>> confused. I think the best we can aim for is a vocabulary that
>> approximately does the following:
>>
>> [ a :ReplacementRule;
>>   :onProperty :myFeat;
>>   :lhs "ak$";
>>   :rhs "a" ]
>>
>> This is much more restricted than SWRL, of course, but such a
>> mini-language can be processed with SPARQL Update, e.g., to generate
>> proper SWRL (or anything else).
>>
>> Best,
>> Christian
>>
>>
>> Cheers,
>> Fahad
>>
>> On Mon, 1 Jul 2019 at 15:07, Christian Chiarcos <
>> christian.chiarcos@web.de> wrote:
>>
>>> Dear Fahad,
>>>
>>> thanks a lot for this update. In fact, it ties in quite neatly with
>>> other approaches on parsing with SWRL/RIF, e.g., Graham Wilcock's HPSG
>>> parser. On the other hand, we should keep in mind that Wilcock basically
>>> failed (not in terms of expressivity or performance, but in terms of
>>> adoption by the community) and he himself thus abandoned the idea. So,
>>> while we *should* mention that rules can be implemented in this way (in
>>> terms of SW technology, this is the "right" way of implementing rules), I
>>> don't think we should prescribe either SWRL or RIF.
>>>
>>> This is for two reasons:
>>>
>>> On a technological level, RIF is a high-level technology, operating on
>>> top of OWL, so its proper handling requires a lot of expertise by the user
>>> and is technically demanding. I'm not sure about the popularity of either
>>> RIF or OWL beyond the core Semantic Web community anymore, whereas plain
>>> RDF is relatively widely used.
>>>
>>> On a conceptual level, the dominant paradigm in morphology generation
>>> is finite state transducers, and these can be reduced to regular
>>> expressions, and as we have native support for regex in SPARQL Update, SWRL
>>> and most programming languages, this would be more generic and come with a
>>> lower entry barrier. But then, regular expressions also must not be the
>>> only way to populate a paradigm (resp., a particular inflection type), as
>>> many lexicographers and linguists will find this too technical and prefer
>>> to provide representative examples rather than concrete rules -- and our
>>> modelling should cover both uses.
>>>
>>> Just my 2ct,
>>> Christian
>>>
>>> PS: I see drawbacks of the regex idea, too, in particular in that it is
>>> string-based rather than concept-based.
>>>
>>> PPS: A compromise could be to use swrlb:replace to write
>>> transformation rules with regular expressions. However, the SWRL
>>> serialization in Turtle is close to a nightmare (because its bindings are
>>> internally represented by lists), and we should probably use TTL for
>>> illustrative examples. I doubt we could convincingly sell this to anyone.
>>>
>>> On .07.2019 at 12:13, Fahad Khan <anasfkhan81@gmail.com> wrote:
>>>
>>> Hi Bettina, All,
>>> Here is the poster I presented at Euralex last year which I mentioned in
>>> the last telco and which describes the approach we took to modelling
>>> Italian morphology using SWRL:
>>>
>>> https://docs.google.com/presentation/d/1pHt8IG0ni5x9AkoPCsCCccRPEFIeObW7eR-PxY1JN7A/edit?usp=sharing
>>> Cheers,
>>> Fahad
>>>
>>> On Tue, 25 Jun 2019 at 12:12, Bettina Klimek <
>>> klimek@informatik.uni-leipzig.de> wrote:
>>>
>>>> Hi all,
>>>>
>>>> this is the link to the telco today at 1pm CEST:
>>>>
>>>> https://hangouts.google.com/call/UNgLuAFv3BfDfX7P5x8EAEEI
>>>>
>>>> We will continue to discuss the modelling of morphological patterns and
>>>> paradigms.
>>>>
>>>> Regards,
>>>>
>>>> Bettina
>>>>
>>>> --
>>>> Bettina Klimek
>>>> PhD Student
>>>> Department of Computer Science, University of Leipzig
>>>> Institute for Applied Informatics (InfAI)
>>>> Goerdelerring 9
>>>> 04109 Leipzig
>>>>
>>>> Research Group: http://aksw.org/Groups/KILT
>>>> Homepage: http://aksw.org/BettinaKlimek
>>>> Projects: http://mmoon.org, http://linguistics.okfn.org
>>>> Events: 12-17 May 2019 "3rd Summer Datathon on Linguistic Linked Open
>>>> Data (SD-LLOD 2019)"
>>>> https://datathon2019.linguistic-lod.org/
>>>> 20-22 May 2019 "LDK 2019 – 2nd Conference on Language, Data
>>>> and Knowledge"
>>>> http://2019.ldk-conf.org/
>>>>
>>>
>>
>> --
>> Prof. Dr. Christian Chiarcos
>> Applied Computational Linguistics
>> Johann Wolfgang Goethe Universität Frankfurt a. M.
>> 60054 Frankfurt am Main, Germany
>>
>> office: Robert-Mayer-Str. 11-15, #107
>> mail: chiarcos@informatik.uni-frankfurt.de
>> web: http://acoli.cs.uni-frankfurt.de
>> tel: +49-(0)69-798-22463
>> fax: +49-(0)69-798-28334
>>
>
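
The compact rule discussed in the thread (a :ReplacementRule with :onProperty, :lhs and :rhs) can be illustrated with a minimal Python sketch. It only shows how such a rule might be applied to existing property values and how the human-readable SWRL line quoted in the thread could be generated from it. The names ReplacementRule, generate and to_swrl, and the sample strings, are hypothetical and do not come from OntoLex or any library. Python's re module is an assumption standing in for the XPath-style regex behind swrlb:replace; the two dialects differ in details such as backreference syntax in the replacement string.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplacementRule:
    """Hypothetical in-memory mirror of the [ a :ReplacementRule ... ] node."""
    on_property: str  # local name of the property, e.g. "myFeat"
    lhs: str          # regular expression to match, e.g. "ak$"
    rhs: str          # replacement string, e.g. "a"


def generate(rule: ReplacementRule, values: list[str]) -> list[str]:
    """Return the new values the rule derives from existing property values.

    Values that do not match rule.lhs yield nothing, so the existing
    values are left untouched and only additions are reported.
    """
    pattern = re.compile(rule.lhs)
    return [pattern.sub(rule.rhs, v) for v in values if pattern.search(v)]


def to_swrl(rule: ReplacementRule) -> str:
    """Render the rule in the human-readable SWRL form quoted in the thread."""
    p = rule.on_property
    return (f'{p}(?x, ?y) , replace(?y, "{rule.lhs}", "{rule.rhs}", ?z)'
            f' -> {p}(?x, ?z)')


if __name__ == "__main__":
    rule = ReplacementRule(on_property="myFeat", lhs="ak$", rhs="a")
    # "stemak" and "dom" are made-up strings, not real word forms.
    print(generate(rule, ["stemak", "dom"]))
    print(to_swrl(rule))
```

Keeping the rule as three plain fields is what makes it cheap to translate: the same record could be rendered as a SPARQL Update, as SWRL, or applied directly as above.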
Received on Tuesday, 2 July 2019 08:33:59 UTC