- From: Christian Chiarcos <chiarcos@informatik.uni-frankfurt.de>
- Date: Mon, 01 Jul 2019 17:05:20 +0200
- To: "Fahad Khan" <anasfkhan81@gmail.com>
- Cc: klimek@informatik.uni-leipzig.de, public-ontolex <public-ontolex@w3.org>
- Message-ID: <op.z38766rg89jat0@kitaba>
On .07.2019, 15:51, Fahad Khan <anasfkhan81@gmail.com> wrote:

> Dear Christian,
> I'm not proposing to parse strings using SWRL (as Wilcock tried to do;
> and I understand why there wasn't much uptake for his idea at the
> time), only to describe morphological patterns in a way that allows the
> generation of morphological variants (word forms) of an entry (a far
> simpler task). The rules for doing this in languages like Italian or
> French are fairly straightforward, efficient enough to work on a
> reasonably sized lexicon (as our work with the SIMPLE lexicon has
> shown), and the kinds of rules you get (in this particular context)
> aren't really much harder to understand than regular expressions, and
> in fact might even be easier to understand once you have a basic grasp
> of how Horn clauses are written in rule languages (not that hard to
> come by). At the same time, using SWRL ensures that we produce
> human-readable rules while remaining within the Semantic Web stack and
> using pre-existing technologies (SWRL functionality, including a
> pre-installed rule engine, is now bundled in as a feature with
> Protege).
> However, I have the strong suspicion that the modelling of
> morphological patterns via SWRL rules (in order to generate forms) will
> not be viable for all languages (Semitic languages, for instance,
> though I haven't actually tried to model this myself, since I'm not
> really competent enough to do so), so I am not putting it forward as a
> general-purpose method for representing intensional morphological
> descriptions. In fact, I don't think there is one solution, one silver
> bullet, here (in the sense of being both descriptive and machine
> actionable while allowing us to remain within the whole Semantic Web
> ecosystem).
> However, I think that whatever we come up with should be as compatible
> as possible with approaches like the SWRL one (which does work with a
> lot of languages but maybe not all) while at the same time leaving open
> the possibility of using other approaches such as finite state
> transducers in a more expressive logic programming language. The LMF
> way of doing this was interesting: they started by making a distinction
> between extensional and intensional morphological descriptions, and
> then came up with their own formalism to represent such patterns
> (represented as strings); these could then later be translated into
> other machine actionable formats.

Sure. My personal feeling is that regex with capturing groups might be a
means to achieve that to a large extent (for morphology). And if there
were some existing vocabulary to represent left-hand sides and
right-hand sides of regex-based transformations, it would be ideal for
our purposes. SWRL actually does that, but then look at the Turtle
rendering of a simple replacement rule:

SWRL:

  myFeat(?x, ?y) , replace(?y, "ak$", "a", ?z) -> myFeat(?x, ?z)   # this is nice, indeed

TTL (as produced by Protege):

  [ rdf:type swrl:Imp ;
    swrl:head [
      rdf:type swrl:AtomList ;
      rdf:rest rdf:nil ;
      rdf:first [
        rdf:type swrl:DatavaluedPropertyAtom ;
        swrl:propertyPredicate :myFeat ;
        swrl:argument1 :x ;
        swrl:argument2 :z ] ] ;
    swrl:body [
      rdf:type swrl:AtomList ;
      rdf:rest [
        rdf:type swrl:AtomList ;
        rdf:rest rdf:nil ;
        rdf:first [
          rdf:type swrl:BuiltinAtom ;
          swrl:builtin swrlb:replace ;
          swrl:arguments [
            rdf:type rdf:List ;
            rdf:first :y ;
            rdf:rest [
              rdf:type rdf:List ;
              rdf:first "ak$" ;
              rdf:rest [
                rdf:type rdf:List ;
                rdf:first "a" ;
                rdf:rest ( :z ) ] ] ] ] ] ;
      rdf:first [
        rdf:type swrl:DatavaluedPropertyAtom ;
        swrl:propertyPredicate :myFeat ;
        swrl:argument1 :x ;
        swrl:argument2 :y ] ] ] .
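For readers who want to see what the rule does to a single value, here is
a minimal Python sketch. It is only an approximation: swrlb:replace
follows XPath regex semantics, whereas this uses Python's re module, and
the function name and the sample strings are made up for illustration.

```python
import re


def apply_replacement(value: str, pattern: str, replacement: str) -> str:
    """Approximate replace(?y, pattern, replacement, ?z) for one string value.

    Assumption: Python's regex dialect stands in for the XPath dialect
    that swrlb:replace actually uses; for a pattern as simple as "ak$"
    the two agree on single-line strings.
    """
    return re.sub(pattern, replacement, value)


# Hypothetical feature values, not taken from any real lexicon.
for form in ["barak", "bara", "akbar"]:
    print(form, "->", apply_replacement(form, "ak$", "a"))
```

Only a value ending in "ak" is changed; the other two come back as they
were. The Turtle serialization above encodes this same one-line
substitution.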
We can make the Turtle a little bit more compact if we omit
RDFS-inferrable triples and rdf:nils:

  [ rdf:type swrl:Imp ;
    swrl:head [
      rdf:type swrl:AtomList ;
      rdf:first [
        rdf:type swrl:DatavaluedPropertyAtom ;
        swrl:propertyPredicate :myFeat ;
        swrl:argument1 :x ;
        swrl:argument2 :z ] ] ;
    swrl:body [
      rdf:type swrl:AtomList ;
      rdf:rest [
        rdf:type swrl:AtomList ;
        rdf:first [
          rdf:type swrl:BuiltinAtom ;
          swrl:builtin swrlb:replace ;
          swrl:arguments [
            rdf:first :y ;
            rdf:rest [
              rdf:first "ak$" ;
              rdf:rest [
                rdf:first "a" ;
                rdf:rest ( :z ) ] ] ] ] ] ;
      rdf:first [
        rdf:type swrl:DatavaluedPropertyAtom ;
        swrl:propertyPredicate :myFeat ;
        swrl:argument1 :x ;
        swrl:argument2 :y ] ] ] .

But I see no way to reduce it further. I'm not too deep into SWRL; maybe
there is a way to provide a more readable rendering, but better not to
let this particular fragment get anywhere near your users. For the time
being, all OntoLex examples have been Turtle-based, and shifting between
different levels of representation (i.e., mixing SWRL and TTL) in the
description will leave people highly confused.

I think the best we can aim for is a vocabulary that approximately does
the following:

  [ a :ReplacementRule; :onProperty :myFeat; :lhs "ak$"; :rhs "a" ]

This is much more restricted than SWRL, of course, but such a
mini-language can be processed with SPARQL Update, e.g., to generate
proper SWRL (or anything else).

Best,
Christian

>
> Cheers,
> Fahad
> On Mon, 1 Jul 2019 at 15:07, Christian Chiarcos
> <christian.chiarcos@web.de> wrote:
>> Dear Fahad,
>>
>> thanks a lot for this update. In fact, it ties in quite neatly with
>> other approaches on parsing with SWRL/RIF, e.g., Graham Wilcock's HPSG
>> parser. On the other hand, we should keep in mind that Wilcock
>> basically failed (not in terms of expressivity or performance, but in
>> terms of adoption by the community) and he himself thus abandoned
>> the idea.
>> So, while we *should* mention that rules can be implemented
>> in this way (in terms of SW technology, this is the "right" way of
>> implementing rules), I don't think we should prescribe SWRL or RIF.
>> This is for two reasons:
>>
>> On a technological level, RIF is a high-level technology, operating on
>> top of OWL, so its proper handling requires a lot of expertise by the
>> user and is technically demanding. I'm not sure about the popularity
>> of either RIF or OWL beyond the core Semantic Web community anymore,
>> whereas plain RDF is relatively widely used.
>>
>> On a conceptual level, the dominating paradigm in morphology
>> generation is finite state transducers, and these can be reduced to
>> regular expressions, and as we have native support for regex in SPARQL
>> Update, SWRL and most programming languages, this would be more
>> generic and come with a lower entry barrier. But then, regular
>> expressions must also not be the only way to populate a paradigm
>> (resp., a particular inflection type), as many lexicographers and
>> linguists will find this too technical and prefer to provide
>> representative examples rather than concrete rules, and our modelling
>> should cover both uses.
>>
>> Just my 2ct,
>> Christian
>>
>> PS: I see drawbacks of the regex idea, too, in particular in that it
>> is string-based rather than concept-based.
>>
>> PPS: A compromise could be to use swrlb:replace to write
>> transformation rules with regular expressions. However, the SWRL
>> serialization in Turtle is close to a nightmare (because its bindings
>> are internally represented by lists), and we should probably use TTL
>> for illustrative examples. I doubt we could convincingly sell this to
>> anyone.
>>
>> On .07.2019, 12:13, Fahad Khan <anasfkhan81@gmail.com> wrote:
>>
>>> Hi Bettina, All,
>>> Here is the poster I presented at Euralex last year which I
>>> mentioned in the last telco and which describes the approach we took
>>> to modelling Italian morphology using SWRL:
>>> https://docs.google.com/presentation/d/1pHt8IG0ni5x9AkoPCsCCccRPEFIeObW7eR-PxY1JN7A/edit?usp=sharing
>>> Cheers,
>>> Fahad
>>>
>>> On Tue, 25 Jun 2019 at 12:12, Bettina Klimek
>>> <klimek@informatik.uni-leipzig.de> wrote:
>>>> Hi all,
>>>>
>>>> this is the link to the telco today at 1pm CEST:
>>>>
>>>> https://hangouts.google.com/call/UNgLuAFv3BfDfX7P5x8EAEEI
>>>>
>>>> We will continue to discuss the modelling of morphological patterns
>>>> and paradigms.
>>>>
>>>> Regards,
>>>>
>>>> Bettina
>>>>
>>>> --
>>>> Bettina Klimek
>>>> PhD Student
>>>> Department of Computer Science, University of Leipzig
>>>> Institute for Applied Informatics (InfAI)
>>>> Goerdelerring 9
>>>> 04109 Leipzig
>>>>
>>>> Research Group: http://aksw.org/Groups/KILT
>>>> Homepage: http://aksw.org/BettinaKlimek
>>>> Projects: http://mmoon.org, http://linguistics.okfn.org
>>>> Events: 12-17 May 2019 "3rd Summer Datathon on Linguistic Linked
>>>> Open Data (SD-LLOD 2019)"
>>>> https://datathon2019.linguistic-lod.org/
>>>> 20-22 May 2019 "LDK 2019 – 2nd Conference on Language, Data
>>>> and Knowledge"
>>>> http://2019.ldk-conf.org/

--
Prof. Dr. Christian Chiarcos
Applied Computational Linguistics
Johann Wolfgang Goethe Universität Frankfurt a. M.
60054 Frankfurt am Main, Germany
office: Robert-Mayer-Str. 11-15, #107
mail: chiarcos@informatik.uni-frankfurt.de
web: http://acoli.cs.uni-frankfurt.de
tel: +49-(0)69-798-22463
fax: +49-(0)69-798-28334
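
As a rough illustration of the restricted rule vocabulary sketched in the
reply above ([ a :ReplacementRule; :onProperty ...; :lhs ...; :rhs ... ]),
the following Python fragment treats such a rule as a plain record and
either renders it as the human-readable SWRL line or applies it directly.
Python is used here only as a stand-in for the SPARQL Update processing
mentioned in the message; the class, the function names and the sample
values are all hypothetical, and Python's regex dialect is assumed to be
close enough to the XPath one for simple suffix patterns.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplacementRule:
    """Hypothetical in-memory mirror of a :ReplacementRule node."""
    on_property: str  # mirrors :onProperty (local name only)
    lhs: str          # mirrors :lhs, a regular expression
    rhs: str          # mirrors :rhs, the replacement string


def to_swrl(rule: ReplacementRule) -> str:
    """Render the rule in the human-readable SWRL syntax used in the message."""
    p = rule.on_property
    return (f'{p}(?x, ?y) , replace(?y, "{rule.lhs}", "{rule.rhs}", ?z)'
            f' -> {p}(?x, ?z)')


def apply_rule(rule: ReplacementRule, entry: dict) -> list:
    """Return the new values the rule derives for one entry.

    `entry` maps property names to lists of string values. Values the
    pattern does not match yield nothing new, so they are skipped.
    """
    derived = []
    for value in entry.get(rule.on_property, []):
        new_value, count = re.subn(rule.lhs, rule.rhs, value)
        if count:
            derived.append(new_value)
    return derived


rule = ReplacementRule(on_property="myFeat", lhs="ak$", rhs="a")
print(to_swrl(rule))
print(apply_rule(rule, {"myFeat": ["barak", "bara"]}))  # made-up values
```

The point of the sketch is only that three fields are enough to recover
the SWRL rule, so the compact Turtle form loses nothing that the verbose
serialization carries for this kind of rule.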
Received on Monday, 1 July 2019 15:05:53 UTC