- From: Christian Chiarcos <chiarcos@informatik.uni-frankfurt.de>
- Date: Mon, 01 Jul 2019 17:05:20 +0200
- To: "Fahad Khan" <anasfkhan81@gmail.com>
- Cc: klimek@informatik.uni-leipzig.de, public-ontolex <public-ontolex@w3.org>
- Message-ID: <op.z38766rg89jat0@kitaba>
On .07.2019, at 15:51, Fahad Khan <anasfkhan81@gmail.com> wrote:
> Dear Christian,
> I'm not proposing to parse strings using SWRL (as Wilcock tried to do;
> and I understand why there wasn't much uptake for his idea at the time),
> only to describe morphological patterns in a way that allows the
> generation of morphological variants (word forms) of an entry (a far
> simpler task). The rules for doing this in languages like Italian or
> French are fairly straightforward and efficient enough to work on a
> reasonably sized lexicon (as our work with the SIMPLE lexicon has
> shown), and the kinds of rules you get (in this particular context)
> aren't really much harder to understand than regular expressions, and
> might even be easier to understand once you have a basic grasp of how
> Horn clauses are written in rule languages (not that hard to come by).
> At the same time, using SWRL ensures that we produce human-readable
> rules while remaining within the Semantic Web stack and using
> pre-existing technologies (SWRL functionality, including a
> pre-installed rule engine, is now bundled as a feature with Protege).
>
> However, I strongly suspect that modelling morphological patterns via
> SWRL rules (in order to generate forms) will not be viable for all
> languages (Semitic languages, for instance, though I haven't actually
> tried to model them myself, since I'm not really competent enough to do
> so), so I am not putting it forward as a general-purpose method for
> representing intensional morphological descriptions. In fact, I don't
> think there is one solution, one silver bullet, here (in the sense of
> being both descriptive and machine-actionable while allowing us to
> remain within the whole Semantic Web ecosystem). However, I think that
> whatever we come up with should be as compatible as possible with
> approaches like the SWRL one (which does work with a lot of languages,
> but maybe not all), while at the same time leaving open the possibility
> of using other approaches, such as finite state transducers in a more
> expressive logic programming language. The LMF way of doing this was
> interesting: they started by distinguishing between extensional and
> intensional morphological descriptions, and then came up with their own
> formalism to represent such patterns (as strings), which could later be
> translated into other machine-actionable formats.
Sure. My personal feeling is that regular expressions with capturing groups
could achieve that to a large extent (for morphology). And if there were
an existing vocabulary to represent the left-hand sides and right-hand sides
of regex-based transformations, it would be ideal for our purposes. SWRL
actually provides this, but look at the Turtle rendering of a simple
replacement rule:
SWRL:
myFeat(?x, ?y), replace(?z, ?y, "ak$", "a") -> myFeat(?x, ?z)
# this is nice, indeed
TTL (as produced by Protege):
[ rdf:type swrl:Imp ;
  swrl:head [ rdf:type swrl:AtomList ;
      rdf:rest rdf:nil ;
      rdf:first [ rdf:type swrl:DatavaluedPropertyAtom ;
          swrl:propertyPredicate :myFeat ;
          swrl:argument1 :x ;
          swrl:argument2 :z
        ]
    ] ;
  swrl:body [ rdf:type swrl:AtomList ;
      rdf:rest [ rdf:type swrl:AtomList ;
          rdf:rest rdf:nil ;
          rdf:first [ rdf:type swrl:BuiltinAtom ;
              swrl:builtin swrlb:replace ;
              swrl:arguments [ rdf:type rdf:List ;
                  rdf:first :z ;
                  rdf:rest [ rdf:type rdf:List ;
                      rdf:first :y ;
                      rdf:rest [ rdf:type rdf:List ;
                          rdf:first "ak$" ;
                          rdf:rest ( "a"
                            )
                        ]
                    ]
                ]
            ]
        ] ;
      rdf:first [ rdf:type swrl:DatavaluedPropertyAtom ;
          swrl:propertyPredicate :myFeat ;
          swrl:argument1 :x ;
          swrl:argument2 :y
        ]
    ]
] .
We can make it a little more compact if we omit RDFS-inferable triples
and the explicit rdf:nil terminators:
[ rdf:type swrl:Imp ;
  swrl:head [ rdf:type swrl:AtomList ;
      rdf:first [ rdf:type swrl:DatavaluedPropertyAtom ;
          swrl:propertyPredicate :myFeat ;
          swrl:argument1 :x ;
          swrl:argument2 :z ] ] ;
  swrl:body [ rdf:type swrl:AtomList ;
      rdf:rest [ rdf:type swrl:AtomList ;
          rdf:first [ rdf:type swrl:BuiltinAtom ;
              swrl:builtin swrlb:replace ;
              swrl:arguments [ rdf:first :z ;
                  rdf:rest [ rdf:first :y ;
                      rdf:rest [ rdf:first "ak$" ;
                          rdf:rest ( "a" ) ] ] ] ] ] ;
      rdf:first [ rdf:type swrl:DatavaluedPropertyAtom ;
          swrl:propertyPredicate :myFeat ;
          swrl:argument1 :x ;
          swrl:argument2 :y ] ] ] .
But I see no way for further reduction.
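Most of this bulk comes from the fact that the arguments of a built-in atom are stored as an RDF collection, in which every argument needs its own list node with rdf:first and rdf:rest (plus rdf:type rdf:List in Protege's output). The following minimal Python sketch, using no RDF library, expands an argument list into N-Triples-style lines to make that growth visible. The blank-node labels (_:l0, _:l1, ...) and the example.org IRIs standing in for the SWRL variables are illustrative assumptions, not what Protege actually emits.

```python
# Minimal sketch: expand a list of RDF terms into RDF collection triples
# (rdf:first / rdf:rest / rdf:nil), the structure SWRL uses for built-in arguments.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NIL = f"<{RDF}nil>"


def expand_collection(items, with_types=False):
    """Return (head, triples) for an RDF collection holding `items`.

    Blank-node labels _:l0, _:l1, ... are arbitrary; with_types=True also emits
    the rdf:type rdf:List triples that Protege writes out explicitly.
    """
    if not items:
        return NIL, []
    triples = []
    for i, item in enumerate(items):
        node = f"_:l{i}"
        nxt = f"_:l{i + 1}" if i + 1 < len(items) else NIL
        if with_types:
            triples.append((node, f"<{RDF}type>", f"<{RDF}List>"))
        triples.append((node, f"<{RDF}first>", item))
        triples.append((node, f"<{RDF}rest>", nxt))
    return "_:l0", triples


if __name__ == "__main__":
    # The four arguments of the replace built-in (result first, as in the
    # W3C SWRL built-ins); the example.org IRIs are placeholders for ?z and ?y.
    args = ["<http://example.org/z>", "<http://example.org/y>", '"ak$"', '"a"']
    head, triples = expand_collection(args)
    for s, p, o in triples:
        print(s, p, o, ".")
    print(len(triples), "triples without types,",
          len(expand_collection(args, with_types=True)[1]), "with types")
```

For four arguments this gives 8 triples, or 12 once the rdf:type rdf:List triples are added, before any of the atom and rule structure around them is counted.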
I'm not very deep into SWRL; maybe there is a more readable rendering,
but better not let this particular fragment get anywhere near your
users. So far, all OntoLex examples have been Turtle-based, and shifting
between different levels of representation (i.e., mixing SWRL and TTL)
in the description will leave people highly confused. I think the best
we can aim for is a vocabulary that does approximately the following:
[ a :ReplacementRule;
:onProperty :myFeat;
:lhs "ak$";
:rhs "a" ]
This is much more restricted than SWRL, of course, but such a
mini-language can be processed with SPARQL Update, e.g., to generate
proper SWRL (or anything else).
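As a rough illustration of how such a mini-language could be consumed, here is a minimal Python sketch; Python is used purely for illustration in place of SPARQL Update. The class ReplacementRule and its fields on_property, lhs and rhs are hypothetical and simply mirror the :ReplacementRule, :onProperty, :lhs and :rhs terms above, which are themselves not part of OntoLex or any published vocabulary. The sketch applies a rule to a string with Python's re.sub and renders the same rule in human-readable SWRL syntax, assuming the W3C SWRL built-in convention that the first argument of swrlb:replace is the result. Python's regex dialect differs from XPath's in some details.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplacementRule:
    """Hypothetical Python counterpart of the :ReplacementRule record above."""
    on_property: str  # local name of the property, e.g. "myFeat"
    lhs: str          # regular expression to match, e.g. "ak$"
    rhs: str          # replacement; group references written XPath-style ($1, $2, ...)

    def apply(self, value: str) -> str:
        """Apply the rule to one string value, roughly as swrlb:replace would."""
        # Python's re.sub uses \g<1> where XPath fn:replace uses $1.
        python_rhs = re.sub(r"\$(\d+)", r"\\g<\1>", self.rhs)
        return re.sub(self.lhs, python_rhs, value)

    def to_swrl(self) -> str:
        """Render the rule in human-readable SWRL syntax.

        Assumes swrlb:replace takes the result as its first argument;
        quotes inside lhs/rhs are not escaped in this sketch.
        """
        return (f"{self.on_property}(?x, ?y) ^ "
                f'swrlb:replace(?z, ?y, "{self.lhs}", "{self.rhs}") '
                f"-> {self.on_property}(?x, ?z)")
```

For example, ReplacementRule("myFeat", "ak$", "a").apply("tak") yields "ta", and a rule with a capturing group such as ReplacementRule("hasForm", "^(.+)are$", "$1o") maps "parlare" to "parlo" (hasForm is a made-up property name). Keeping the rule as plain data and generating SWRL (or anything else) from it is what keeps the Turtle examples readable.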
Best,
Christian
>
> Cheers, Fahad
> On Mon, 1 Jul 2019 at 15:07, Christian Chiarcos
> <christian.chiarcos@web.de> wrote:
>> Dear Fahad,
>>
>> thanks a lot for this update. In fact, it ties in quite neatly with
>> other approaches to parsing with SWRL/RIF, e.g., Graham Wilcock's HPSG
>> parser. On the other hand, we should keep in mind that Wilcock
>> basically failed (not in terms of expressivity or performance, but in
>> terms of adoption by the community), and, as a result, he himself
>> abandoned the idea. So, while we *should* mention that rules can be
>> implemented in this way (in terms of SW technology, this is the "right"
>> way of implementing rules), I don't think we should prescribe either
>> SWRL or RIF. This is for two reasons:
>> On a technological level, RIF is a high-level technology operating on
>> top of OWL, so handling it properly requires a lot of expertise from
>> the user and is technically demanding. I'm no longer sure how popular
>> either RIF or OWL is beyond the core Semantic Web community, whereas
>> plain RDF is relatively widely used.
>>
>> On a conceptual level, the dominant paradigm in morphological
>> generation is finite state transducers, and these can be reduced to
>> regular expressions; as we have native support for regex in SPARQL
>> Update, SWRL and most programming languages, this would be more generic
>> and come with a lower entry barrier. But then, regular expressions
>> should not be the only way to populate a paradigm (or a particular
>> inflection type), as many lexicographers and linguists will find this
>> too technical and prefer to provide representative examples rather than
>> concrete rules, and our modelling should cover both uses.
>>
>> Just my 2ct,
>> Christian
>>
>> PS: I see drawbacks of the regex idea, too, in particular in that it is
>> string-based rather than concept-based.
>>
>> PPS: A compromise could be to use swrlb:replace to write
>> transformation rules with regular expressions. However, the SWRL
>> serialization in Turtle is close to a nightmare (because its bindings
>> are internally represented by lists), and we should probably use TTL
>> for illustrative examples. I doubt we could convincingly sell this to
>> anyone.
>>
>> On .07.2019, at 12:13, Fahad Khan <anasfkhan81@gmail.com> wrote:
>>
>>> Hi Bettina, all,
>>>
>>> Here is the poster I presented at Euralex last year, which I mentioned
>>> in the last telco and which describes the approach we took to modelling
>>> Italian morphology using SWRL:
>>> https://docs.google.com/presentation/d/1pHt8IG0ni5x9AkoPCsCCccRPEFIeObW7eR-PxY1JN7A/edit?usp=sharing
>>> Cheers, Fahad
>>>
>>> On Tue, 25 Jun 2019 at 12:12, Bettina Klimek
>>> <klimek@informatik.uni-leipzig.de> wrote:
>>>> Hi all,
>>>>
>>>> this is the link to the telco today at 1pm CEST:
>>>>
>>>> https://hangouts.google.com/call/UNgLuAFv3BfDfX7P5x8EAEEI
>>>>
>>>> We will continue to discuss the modelling of morphological patterns
>>>> and paradigms.
>>>>
>>>> Regards,
>>>>
>>>> Bettina
>>>>
>>>> --
>>>> Bettina Klimek
>>>> PhD Student
>>>> Department of Computer Science, University of Leipzig
>>>> Institute for Applied Informatics (InfAI)
>>>> Goerdelerring 9
>>>> 04109 Leipzig
>>>>
>>>> Research Group: http://aksw.org/Groups/KILT
>>>> Homepage: http://aksw.org/BettinaKlimek
>>>> Projects: http://mmoon.org, http://linguistics.okfn.org
>>>> Events: 12-17 May 2019 "3rd Summer Datathon on Linguistic Linked
>>>> Open Data (SD-LLOD 2019)"
>>>> https://datathon2019.linguistic-lod.org/
>>>> 20-22 May 2019 "LDK 2019 – 2nd Conference on Language, Data
>>>> and Knowledge"
>>>> http://2019.ldk-conf.org/
>>>>
>>>>
>>
>>
>>
--
Prof. Dr. Christian Chiarcos
Applied Computational Linguistics
Johann Wolfgang Goethe Universität Frankfurt a. M.
60054 Frankfurt am Main, Germany
office: Robert-Mayer-Str. 11-15, #107
mail: chiarcos@informatik.uni-frankfurt.de
web: http://acoli.cs.uni-frankfurt.de
tel: +49-(0)69-798-22463
fax: +49-(0)69-798-28334
Received on Monday, 1 July 2019 15:05:53 UTC