Re: FRaC Faliscan language Example from Christian Chiarcos on 2021-03-07 (public-ontolex@w3.org from March 2021)

From: Christian Chiarcos <christian.chiarcos@web.de>
Date: Sun, 7 Mar 2021 08:35:39 +0100
To: Fahad Khan <anasfkhan81@gmail.com>
Cc: public-ontolex <public-ontolex@w3.org>, Valeria Quochi <vquochi@gmail.com>
Message-ID: <CAC1YGdhkOdnQeK4STMG-ASeUg92VfyYm7kiRJPFCmanzk+JFqg@mail.gmail.com>
Hi Fahad, dear all,

On Mon, 8 March 2021 at 12:17, Fahad Khan <anasfkhan81@gmail.com> wrote:

> Hi Everyone,
> I have been working on modelling an entry from a lexicon currently being
> compiled as part of an Italian project on Italic languages and I think it
> potentially shows some limitations in the current ontolex/FRaC approach.  I
> would like to discuss this at the next telco but I will give a description
> here in order to get some feedback from the list too.
>
> In the example in question we have a Faliscan word, ekupetaris, which has
> different attested representations for the same form (or same morphological
> variant). That is, the masculine, nominative, singular form has been
> attested in the following written variants:  "ECVPETARIS", "EQUPETARS",
> "ekupetaris", "ekvopetaris", "ekvopetars", "epetaris", "eppetaris".  Each
> of these written variants has at least one attestation in some inscription.
> In the case of "ekupetaris" there are four different attestations; the
> others have one apiece.
>
> According to the ontolex-lemon model these are all written representations
> of the same Form element (the masculine, nominative, singular form of the
> noun).
>

You seem to assume that identical features within the same lexical entry
must lead to exactly one Form. I don't think this is required. In fact, we
can have different forms with identical features but differences in usage.
Think of English "has" and "hath", which probably should be two forms:
despite both being 3.sg.ind.prs, they are not interchangeable.
Looking at your examples, these forms also differ *phonologically*, not
just orthographically. There are at least five phonologically distinct
forms here:

"ECVPETARIS", "ekupetaris",
"EQUPETARS",
"ekvopetaris",
"ekvopetars",
"epetaris", "eppetaris"

Everything else is just orthography. If your resource *decides* to define
forms on a phonological basis (this is not required), these would probably
be the forms.
However, this is pre-standardized writing, and you could go as far as to
distinguish every attested form simply because you can *never* be certain
whether there really are no phonological differences (epe- vs. eppe- may be
a difference, for example).
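A minimal Turtle sketch of this phonological grouping, assuming the published OntoLex and LexInfo 3.0 namespaces. The example namespace and all resource names (:entry_ekupetaris, :form_ekupetaris, :form_epetaris) are hypothetical, and only two of the five forms are spelled out:

```turtle
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/3.0/lexinfo#> .
@prefix :        <http://example.org/faliscan#> .   # hypothetical namespace

:entry_ekupetaris a ontolex:Word ;
    ontolex:lexicalForm :form_ekupetaris, :form_epetaris .

# One Form per phonologically distinct variant; both share the same features.
:form_ekupetaris a ontolex:Form ;
    lexinfo:case lexinfo:nominativeCase ;
    lexinfo:gender lexinfo:masculine ;
    lexinfo:number lexinfo:singular ;
    # purely orthographic variants of one phonological form
    ontolex:writtenRep "ECVPETARIS"@xfa, "ekupetaris"@xfa .

:form_epetaris a ontolex:Form ;
    lexinfo:case lexinfo:nominativeCase ;
    lexinfo:gender lexinfo:masculine ;
    lexinfo:number lexinfo:singular ;
    ontolex:writtenRep "epetaris"@xfa, "eppetaris"@xfa .
```

If epe- vs. eppe- turns out to be a real phonological difference, the second Form would simply be split in two.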

> This approach would give us something like (ellipsis added for readability):
>
> :ekupetaris a ontolex:Form ;
>     lexinfo:case lexinfo:nominativeCase ;
>     lexinfo:gender lexinfo:masculine ;
>     lexinfo:number lexinfo:singular ;
>     frac:attestation :att_0, :att_1, :att_2, :att_3, ..., :att_9 ;
>     ontolex:writtenRep "ECVPETARIS"@xfa, "EQUPETARS"@xfa, ..., "eppetaris"@xfa .
>
>
> In other words (pardon the pun) we would lose the link between each
> written representation and its attestations.  We could recuperate this (to
> an extent) by making the written representation the value of the FRaC
> quotation property for each attestation, e.g., (for the first and sixth
> attestations)
>
> :att_0 a frac:Attestation ;
>     frac:attestationGloss "Pa2 lines 2-3, Certainty: certain, Bibliography: Pellegrini-Prosdocimi 1967, pp. 328-331" ;
>     frac:quotation "ekupetaris" .
>
>
I personally see no issue with that. The implicit assumption in this
modelling is that the form variation is basically orthographic. This is
exactly what frac:quotation is there for, and there are other kinds of
orthographic variation that aren't captured in the writtenRep either
(e.g., clitics). Modelling it this way means that your form is something
abstract (whether you make it explicit with a transcription or not), i.e.,
a continuum of forms evolving from a particular base form, *ekwopetaris
(> *ekwəpətar(ə)s > *epəpətar(ə)s > *epətar(ə)s or the like; I made this
reconstruction up, sorry. Is that "four horses" [quadriga?] in Faliscan?).

As far as I can see, OntoLex(-FrAC) allows you to do one of three things:

(1) distinguish different graphic representations as different forms, each
    with its respective attestations (if that's important for the data
    provider); that would be the epigraphical approach;
(2) lump them together in one form and then cross-reference them
    indirectly by giving their respective full forms in the attestations
    (frac:quotation); that would be the etymological approach; or
(3) assert different phonological forms, each with multiple writtenReps
    (as I tried above); that would be the phonological approach.

Everything beyond that is outside OntoLex-FrAC. You *could*, however, think
about introducing identifiers for different orthographic systems and then
cross-referencing them (but writtenRep is a datatype property and cannot
take further arguments against which your orthography identifier could be
resolved).

The difficulty I see is how to create a resource that is *both*
epigraphical and etymological (or phonological) at the same time. On the
other hand, I don't see what something like that would look like in a
dictionary. If you have such data, please share it, and we can consider
taking it as a basis for an alternative modelling.

> This feels unsatisfactory to me for several reasons (though it might not to
> others): not least because we might want to associate other information to
> the variant written representation (e.g., a certain written representation
> might have been used for a certain period or in a certain geographical
> region and this isn't always possible to specify with a language tag).
>

The only place to keep that information would be a Form. writtenRep is a
datatype property, so a written representation cannot carry any metadata
of its own!
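A sketch of what this means in practice, assuming Dublin Core's dct:temporal and dct:spatial as the metadata properties (any suitable property would do; their use on a Form is an assumption). The example namespace, :somePeriod and :someRegion are hypothetical placeholders, not data from the lexicon:

```turtle
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix dct:     <http://purl.org/dc/terms/> .
@prefix :        <http://example.org/faliscan#> .   # hypothetical namespace

# Not possible: in RDF 1.1 a literal such as "EQUPETARS"@xfa cannot be the
# subject of a triple, so it cannot carry a period or region itself.

# Possible: give the spelling its own Form and attach the metadata there.
:form_EQUPETARS a ontolex:Form ;
    ontolex:writtenRep "EQUPETARS"@xfa ;
    dct:temporal :somePeriod ;   # placeholder: period of use
    dct:spatial  :someRegion .   # placeholder: region of use
```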


> Two additional possibilities that come to mind here are creating different
> Forms for each of the written representations (forms with the same
> morphological feature but with a different writtenRep value and different
> attestations) and then using the sameAs property to say they're the same
> Form.
>

No, owl:sameAs would conflate them (or rather, reasoners would). We would
need another property. In principle, the right place for such a property
would be vartrans, as a relation between different forms, analogous to
vartrans:LexicalRelation between entries. There is no such thing yet, but
we would generally need it for historical data, e.g., etymological
dictionaries. I would very much prefer this over inventing something
complicated within FrAC, also because it would make vartrans more
symmetric, i.e., applicable to every core element.
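A sketch of what such a reified relation between forms might look like, modelled on the vartrans construction for lexical relations. The class :FormRelation and the category :orthographicVariant are hypothetical (nothing like them exists in vartrans), and reusing vartrans:source/vartrans:target for Forms is likewise an assumption:

```turtle
@prefix vartrans: <http://www.w3.org/ns/lemon/vartrans#> .
@prefix rdfs:     <http://www.w3.org/2000/01/rdf-schema#> .
@prefix :         <http://example.org/faliscan#> .   # hypothetical namespace

# Hypothetical class: a reifiable relation between Forms.
:FormRelation rdfs:subClassOf vartrans:LexicoSemanticRelation .

# Unlike owl:sameAs, this keeps both Forms (and their writtenReps and
# attestations) distinct while still stating that they are related.
:rel_1 a :FormRelation ;
    vartrans:source   :form_ekupetaris ;
    vartrans:target   :form_EQUPETARS ;
    vartrans:category :orthographicVariant .   # hypothetical category
```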

Until we have that, we can resort to skos:related to indicate that
different forms with the same etymology (and possibly the same phonology,
though we don't know that for sure) have more in common than just the same
grammatical features.
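The interim skos:related solution is a single triple per pair of forms; the Form names and namespace below are hypothetical. Note that SKOS gives skos:semanticRelation (the parent of skos:related) the domain and range skos:Concept, so an RDFS reasoner will additionally type these Forms as skos:Concept:

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix :     <http://example.org/faliscan#> .   # hypothetical namespace

# skos:related is symmetric, so stating one direction is enough.
:form_ekupetaris skos:related :form_EQUPETARS, :form_ekvopetaris .
```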

Using that, we can *actually* provide etymological and epigraphical
analyses at the same time, using the reconstructed "underlying" form as the
one from which the related attested forms derive, and possibly giving it
some special property to mark it as reconstructed (which would also be
outside FrAC; maybe within the morphology module or LexInfo?).
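A sketch combining both analyses, using the illustrative (made-up) reconstruction *ekwopetaris from above. The marker :isReconstructed is hypothetical, since neither core OntoLex nor FrAC provides one, and all resource names are placeholders:

```turtle
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
@prefix :        <http://example.org/faliscan#> .   # hypothetical namespace

# Reconstructed "underlying" form, flagged with a hypothetical property.
:form_ekwopetaris a ontolex:Form ;
    ontolex:writtenRep "ekwopetaris"@xfa ;
    :isReconstructed true .

# Epigraphically distinct attested Forms, each linked to the reconstruction.
:form_ekupetaris skos:related :form_ekwopetaris .
:form_EQUPETARS  skos:related :form_ekwopetaris .
:form_epetaris   skos:related :form_ekwopetaris .
```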


> Another possibility could be the creation of a new class (in FRaC),
> something like AttestedRepresentation, which is also a FRaC observable with
> associated properties attestedRep and stringValue, such that writtenRep is
> equivalent to the property chain attestedRep o stringValue.
>

I would rather avoid that, for several reasons. First, I'm not sure we can
axiomatize the values of datatype properties in this way (OWL 2 property
chains are restricted to object properties). Second, it would create
something nearly identical to Attestation, leading to a lot of confusion
among users of FrAC. If this is an observable, it can have Attestations in
its own right, but what is an Attestation of an AttestedRepresentation?
Third, it would introduce at least two new properties and one new class (as
opposed to just one reifiable vartrans property that uses the same
construction template as we previously used for lexical relations), and it
would be *highly specific* to a use case relevant for epigraphy, but not
much beyond that (I might be wrong on that one). For a vartrans relation
between forms, I can see other uses (e.g., systematic mappings between
related forms of different lexemes, possibly from different languages).
For AttestedRepresentation, I'm not sure such uses exist.

Best,
Christian

Received on Monday, 8 March 2021 15:54:39 UTC