- From: Sander Stolk <ssstolk@gmail.com>
- Date: Wed, 23 Apr 2025 17:28:32 +0200
- To: Fahad Khan <anasfkhan81@gmail.com>
- Cc: "John P. McCrae" <john.mccrae@insight-centre.org>, public-ontolex@w3.org
- Message-ID: <CAJurLCwcgWq1XDHKaXd-H9v_u7A_J5d2dVdELeYFeq_UKV_49Q@mail.gmail.com>
Dear all,
Apologies for having remained out of touch over the last year and a half or
so. I have been severely ill and am just starting to recover. It is only by
chance that I noticed the call in time and am, finally, also able to summon
the energy to make these observations and to write them down. You will find
my comments below, both on the model and on smaller editorial matters
concerning the documentation. I hope these are helpful and, more
importantly in my eyes, I hope that you are all doing well. Know that I
have very much enjoyed working together with you on all aspects concerning
linguistic linked data, including on FrAC, and would welcome doing so again
in the future if possible. It is great to see the specification has gotten
to this stage; a significant development and a very welcome ontology indeed!
All best wishes,
Sander
---
*modelling:*
- Consider rephrasing "frequency" to "count". In many pieces of
software, including linguistic software but also in software aimed at
everyday users, the term "count" is used (e.g., "word count") instead of
"(absolute) frequency". Using this term would avoid ambiguity between
absolute and relative frequency in terms of what is denoted by the property
and class in the ontology.
- Currently the ontology requires at least one dct:description for each
Observation. Is that necessary? I imagine that a description is
not always needed when recording frequencies.
- I am unsure whether the property "total" is needed. The model already
states that an Observation, such as a Frequency, will indicate wherein the
observation was made (e.g., a corpus). Thus, there is already a link
between Frequency and the anyURI/corpus that can be used. Moreover, *any*
absolute frequency is a total of sorts: all nouns in a corpus, all weak
verbs, etc. The property 'total' can therefore feel both underused and
overused, in terms of the contexts in which to apply it, and it seems rather to
have been added to the specification as a way to distinguish matters for
those publishing a corpus and those utilizing that corpus for observations?
The descriptions in the specification do not clearly distinguish the two
modelling methods, as they emphasize "elements" being counted, regardless of
whether "total" is used or the other modelling pattern. Quotes below to
illustrate:
- definition of total: "The object property total assigns any
potential FrAC data source [...] the total number of elements that it
contains as a frac:Frequency object."
- in introduction: "OntoLex-Lemon provides a core vocabulary to
represent linguistic information associated with ontology and vocabulary
elements".
- elsewhere: "Elements [ontolex:LexicalEntry etc] that FrAC
properties apply to must be observable in a corpus or another linguistic
data source."
In other words, the specification itself currently does not yet explain
clearly why 'total' is distinguished from the other pattern, considering
that both count what are deemed "(linguistic) elements".
Perhaps, instead of the pattern found at example 3, which is:
<https://wordnetcode.princeton.edu/glosstag.shtml> a dct:Collection ;
    frac:total [
        a frac:Frequency ;
        rdf:value 1634691 ;
        frac:unit "tokens"
    ] .
the following could be used just as well? If so, it would be possible to
discard the "total" property and do away with any ambiguity of when to use
which modelling pattern for such quantifications.
[ a frac:Frequency ;
  rdf:value 1634691 ;
  frac:unit "tokens"
] frac:observedIn <https://wordnetcode.princeton.edu/glosstag.shtml> .

<https://wordnetcode.princeton.edu/glosstag.shtml> a dct:Collection .
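
To make the comparison concrete, here is a minimal sketch of how a corpus-level count and an entry-level frequency might sit side by side if only the frac:observedIn pattern were used. The ex: namespace, the entry ex:water and the count 42 are invented for illustration, and the prefix URIs are assumed to be the ones used in the draft.

```turtle
@prefix frac:    <http://www.w3.org/ns/lemon/frac#> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dct:     <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.org/> .

<https://wordnetcode.princeton.edu/glosstag.shtml> a dct:Collection .

# Corpus size, stated without frac:total (the alternative pattern above).
[ a frac:Frequency ;
  rdf:value 1634691 ;
  frac:unit "tokens"
] frac:observedIn <https://wordnetcode.princeton.edu/glosstag.shtml> .

# Frequency of a single entry in the same corpus.
# ex:water and the value 42 are hypothetical.
ex:water a ontolex:LexicalEntry ;
    frac:frequency [
        a frac:Frequency ;
        rdf:value 42 ;
        frac:unit "tokens" ;
        frac:observedIn <https://wordnetcode.princeton.edu/glosstag.shtml>
    ] .
```

In this sketch the corpus-level count differs from the entry-level one only in that no frac:Observable points to it via frac:frequency. Whether that is a sufficient signal for consumers of the data is part of the question raised above.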
*document/editorial:*
- schema image (Figure 1) shows rdf:value property for Observation but also
for subclasses Attestation and Frequency but not Collocation. Moreover,
unlike rdf:value property, dct:description at Observation is not repeated
for any of its subclasses.
- schema image (Figure 1) does not show property "unit"
- documentation states rdfs:range of property "unit" is frac:Frequency,
although I believe it should be the domain rather than the range?
- "If a future community standard provides reference URIs for such
datatypes, frac:unit should be used as a datatype property." Replace
datatype with object here.
- I recommend doing away with syntax highlighting of Turtle snippets if the
highlighting is not intended for Turtle. As it is, sometimes @prefix is
highlighted, sometimes it is not. Only parts of URIs are highlighted.
Literals are sometimes highlighted, sometimes not. And so on. If
highlighting is meaningless and/or seemingly inconsistent, then it is best
avoided.
- perhaps good to add something outside of ontolex, e.g., "noun", as an
example of an Observable?
- "Lexicographers use (corpus) frequency and distribution information while
compiling lexical entries, as a qualitative assessment of their resources."
Replace qualitative with quantitative here?
- Caption for example 2 duplicates the word example ("Example 2: Example:
Frequency of the Sumerian word _a_ 'water'") and seems to suggest it is
about another word than the example truly contains (kal-ga instead of a).
The duplication of the word "example" occurs for other examples too.
- "the existence of a certain lexical phenomena" --> "the existence of a
certain lexical phenomenon"
- "In scholarly dictionaries, attestations are a representative selection
from the occurrences of a headword in a textual corpus." Or a specific
sense (the case for the Dictionary of Old English), or a specific form,
etc. So not limited to headword.
- "The property frac:attestation associates an attestation to the
frac:Observable." Please rephrase so that it indicates the direction of the
property (i.e., domain and range). --> "The property frac:attestation
establishes a relation between a frac:Observable and an attestation
thereof."
- Example 4 does not type frac:Attestation; example 5 does. Do we have a
preference for its explicit inclusion or absence?
- "frac:locus normally refers to a location identified by RFC5147 character
offsets, NIF URIs, Open Annotation or Text Fragments references, whereas
frac:observedIn refers to dct:Texts or dct:Collections." Perhaps references
can be made to these standards, or to their respective subsections of
section 6? Additionally, section 6 changes the naming from Open Annotation
to Web Annotation. It would be good to scan the document for consistency in
order to avoid confusion.
- The definition of "Collocation" currently lists "SubClassOf:
frac:Observation, rdfs:Container, frac:Observable". I suspect
frac:Observable has to be removed here.
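
The two comments on frac:unit above can be illustrated with a short sketch. It assumes the string range mentioned later in this thread, it is not copied from the published frac.ttl, and ex:Token, ex:freq1 and ex:freq2 are hypothetical names.

```turtle
@prefix frac: <http://www.w3.org/ns/lemon/frac#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/> .

# Suggested reading of the declaration: frac:Frequency is the domain,
# not the range. The xsd:string range is an assumption.
frac:unit a owl:DatatypeProperty ;
    rdfs:domain frac:Frequency ;
    rdfs:range  xsd:string .

# Usage with a literal, consistent with a datatype property.
ex:freq1 a frac:Frequency ;
    rdf:value 1634691 ;
    frac:unit "tokens" .

# Usage with a reference URI (ex:Token is hypothetical). The object is a
# resource here, so frac:unit would have to be an object property for this
# triple to be consistent with the declaration.
ex:freq2 a frac:Frequency ;
    rdf:value 1634691 ;
    frac:unit ex:Token .
```

The last statement is the reason for suggesting "object" rather than "datatype" in the quoted sentence: once units are identified by URIs, the value of frac:unit is no longer a literal.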
On Tue, 22 Apr 2025 at 14:23, Fahad Khan <anasfkhan81@gmail.com> wrote:
> Dear John, All,
> Here are my comments on the FrAC draft:
>
> https://docs.google.com/document/d/148Mtlag7bvl-GCpOpXRxPPQUj1fSTBa7yZvHek0rPQY/edit?usp=sharing
> Cheers,
> Fahad
>
> On Mon, 21 Apr 2025 at 09:02, John P. McCrae <
> john.mccrae@insight-centre.org> wrote:
>
>> Hi,
>>
>> I have one further comment on the public review version:
>>
>> - The range of "unit" is given as a string. It would be much better
>> if these could be replaced by elements from a standard vocabulary such as
>> LexInfo
>>
>> Regards,
>> John
>>
>> On Thu, 17 Apr 2025 at 10:19, John P. McCrae <
>> john.mccrae@insight-centre.org> wrote:
>>
>>> Hi all,
>>>
>>> This is a reminder that we have one week left for public review
>>> comments. If you would find some time to review and make any comments that
>>> would be really appreciated.
>>>
>>> As there is already one comment, we will be reopening this module, but if
>>> you have any comments, now would be the best time to make them so that we can
>>> handle all comments in the 2nd public review.
>>>
>>> Regards,
>>> John
>>>
>>> On Thu, 23 Jan 2025 at 11:38, John McCrae <
>>> john.mccrae@insight-centre.org> wrote:
>>>
>>>> Dear OntoLex CG Members,
>>>>
>>>> The group working on the Frequency, Attestation and Corpus Information
>>>> (FrAC) module has completed the first draft of the specification, which is
>>>> now available for public review.
>>>>
>>>> All comments on the specification must be made as a post to
>>>> public-ontolex@w3.org by *April 23rd* in order to be considered. No
>>>> changes to the specification are allowed except in response to comments on
>>>> the public review. If no comments are received the specification will be
>>>> published as is.
>>>>
>>>> The specification files are available on GitHub at:
>>>>
>>>> https://ontolex.github.io/frequency-attestation-corpus-information/
>>>>
>>>> https://github.com/ontolex/frequency-attestation-corpus-information/blob/master/owl/frac.ttl
>>>>
>>>> I also attach them to this email.
>>>>
>>>> Regards,
>>>> John, Christian and Max
>>>>
>>>
Received on Wednesday, 23 April 2025 15:28:45 UTC