Re: [bpmlod] type of LRs resources for the guidelines from John P. McCrae on 2014-06-11 (public-ontolex@w3.org from June 2014)

From: John P. McCrae <jmccrae@cit-ec.uni-bielefeld.de>
Date: Wed, 11 Jun 2014 12:32:59 +0200
To: gjb <gjb@crs4.it>
Cc: public-bpmlod@w3.org, public-ontolex <public-ontolex@w3.org>
Message-ID: <CAC5njqr5igOE0+0gOMhvxxEXYKqT-RAezAY_nhOmGV6ZwhTpuQ@mail.gmail.com>

Hi Gavin,

Thanks for your interest in the group.

With respect to encoding all forms of a word, this is not really desirable
from a lexicographic viewpoint, as it becomes very verbose very quickly (for
example, Italian has ~50 forms of a regular verb). Many existing resources
therefore simply assign each word to a morphological category; this approach
is taken, for example, by Benoît Sagot (Lefff, Leffe, EnLex, DeLex) and by
Språkbanken. It has always been understood that these patterns can somehow
be mapped to an implementation when the lexical resource is used in a system
such as NooJ. From the point of view of the lexicon, we therefore assume
that it is always possible to indicate a pattern, e.g.,

:camminare a lemon:LexicalEntry ;
  lemon:pattern :italian_regular_verb_are .
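
As a rough illustration of how a consuming system might map such a pattern
identifier to an implementation, here is a minimal Python sketch. The rule
table PATTERNS, the function name expand_forms, and the feature labels are
all hypothetical and are not part of lemon or NooJ. Only the six present
indicative endings are covered, and orthographic adjustments (e.g. for stems
ending in -i) are ignored.

```python
# Hypothetical rule table: pattern identifier -> (suffix to strip, endings).
# Only the six present-indicative endings are listed, not the full paradigm.
PATTERNS = {
    "italian_regular_verb_are": (
        "are",
        {
            "1sg": "o",
            "2sg": "i",
            "3sg": "a",
            "1pl": "iamo",
            "2pl": "ate",
            "3pl": "ano",
        },
    ),
}


def expand_forms(lemma: str, pattern: str) -> dict[str, str]:
    """Generate inflected forms of `lemma` from the named pattern."""
    suffix, endings = PATTERNS[pattern]
    if not lemma.endswith(suffix):
        raise ValueError(f"{lemma!r} does not match pattern {pattern!r}")
    stem = lemma[: -len(suffix)]  # "camminare" -> "cammin"
    return {feature: stem + ending for feature, ending in endings.items()}


forms = expand_forms("camminare", "italian_regular_verb_are")
print(forms["1sg"], forms["3pl"])
```

The lexicon itself still only stores the lemma and the pattern reference;
the expansion lives entirely in the consuming system.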

There have been some attempts to further encode the meaning of these patterns,
either in LMF (see the appropriate documentation at ISO) or in *lemon* (see
http://lemon-model.net/lemon-cookbook/node35.html); however, I have yet to
see this applied successfully.

From the current status of the OntoLex CG (which is responsible for defining
the next iteration of the *lemon* model), it is not currently planned to
support the identification of inflectional patterns beyond giving the URI
for the pattern.

I think there would be interest in both BPMLOD and OntoLex in seeing how
the defined models interact in practice with a system such as NooJ.

Regards,
John P. McCrae

On Wed, Jun 11, 2014 at 12:02 PM, gjb <gjb@crs4.it> wrote:

> On 23/05/2014 15:27, Jorge Gracia wrote:
>
>> Hi Dave,
>>
>> Yes, we can wait for more feedback from LD4LT to go further with corpora
>> and terminologies. As for corpora, I included your suggested separation in
>> the wiki table
>> https://www.w3.org/community/bpmlod/wiki/Guidelines_for_LD_generation_of_Language_resources_-_previous_notes
>>
>> Regards,
>> Jorge
>>
>
> Hi bpmlod people,
>
> I follow the progress of the bpmlod community with a
> watchful eye - though I have been quite passive to date.
> I am intrigued by the latest reports - especially:
>
> Re: Report and recommendations for converting BabelNet as Linguistic
> Linked Data and its adoption of Lemon (all news to me, thanks)
>
> Section 2 ends:
>
>> Issues: BabelNet does not currently provide all word forms for a
>> lemma, resulting therefore in a duplication of information ...
>>
>
> Q: Would there be any mileage in looking at NooJ to help here?
>
> http://en.wikipedia.org/wiki/NooJ
>
> NooJ seems well respected by working linguists from various
> language communities, and it has evolved into an open-source
> code base in recent years.
> NooJ has the great advantage that it can be
> made to represent/recognise all word forms for a lemma - in a
> deterministic, computer-readable form.
> It's not an XML representation, but it seems comprehensive for
> non-ideogrammatic texts.
> Its syntax might be the basis for a BabelNet representation.
>
> Perhaps there is an alternative representation that might be
> more ready for the Multilingual Web, that you know of?
>
> best regards
>
> Gavin Brelstaff
> CRS4 - Sardinia - gjb @ crs4.it

Received on Wednesday, 11 June 2014 10:33:28 UTC