- From: John P. McCrae <jmccrae@cit-ec.uni-bielefeld.de>
- Date: Wed, 11 Jun 2014 12:32:59 +0200
- To: gjb <gjb@crs4.it>
- Cc: public-bpmlod@w3.org, public-ontolex <public-ontolex@w3.org>
- Message-ID: <CAC5njqr5igOE0+0gOMhvxxEXYKqT-RAezAY_nhOmGV6ZwhTpuQ@mail.gmail.com>
Hi Gavin,

Thanks for your interest in the group.

With respect to encoding all forms of a word, this is not really desirable from a lexicographic viewpoint, as it becomes very verbose very quickly (for example, Italian has ~50 forms of a regular verb). As such, many existing resources simply assign each word to a morphological category; this approach is taken, for example, by Benoît Sagot (Lefff, Leffe, EnLex, DeLex) and by Språkbanken. It has always been understood that these patterns can somehow be mapped to some implementation when the lexical resource is used in a system such as NooJ. As such, from the point of view of the lexicon, we assume that it is always possible to indicate a pattern, e.g.,

    :camminare a lemon:LexicalEntry ;
        lemon:pattern :italian_regular_verb_are .

There have been some attempts to further encode the meaning of these patterns, either in LMF (see the appropriate documentation at ISO) or in *lemon* (see http://lemon-model.net/lemon-cookbook/node35.html); however, I have yet to see this applied successfully.

From the current status of the OntoLex CG (which is responsible for defining the next iteration of the *lemon* model), it is not currently planned to support the identification of inflectional patterns beyond giving the URI for the pattern.

I think there would be interest, in both BPMLOD and OntoLex, in seeing how the defined models interact in practice with a system such as NooJ.

Regards,
John P. McCrae

On Wed, Jun 11, 2014 at 12:02 PM, gjb <gjb@crs4.it> wrote:

> On 23/05/2014 15:27, Jorge Gracia wrote:
>
>> Hi Dave,
>>
>> Yes, we can wait for more feedback from LD4LT to go further with corpora
>> and terminologies. As for corpora, I included your suggested separation in
>> the wiki table
>> https://www.w3.org/community/bpmlod/wiki/Guidelines_for_LD_generation_of_Language_resources_-_previous_notes
>>
>> Regards,
>> Jorge
>
> Hi bpmlod people,
>
> I follow the progress of the bpmlod community with a
> watchful eye - though I have been quite passive to date.
> I am intrigued by the latest reports - especially:
>
> Re: Report and recommendations for converting BabelNet as Linguistic
> Linked Data and its adoption of Lemon (all news to me, thanks)
>
> Section 2 ends:
>
>> Issues: BabelNet does not currently provide all word forms for a
>> lemma, resulting therefore in a duplication of information ...
>
> Q: Would there be any mileage in looking at NooJ to help here?
>
> http://en.wikipedia.org/wiki/NooJ
>
> NooJ seems well respected by working linguists from various
> language communities and it has evolved into an open source
> code-base in recent years.
> NooJ has the great advantage that it can be
> made to represent/recognise all word forms for a lemma - in a
> deterministic, computer-readable form.
> It's not an XML representation but it seems comprehensive for
> non-ideogrammatic texts.
> Its syntax might be the basis for BabelNet representation.
>
> Perhaps there is an alternative representation that might be
> more ready for the Multilingual Web, that you know of?
>
> best regards
>
> Gavin Brelstaff
> CRS4 - Sardinia - gjb @ crs4.it
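
As a rough illustration of the pattern-based approach described in the reply above (the lexicon records only a lemma and a pattern identifier, and a separate implementation expands the forms), here is a minimal Python sketch. The `PATTERNS` and `LEXICON` tables and the `expand` function are hypothetical names introduced for this example; they are not part of lemon, LMF, or NooJ, and the pattern covers only the present indicative rather than the ~50 forms of a regular Italian verb.

```python
# Hypothetical pattern table: pattern identifier -> (infinitive ending, suffixes).
# Only the six present-indicative endings of regular -are verbs are listed.
PATTERNS = {
    "italian_regular_verb_are": ("are", ["o", "i", "a", "iamo", "ate", "ano"]),
}

# Hypothetical lexicon: each lemma points to a pattern identifier only,
# mirroring the idea of `lemon:pattern :italian_regular_verb_are`.
LEXICON = {
    "camminare": "italian_regular_verb_are",
}


def expand(lemma):
    """Return the inflected forms of `lemma` generated by its assigned pattern."""
    ending, suffixes = PATTERNS[LEXICON[lemma]]
    if not lemma.endswith(ending):
        raise ValueError(f"{lemma!r} does not end in {ending!r}")
    stem = lemma[: -len(ending)]  # "camminare" -> "cammin"
    return [stem + suffix for suffix in suffixes]


if __name__ == "__main__":
    print(expand("camminare"))
```

The lexicon stays compact because the forms are derived on demand; a real system would replace the suffix table with whatever the pattern URI resolves to in the target tool.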
Received on Wednesday, 11 June 2014 10:33:28 UTC