Re: what AI tool can we use to generate the AI KR CG vocabulary from Paola Di Maio on 2024-12-04 (public-aikr@w3.org from December 2024)

From: Paola Di Maio <paoladimaio10@gmail.com>
Date: Wed, 4 Dec 2024 16:17:40 +0000
To: Chris Harding <chris@lacibus.net>
Cc: Peter Rivett <pete.rivett@federatedknowledge.com>, W3C AIKR CG <public-aikr@w3.org>
Message-ID: <CAMXe=Sp1XwvPeG+c=bQF9NhNj2xab=ypes0Nz1wfi78120-02g@mail.gmail.com>
Chris, thank you


It will be an interesting exercise, and I'd be interested to learn about
the tool you are building/using.
It sounds very powerful, but I would suggest in the first instance limiting
the output to a handful of terms (can we cope with hundreds of terms?).
Is there a way to filter the output by frequency or another criterion?
Is there a way to select the top 100 key terms, to make sure we do not get
flooded, for example by parsing the abstract/introduction section only?
Would that make sense?
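
To make the filtering question concrete, here is a minimal Python sketch of
one way it could work. It is not how Chris's tool works (that has not been
described yet). The function names, the tiny stop-word list and the heading
pattern used to cut the text before section 2 are all assumptions made for
illustration, and it only counts single words, not multi-word terms.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real run needs a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "is", "are",
              "for", "on", "with", "by", "that", "this", "as", "be", "it",
              "over"}


def abstract_and_intro(text, next_heading=r"^\s*2\.?\s+\S"):
    """Keep only the text before the first heading matched by next_heading.

    The default pattern assumes sections are numbered and that section 2
    follows the introduction; adjust it per document.
    """
    match = re.search(next_heading, text, flags=re.MULTILINE)
    return text[:match.start()] if match else text


def top_terms(text, n=100, min_count=1):
    """Return up to n (term, count) pairs, most frequent first."""
    words = re.findall(r"[a-z][a-z\-]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [(term, c) for term, c in counts.most_common(n) if c >= min_count]


if __name__ == "__main__":
    sample = (
        "Abstract\n"
        "Knowledge representation supports reasoning. Knowledge matters.\n"
        "1 Introduction\n"
        "Reasoning over knowledge.\n"
        "2 Methods\n"
        "Ontology ontology ontology ontology.\n"
    )
    print(top_terms(abstract_and_intro(sample), n=2))
```

Raising min_count or lowering n are the two simplest ways to avoid being
flooded; restricting the input to the abstract/introduction is a separate,
coarser filter that can be combined with either.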


I have deleted the spam entries in the previous form, and regenerated the
form as follows, with 4 of my favourite references:
https://forms.gle/DzUjwkP7sfQ91JTY7
I hope truth maintenance systems are covered in there (I have not checked).

The form links to the spreadsheet containing the input but I have hidden
the emails of the submitters
https://docs.google.com/spreadsheets/d/10X73IqyfGxJ1VTrUtZghhAXTMQL38GjKJxSOi82gPbc/edit?usp=sharing

Although I have selected the setting to require input in all slots, the
form seems to submit even with blank slots (baffled).
It should be enough fun/work just to start extracting some key terms from
these resources and go through them.
Will the tool associate the key term/concept with a definition?
Will the tool extract the terms from all the resources in batch, or one by
one? (It would be important to have an indication of which resource each term
comes from, preferably which page/chapter.) I'll add this to the
vocabulary form as an additional attribute (source, page/chapter).
After we are satisfied that this works, we can ask CG members to pick their
favourite KR resource and fill out the form. This will help us populate
the list of sources (sorry this did not happen until now).
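
As a rough illustration of the source/page attribute, the sketch below shows
one possible shape for a vocabulary entry in Python. The class name, field
names and sample values are hypothetical placeholders, not the actual form
fields or real references.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TermEntry:
    term: str
    definition: str
    source: str    # title or URL of the resource (placeholder values below)
    location: str  # page or chapter within the resource


def group_by_term(entries):
    """Map each term (case-insensitive) to every entry that defines it."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry.term.lower()].append(entry)
    return dict(grouped)


if __name__ == "__main__":
    # Made-up entries, only to show the structure.
    entries = [
        TermEntry("Frame", "placeholder definition 1", "Resource A", "ch. 3"),
        TermEntry("frame", "placeholder definition 2", "Resource B", "p. 41"),
        TermEntry("Ontology", "placeholder definition 3", "Resource A", "ch. 1"),
    ]
    for term, found in group_by_term(entries).items():
        where = ", ".join(f"{e.source} ({e.location})" for e in found)
        print(f"{term}: {where}")
```

Keeping one entry per term per source, rather than merging them, means terms
extracted in batch can still be traced back to the resource and page/chapter
they came from.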

As for who the users are going to be, I'd say ideally it should be both
general users AND technical users,
or somewhere in between (non-binary?) lol
Please take a look at the resource entry form and spreadsheet and let me
know if they are OK.


On Wed, Dec 4, 2024 at 2:51 PM Chris Harding <chris@lacibus.net> wrote:

> Thanks Paola!
>
> If you can point me to the document or documents that the vocabulary
> should be generated from, I can create an initial draft. A collection of
> documents with overlapping material is a good starting point. It can be
> interesting to see which terms come up in which documents.
>
> It may also help if you can indicate what kinds of term the vocabulary
> should contain. For example, I'm collecting key terms for some reference
> architecture standards, and am finding that asking the LLM to find
> things like systems, components, designs etc. can give better results
> than just asking it for key terms.
>
> There are many possible output formats. The simplest is probably HTML
> dl/dt/dd. How far does the group want to formalise things like "see
> also" and allow for a term to have different definitions in different
> contexts? This can get quite complicated.
>
> Pete's question about how the vocabularies are to be used is a good one.
> For example, will the people who use them have a technical background,
> or are they intended for the general public?
>
> On 03/12/2024 21:41, Paola Di Maio wrote:
> > @chris  thanks for the offer - yes, I expect some manual input to complete
> > the process, but maybe with some training
> > it could become fully automated. Let me know how to proceed; any tool to
> > support the effort is welcome
> >
> > @Peter Rivett <mailto:pete.rivett@federatedknowledge.com>
> >
> >     It's not clear to me in all this what format(s) existing glossaries/
> >     vocabularies in step 3 (do we have a list?)
> >
> > Milton pointed to two (ISO and another) in a related email, but we
> > could identify more. I entered the link in the wiki page
> >
> >     are expected to be in, and what the required generated output format
> >     should be.
> >
> > a nice plain language to start with, with any encoding of choice,  could
> > be any ML
> >
> >     And, for step 4, what degree of difference should lead to a new
> >     vocabulary entry.
> >
> > as defined by the user - that is: are you satisfied that the current AI
> > standards so far (as pointed out by Milton, and possibly to be expanded)
> > represent the AI KR domain correctly? I am not - current AI standard
> > vocabularies do not represent the AI KR domain adequately.
> > There may be KR terms already in existing AI standards that need to be
> > disambiguated/defined further;
> > that is where we can make an additional contribution.
> >   1st contribution is to identify the terms that are overlooked in
> > existing AI standards; 2nd contribution is to provide alternative KR
> > definitions if we are not satisfied with the existing ones
> >
> >     And how the new entry should reference the existing one (e.g. as
> >     some sort of "similar" or specialized term).
> >
> > good question- I think It could simply list the existing term as defined
> > in existing standards as an attribute
> > (previous definitions?)  In fact, I may add another field to the entry
> > form 'overlapping domain' as some AI KR terms could be defined elsewhere
> > * in medicine vs law for example
> >
> >     And, in the resultant ecosystem, how the whole family of
> >     vocabularies should be represented; both the existing ones that can
> >     be reused and the new differentiated terms.
> >
> > Uhm, that is a very big question- ecosystems tend to sort themselves
> > out, so as long as they are published on the internet, but suggestions
> > welcome
> >
> >     It would also seem desirable to be able to indicate the specific
> >     terms in the existing vocabularies that have been deemed reusable.
> >
> > think about a way of doing that and share it here, perhaps
> >
> >
> >     To answer my own question somewhat, there is the OMG Multiple
> >     Vocabulary Facility https://www.omg.org/spec/MVF
> >     that provides for different vocabularies
> >     (terms and definitions related to Communities and with mutual import
> >     relationships) with the terms mapped to their meaning as a concept.
> >
> > THANKS, that's great, can you add it to the wiki? I can do it if I
> > remember
> >
> >     However a lot depends on how we expect the set of vocabularies to be
> >     used and by whom (type/role of person or machine)?
> >
> > for now, I feel that I would be satisfied if, after about six
> > years of looking into AI KR,
> > we could point out with some efficiency and precision the concepts/terms not
> > yet covered in AI standards.
> > Nice to be able to put our finger into some open wound with clarity and
> > precision.
> > I guess the end result would be used by both humans and machines in
> > whatever way other resources are used, in the same way that we use
> > any directory.
> > I am itching to identify some terms and concepts and invite everyone to
> > do the same
> >
> > Ponder one or two terms close to your heart and mind
> > I am starting with:
> > misrepresentation (justification is:fake AI)
> > malicious confounding (justification is:deliberate use of
> > misrepresentation to mislead)
> >
> >
> >     Pete
> >
> >     Pete Rivett (pete.rivett@federatedknowledge.com
> >     <mailto:pete.rivett@federatedknowledge.com>)
> >     Federated Knowledge, LLC (LEI 98450013F6D4AFE18E67)
> >     tel: +1-701-566-9534
> >     Schedule a meeting at https://calendly.com/rivettp
> >
> >
> >     ------------------------------------------------------------------------
> >     *From:* Chris Harding <chris@lacibus.net>
> >     *Sent:* Tuesday, December 3, 2024 9:37 AM
> >     *To:* paoladimaio10@googlemail.com
> >     *Cc:* W3C AIKR CG <public-aikr@w3.org>
> >     *Subject:* Re: what AI tool can we use to generate the AI KR CG
> >     vocabulary
> >     Hi Paola
> >
> >     I am working on a commercial tool in this area. I can do 1 and 2: they
> >     are no problem with modern NLP technology but, as with anything produced
> >     by AI, need final human review. 3, 4 and 5 should be fairly
> >     straightforward also, but would need some work on the mechanism for
> >     importing existing glossaries/vocabularies.
> >
> >     On 03/12/2024 03:05, Paola Di Maio wrote:
> >      > Knowledgeable CG members
> >      >
> >      > Since the future is here, suggestions as to how to generate
> >      > a vocabulary would be great (preferably open, free, online tool or
> >      > python script?)
> >      > Process
> >      > 1.select source of K (upload or point to source doc)
> >      > 2. extract key terms and definitions
> >      > 3. compare key terms with existing glossaries/vocabularies already
> >      > published (input URLs)
> >      > 4. if not included in existing resources in 3 OR
> >      >          if definition is different from the same term in existing
> >      > resource in 3
> >      > THEN
> >      > 5. include term in a list (to be discussed, evaluated, refined)
> >      >
> >      > I'll definitely buy drinks if someone can do it
> >
> >     --
> >     Regards,
> >     Chris
> >     ++++
> >
> >     Chris Harding
> >     Chief Executive, Lacibus Ltd
> >
> >
>
> --
> Regards,
> Chris
> ++++
>
> Chris Harding
> Chief Executive, Lacibus Ltd
>
>
Received on Wednesday, 4 December 2024 16:18:22 UTC