Re: what AI tool can we use to generate the AI KR CG vocabulary from Peter Rivett on 2024-12-04 (public-aikr@w3.org from December 2024)

From: Peter Rivett <pete.rivett@federatedknowledge.com>
Date: Wed, 4 Dec 2024 17:02:18 +0000
To: Paola Di Maio <paoladimaio10@gmail.com>, Chris Harding <chris@lacibus.net>
CC: W3C AIKR CG <public-aikr@w3.org>
Message-ID: <BY5PR14MB39212CD72C2F256DB18B96D081372@BY5PR14MB3921.namprd14.prod.outlook.com>
Paola,
Minor but important point: the values for License already entered represent copyright, not license.
Someone owning the copyright gets to declare how they license their work, but the license is different. For example, consider all the variants of Creative Commons licenses. OMG has a few variations too, including a "Non-assert" license, which allows people to freely use a spec unless they assert IP claims on that spec. And I know Chris is familiar with Open Group licensing, which often requires some level of membership.
I have often felt that IP licensing deserves an ontology in its own right.

In fact, the first entry, the Bergman book, is not AFAIK "Copyright the author". The book itself states both the copyright and the license (which is that all rights are reserved, i.e. you're not allowed to do anything with it), as follows. I know Mike has made the "pre-release" version available for free on his website, which puts it in a legally gray zone I'm not going to get into. My point is the difference between copyright and license.

The book says:

© Springer Nature Switzerland AG 2018
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

A challenge for everyone these days is determining which licenses permit scanning by AI (which is what I assume Chris is doing) and subsequent use of the model so trained.


BTW, the 4th link is not to the Sowa book but to a review of the Sowa book by Stuart C. Shapiro, which is itself quite informative. Saying "copyright by the owner" is tautologous unless you say who the owner actually is.
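
To make the copyright/license distinction concrete for the resource form, here is a minimal Python (3.9+) sketch of a form record that keeps the named copyright holder and the license as separate required fields, and requires a review to say which work it reviews. The class name `SourceRecord`, its fields, and the list of rejected placeholder values are hypothetical; they are not part of the existing form.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical placeholder values that do not actually name anyone.
UNNAMED_HOLDERS = {"", "the owner", "the author", "copyright the author",
                   "copyright by the owner"}


@dataclass
class SourceRecord:
    """One hypothetical row of the KR resource form."""
    title: str
    resource_type: str                   # e.g. "book", "review", "standard"
    copyright_holder: str                # the named party that owns the copyright
    copyright_year: Optional[int]
    license: str                         # terms granted to others, e.g. "All rights reserved"
    reviewed_work: Optional[str] = None  # set when the link is a review of another work

    def __post_init__(self) -> None:
        # Copyright answers "who owns it"; it must name an actual party.
        if self.copyright_holder.strip().lower() in UNNAMED_HOLDERS:
            raise ValueError("copyright_holder must name the actual owner")
        # License answers "what may others do"; it is recorded separately.
        if not self.license.strip():
            raise ValueError("license must be stated separately from copyright")
        # A link to a review is not a link to the reviewed work itself.
        if self.resource_type == "review" and not self.reviewed_work:
            raise ValueError("a review should say which work it reviews")
```

For example, the Bergman entry would be recorded with `copyright_holder="Springer Nature Switzerland AG"`, `copyright_year=2018` and `license="All rights reserved"`, and the 4th link as `resource_type="review"` with the Sowa book named in `reviewed_work`.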

I'm not having a go at you, Paola, but wanting to establish good precedents before others contribute.

Pete
________________________________
From: Paola Di Maio <paoladimaio10@gmail.com>
Sent: Wednesday, December 4, 2024 8:17 AM
To: Chris Harding <chris@lacibus.net>
Cc: Peter Rivett <pete.rivett@federatedknowledge.com>; W3C AIKR CG <public-aikr@w3.org>
Subject: Re: what AI tool can we use to generate the AI KR CG vocabulary

Chris, thank you


It will be an interesting exercise, and I'd be interested to learn about the tool you are building/using.
It sounds very powerful, but I would suggest in the first instance limiting the output
to a handful of terms (can we cope with hundreds of terms?).
Is there a way to filter the output by frequency or another criterion?
Is there a way to select the top 100 key terms, to make sure we do not get flooded,
for example by parsing only the abstract/introduction section? Would that make sense?
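
One simple way to filter by frequency is sketched below in Python (3.9+). It assumes the abstract/introduction text has already been extracted as a plain string. The sketch counts single words and two-word phrases, drops a small hypothetical stopword list, and keeps the top `n` terms seen at least `min_count` times. This is plain frequency counting, not the method used by Chris's tool.

```python
import re
from collections import Counter

# A tiny, illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
             "for", "on", "with", "as", "by", "be", "that", "this", "it", "we"}


def top_terms(text: str, n: int = 100, min_count: int = 2) -> list[tuple[str, int]]:
    """Return up to n candidate terms (words and two-word phrases),
    ranked by frequency, keeping only those seen at least min_count times."""
    words = re.findall(r"[a-z][a-z\-]+", text.lower())
    # Count single words that are not stopwords.
    counts = Counter(w for w in words if w not in STOPWORDS)
    # Count adjacent word pairs where neither word is a stopword.
    counts.update(f"{a} {b}" for a, b in zip(words, words[1:])
                  if a not in STOPWORDS and b not in STOPWORDS)
    ranked = [(term, c) for term, c in counts.most_common() if c >= min_count]
    return ranked[:n]
```

Calling `top_terms(intro_text, n=100)` on each resource's abstract/introduction would cap the list at 100 candidates per resource. Raw frequency favours generic words, so a collection-wide measure such as TF-IDF would be a natural refinement.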


I have deleted the spam entries from the previous form and regenerated the form as follows,
with four of my favourite references:
https://forms.gle/DzUjwkP7sfQ91JTY7

I hope truth maintenance systems are in there (I have not checked).

The form links to the spreadsheet containing the input, but I have hidden the emails of the submitters:
https://docs.google.com/spreadsheets/d/10X73IqyfGxJ1VTrUtZghhAXTMQL38GjKJxSOi82gPbc/edit?usp=sharing


Although I have selected the setting to require input in all slots, the form seems to
submit even with blank slots (baffled).
It should be enough fun/work just to start extracting some key terms from these resources and going through them.
Will the tool associate each key term/concept with a definition?
Will the tool extract the terms from all the resources in a batch, or one by one? (It would be important to have an indication of which resource each term comes from, preferably which page/chapter.) I'll add this to the vocabulary form as an additional attribute (source, page/chapter).
After we are satisfied that this works, we can ask CG members to pick their favourite KR resource and fill out the form. This will help us populate the list of sources (sorry this did not happen until now).
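
Here is a minimal Python (3.9+) sketch of the source attribution idea. Batch extraction results are merged into one entry per term, and every occurrence keeps its source, page/chapter, and the definition found there. The names `Occurrence`, `VocabularyEntry` and `merge_batch`, and the input shape (source mapped to (term, location, definition) tuples), are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Occurrence:
    source: str      # resource title or URL as entered in the form
    location: str    # page or chapter, e.g. "p. 12" or "ch. 3"
    definition: str  # definition found at that location ("" if none)


@dataclass
class VocabularyEntry:
    term: str
    occurrences: list[Occurrence] = field(default_factory=list)


def merge_batch(extracted: dict[str, list[tuple[str, str, str]]]) -> dict[str, VocabularyEntry]:
    """Merge per-source results into one entry per term, keeping every
    (source, page/chapter, definition) so each term can be traced back."""
    entries: dict[str, VocabularyEntry] = {}
    for source, found in extracted.items():
        for term, location, definition in found:
            # Match terms case- and whitespace-insensitively.
            key = " ".join(term.lower().split())
            entry = entries.setdefault(key, VocabularyEntry(term=term.strip()))
            entry.occurrences.append(Occurrence(source, location, definition))
    return entries
```

Keeping one definition per occurrence, rather than a single merged definition, also leaves room for a term to have different definitions in different resources.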

As for who the users are going to be, I'd say ideally both general users AND technical users,
or somewhere in between (non-binary?), lol.
Please take a look at the resource entry form and spreadsheet and let me know
if they are OK.


On Wed, Dec 4, 2024 at 2:51 PM Chris Harding <chris@lacibus.net> wrote:
Thanks Paola!

If you can point me to the document or documents that the vocabulary
should be generated from, I can create an initial draft. A collection of
documents with overlapping material is a good starting point. It can be
interesting to see which terms come up in which documents.

It may also help if you can indicate what kinds of terms the vocabulary
should contain. For example, I'm collecting key terms for some reference
architecture standards, and am finding that asking the LLM to find
things like systems, components, designs etc. can give better results
than just asking it for key terms.

There are many possible output formats. The simplest is probably HTML
dl/dt/dd. How far does the group want to formalise things like "see
also" and allow for a term to have different definitions in different
contexts? This can get quite complicated.
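
As an illustration of the dl/dt/dd output, here is a minimal Python (3.9+) sketch that renders a vocabulary as an HTML definition list, with optional per-context definitions and "see also" links between terms. The input structure (`definitions` mapping a context label to text, plus an optional `see_also` list) and the anchor-id scheme produced by `slug` are assumptions, not an agreed format.

```python
import re
from html import escape


def slug(term: str) -> str:
    """Anchor id for a term, e.g. 'malicious confounding' -> 'malicious-confounding'."""
    return re.sub(r"[^a-z0-9]+", "-", term.lower()).strip("-")


def to_html_dl(vocab: dict[str, dict]) -> str:
    """Render {term: {"definitions": {context: text}, "see_also": [terms]}}
    as an HTML <dl>. An empty context label means a general definition."""
    lines = ["<dl>"]
    for term in sorted(vocab, key=str.lower):
        entry = vocab[term]
        lines.append(f'  <dt id="{slug(term)}">{escape(term)}</dt>')
        # One <dd> per context, so a term can carry several definitions.
        for context, text in entry["definitions"].items():
            label = f"<em>({escape(context)})</em> " if context else ""
            lines.append(f"  <dd>{label}{escape(text)}</dd>")
        # "See also" becomes in-page links to other terms' anchors.
        see_also = entry.get("see_also", [])
        if see_also:
            links = ", ".join(f'<a href="#{slug(t)}">{escape(t)}</a>' for t in see_also)
            lines.append(f"  <dd>See also: {links}</dd>")
    lines.append("</dl>")
    return "\n".join(lines)
```

Keeping context labels as plain keys keeps the output simple. A richer model (e.g. per-community vocabularies) would need more structure than dl/dt/dd offers.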

Pete's question about how the vocabularies are to be used is a good one.
For example, will the people who use them have a technical background,
or are they intended for the general public?

On 03/12/2024 21:41, Paola Di Maio wrote:
> @chris thanks for the offer. Yes, I expect some manual input to complete
> the process, but maybe with some training it
> could become fully automated. Let me know how to proceed; any tool to
> support the effort is welcome.
>
> @Peter Rivett <pete.rivett@federatedknowledge.com>
>
>     It's not clear to me in all this what format(s) existing glossaries/
>     vocabularies in step 3 (do we have a list?)
>
> Milton pointed to two (ISO and another) in a related email, but we
> could identify more. I entered the link in the wiki page.
>
>     are expected to be in, and what the required generated output format
>     should be.
>
> A nice plain language to start with, with any encoding of choice; it could
> be any ML.
>
>     And, for step 4, what degree of difference should lead to a new
>     vocabulary entry.
>
> As defined by the user. That is: are you satisfied that the current AI
> standards (as pointed out by Milton, and possibly to be expanded)
> represent the AI KR domain correctly? I am not: current AI standard
> vocabularies do not represent the AI KR domain adequately.
> There may be KR terms already in existing AI standards that need to be
> disambiguated/defined further;
> that is where we can make an additional contribution.
> The 1st contribution is to identify the terms that are overlooked in
> existing AI standards; the 2nd contribution is to provide alternative KR
> definitions if we are not satisfied with the existing ones.
>
>     And how the new entry should reference the existing one (e.g. as
>     some sort of "similar" or specialized term).
>
> Good question. I think it could simply list the existing term, as defined
> in existing standards, as an attribute
> (previous definitions?). In fact, I may add another field to the entry
> form, 'overlapping domain', as some AI KR terms could be defined elsewhere,
> e.g. in medicine vs law.
>
>     And, in the resultant ecosystem, how the whole family of
>     vocabularies should be represented; both the existing ones that can
>     be reused and the new differentiated terms.
>
> Um, that is a very big question. Ecosystems tend to sort themselves
> out, as long as they are published on the internet, but suggestions
> are welcome.
>
>     It would also seem desirable to be able to indicate the specific
>     terms in the existing vocabularies that have been deemed reusable.
>
> Perhaps think about a way of doing that and share it here?
>
>
>     To answer my own question somewhat, there is the OMG Multiple
>     Vocabulary Facility (https://www.omg.org/spec/MVF) that provides for different vocabularies
>     (terms and definitions related to Communities and with mutual import
>     relationships) with the terms mapped to their meaning as a concept.
>
> Thanks, that's great. Can you add it to the wiki? I can do it if I remember.
>
>     However a lot depends on how we expect the set of vocabularies to be
>     used and by whom (type/role of person or machine)?
>
> For now, I would be satisfied if, after about six
> years of looking into AI KR, we could
> point out with some efficiency and precision the concepts/terms not
> yet covered in AI standards.
> It would be nice to be able to put our finger on an open wound with clarity and
> precision.
> I guess the end result would be used by both humans and machines in
> whatever way other resources are used, in the same way that we use
> any directory.
> I am itching to identify some terms and concepts and invite everyone to
> do the same.
>
> Ponder one or two terms close to your heart and mind.
> I am starting with:
> misrepresentation (justification: fake AI)
> malicious confounding (justification: deliberate use of
> misrepresentation to mislead)
>
>
>     Pete
>
>     Pete Rivett (pete.rivett@federatedknowledge.com)
>     Federated Knowledge, LLC (LEI 98450013F6D4AFE18E67)
>     tel: +1-701-566-9534
>     Schedule a meeting at https://calendly.com/rivettp
>
>     ------------------------------------------------------------------------
>     *From:* Chris Harding <chris@lacibus.net>
>     *Sent:* Tuesday, December 3, 2024 9:37 AM
>     *To:* paoladimaio10@googlemail.com
>     *Cc:* W3C AIKR CG <public-aikr@w3.org>
>     *Subject:* Re: what AI tool can we use to generate the AI KR CG
>     vocabulary
>     Hi Paola
>
>     I am working on a commercial tool in this area. I can do 1 and 2: they
>     are no problem with modern NLP technology but, as with anything produced
>     by AI, need final human review. 3, 4 and 5 should be fairly
>     straightforward also, but would need some work on the mechanism for
>     importing existing glossaries/vocabularies.
>
>     On 03/12/2024 03:05, Paola Di Maio wrote:
>      > Knowledgeable CG members
>      >
>      > Since the future is here, suggestions as to how to generate
>      > a vocabulary would be great (preferably an open, free, online tool or
>      > Python script?).
>      > Process:
>      > 1. select source of K (upload or point to source doc)
>      > 2. extract key terms and definitions
>      > 3. compare key terms with existing glossaries/vocabularies already
>      > published (input URLs)
>      > 4. if not included in existing resources in 3 OR
>      >          if definition is different from the same term in existing
>      > resource in 3
>      > THEN
>      > 5. include term in a list (to be discussed, evaluated, refined)
>      >
>      > I'll definitely buy drinks if someone can do it.
>
>     --
>     Regards,
>     Chris
>     ++++
>
>     Chris Harding
>     Chief Executive, Lacibus Ltd
>
>
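
Steps 3 to 5 of the process above can be sketched in Python (3.9+) with the standard library's `difflib`. The sketch assumes steps 1 and 2 have already produced a term-to-definition mapping and that each existing glossary has been loaded as a similar mapping. A term becomes a candidate if no existing glossary has it, or if its definition is dissimilar to every existing definition of that term. Existing definitions are kept as a `previous_definitions` attribute. The function names and the 0.6 threshold are hypothetical. The threshold is an arbitrary placeholder for "what degree of difference should lead to a new vocabulary entry", and character-level similarity is only a crude stand-in for comparing meaning.

```python
from difflib import SequenceMatcher


def normalise(term: str) -> str:
    """Case- and whitespace-insensitive form of a term."""
    return " ".join(term.lower().split())


def candidate_terms(extracted: dict[str, str],
                    existing: list[dict[str, str]],
                    threshold: float = 0.6) -> list[dict]:
    """Steps 3-5: keep a term if no existing glossary has it, or if its
    definition is dissimilar (ratio < threshold) to every existing one."""
    # Step 3: index existing definitions by normalised term.
    previous: dict[str, list[str]] = {}
    for glossary in existing:
        for term, definition in glossary.items():
            previous.setdefault(normalise(term), []).append(definition)

    candidates = []
    for term, definition in extracted.items():
        prior = previous.get(normalise(term), [])
        # Step 4: absent, or different from every existing definition.
        if not prior:
            reason = "not in existing glossaries"
        elif all(SequenceMatcher(None, definition.lower(), p.lower()).ratio() < threshold
                 for p in prior):
            reason = "definition differs"
        else:
            continue  # close to an existing definition: reuse it, don't add
        # Step 5: add to the list for discussion, keeping prior definitions.
        candidates.append({"term": term, "definition": definition,
                           "reason": reason, "previous_definitions": prior})
    return candidates
```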

--
Regards,
Chris
++++

Chris Harding
Chief Executive, Lacibus Ltd
Received on Wednesday, 4 December 2024 17:02:28 UTC