Re: what AI tool can we use to generate the AI KR CG vocabulary from Paola Di Maio on 2024-12-05 (public-aikr@w3.org from December 2024)

From: Paola Di Maio <paoladimaio10@gmail.com>
Date: Thu, 5 Dec 2024 03:30:33 +0000
To: Peter Rivett <pete.rivett@federatedknowledge.com>
Cc: Chris Harding <chris@lacibus.net>, W3C AIKR CG <public-aikr@w3.org>
Message-ID: <CAMXe=SrWcojVDybhDp2i7LE=Cwa4-BC6TFV45i877_Hd4JrZfA@mail.gmail.com>
Thank you, Peter.
I compiled the form and entries very quickly and superficially.
You are right: we should either do away with that info or get it
right.
Anything else? I'll correct the form and the entries together with any
other suggestions.


On Wed, Dec 4, 2024 at 5:02 PM Peter Rivett <pete.rivett@federatedknowledge.com> wrote:

> Paola,
> Minor but important point: the values for License already entered
> represent copyright, not license.
> Someone owning the copyright gets to declare how they license their work,
> but the license is different. Examples include all the variants of Creative
> Commons licenses. OMG has a few variations too, including a "Non-assert"
> license, which allows people to freely use a spec unless they assert IP
> claims on that spec. And I know Chris is familiar with Open Group licensing,
> which often requires some level of membership.
> I have often felt that IP licensing deserves an ontology in its own right.
>
> In fact, the first entry, the Bergman book, is not AFAIK "Copyright the
> author", since the book itself states both the copyright and the license
> (which is that all rights are reserved, i.e. you're not allowed to do
> anything with it), as follows. I know Mike has made the "pre-release"
> version available for free on his website, which puts it in a legally gray
> zone I'm not going to get into, but my point is the difference between
> copyright and license.
>
> The book says:
>
> © Springer Nature Switzerland AG 2018
> This work is subject to copyright. All rights are reserved by the
> Publisher, whether the whole or part of the material is concerned,
> specifically the rights of translation, reprinting, reuse of illustrations,
> recitation, broadcasting, reproduction on microfilms or in any other
> physical way, and transmission or information storage and retrieval,
> electronic adaptation, computer software, or by similar or dissimilar
> methodology now known or hereafter developed.
>
> A challenge for everyone these days is knowing which licenses permit
> scanning by AI (which is what I assume Chris is doing) and subsequent use
> of the model so trained.
>
>
> BTW, the 4th link is not to the Sowa book but to a review of the Sowa
> book by Stuart C. Shapiro, which is itself quite informative. To say
> "copyright by the owner" is tautologous without saying who the owner
> actually is.
>
> I'm not having a go at you, Paola, but wanting to establish good
> precedents before others contribute.
>
> Pete
> ------------------------------
> *From:* Paola Di Maio <paoladimaio10@gmail.com>
> *Sent:* Wednesday, December 4, 2024 8:17 AM
> *To:* Chris Harding <chris@lacibus.net>
> *Cc:* Peter Rivett <pete.rivett@federatedknowledge.com>; W3C AIKR CG <public-aikr@w3.org>
> *Subject:* Re: what AI tool can we use to generate the AI KR CG vocabulary
>
> Chris, thank you
>
>
> It will be an interesting exercise, and I'd be interested to learn about
> the tool you are building/using.
> It sounds very powerful, but I would suggest limiting the output in the
> first instance to a handful of terms. Can we cope with hundreds of terms?
> Is there a way to filter the output by frequency or another criterion?
> Is there a way to select the top 100 key terms, to make sure we do not get
> flooded, for example by parsing only the abstract/introduction section?
> Would that make sense?
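One simple way to get the kind of filtering asked about here is a frequency cut-off. The approach counts candidate words and two-word phrases, drops stopwords and rare terms, and keeps the top N. The sketch below assumes plain text as input, for example only the abstract/introduction, extracted beforehand. The stopword list and the `top_terms` function are hypothetical. This is a plain frequency heuristic, not the LLM-based tool Chris describes.

```python
import re
from collections import Counter
from typing import List, Tuple

# Hypothetical, deliberately short stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "is", "are",
             "for", "on", "with", "as", "by", "be", "this", "that", "it"}


def top_terms(text: str, n: int = 100, min_count: int = 2) -> List[Tuple[str, int]]:
    """Return up to n (term, count) pairs, most frequent first.

    Candidate terms are single words and adjacent word pairs (bigrams);
    stopwords are ignored and terms seen fewer than min_count times are dropped.
    """
    words = re.findall(r"[a-z][a-z-]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    # Count adjacent word pairs in which neither word is a stopword.
    counts.update(f"{a} {b}" for a, b in zip(words, words[1:])
                  if a not in STOPWORDS and b not in STOPWORDS)
    ranked = [(term, c) for term, c in counts.most_common() if c >= min_count]
    return ranked[:n]
```

For example, `top_terms(abstract_text, n=100)` would return at most 100 pairs. Raising `min_count` is a second lever against being flooded with one-off terms.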
>
>
> I have deleted the spam entries in the previous form and regenerated the
> form, as follows, with 4 of my favourite references:
> https://forms.gle/DzUjwkP7sfQ91JTY7
> I hope truth maintenance systems appear in there (I have not checked).
>
> The form links to the spreadsheet containing the input, but I have hidden
> the emails of the submitters.
>
> https://docs.google.com/spreadsheets/d/10X73IqyfGxJ1VTrUtZghhAXTMQL38GjKJxSOi82gPbc/edit?usp=sharing
>
> Although I selected the setting to require input in all slots, the form
> still seems to submit with blank slots. Baffled!
> It should be enough fun/work just to start extracting some key terms from
> these resources and go through them.
> Will the tool associate each key term/concept with a definition?
> Will the tool extract the terms from all the resources in a batch, or one by
> one? It would be important to have an indication of which resource each term
> comes from, preferably down to the page/chapter. I'll add this to the
> vocabulary form as an additional attribute (source, page/chapter).
> After we are satisfied that this works, we can ask CG members to pick
> their favourite KR resource and fill out the form. This will help us
> populate the list of sources (sorry this did not happen until now).
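Recording where each term comes from could be done by indexing every occurrence as a (source, page) pair while processing all resources in one batch. This is a minimal sketch, assuming each resource has already been split into page texts. `Occurrence` and `index_terms` are hypothetical names. The case-insensitive substring match is deliberately crude; for instance, it would also match "rule" inside "rules".

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Occurrence:
    source: str  # resource title or URL, e.g. as entered in the resource form
    page: int    # 1-based page number (a chapter number would work the same way)


def index_terms(resources: Dict[str, List[str]],
                terms: List[str]) -> Dict[str, List[Occurrence]]:
    """Map each term to every (source, page) where it appears.

    `resources` maps a source name to its text split into pages, page 1 first.
    Terms that never occur are simply absent from the result.
    """
    index: Dict[str, List[Occurrence]] = defaultdict(list)
    for source, pages in resources.items():
        for page_no, page_text in enumerate(pages, start=1):
            lowered = page_text.lower()
            for term in terms:
                if term.lower() in lowered:
                    index[term].append(Occurrence(source, page_no))
    return dict(index)
```

Processing one resource at a time is just a call with a single-entry `resources` dictionary. The batch version additionally shows which terms recur across resources.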
>
> As far as who the users are going to be, I'd say ideally it should be both
> general users AND technical users,
> or somewhere in between (non-binary?) lol.
> Please take a look at the resource entry form and spreadsheet and let me
> know if they are OK.
>
>
>
> On Wed, Dec 4, 2024 at 2:51 PM Chris Harding <chris@lacibus.net> wrote:
>
> Thanks Paola!
>
> If you can point me to the document or documents that the vocabulary
> should be generated from, I can create an initial draft. A collection of
> documents with overlapping material is a good starting point. It can be
> interesting to see which terms come up in which documents.
>
> It may also help if you can indicate what kinds of terms the vocabulary
> should contain. For example, I'm collecting key terms for some reference
> architecture standards, and am finding that asking the LLM to find
> things like systems, components, designs, etc. can give better results
> than just asking it for key terms.
>
> There are many possible output formats. The simplest is probably HTML
> dl/dt/dd. How far does the group want to formalise things like "see
> also" and allow for a term to have different definitions in different
> contexts? This can get quite complicated.
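The dl/dt/dd option can be illustrated with a hedged sketch that also handles several context-specific definitions per term and a simple "see also" line. The input shape, a term mapped to a list of (context, definition) pairs, is an assumption, as is the function name `to_html_dl`. HTML does allow several dd elements to follow one dt.

```python
from html import escape
from typing import Dict, List, Optional, Tuple


def to_html_dl(glossary: Dict[str, List[Tuple[str, str]]],
               see_also: Optional[Dict[str, List[str]]] = None) -> str:
    """Render a glossary as an HTML <dl>, terms sorted case-insensitively.

    Each (context, definition) pair becomes its own <dd>; a non-empty context
    is shown as an italic prefix. "See also" entries become a final <dd>.
    """
    see_also = see_also or {}
    lines = ["<dl>"]
    for term in sorted(glossary, key=str.lower):
        lines.append(f"  <dt>{escape(term)}</dt>")
        for context, definition in glossary[term]:
            label = f"<em>({escape(context)})</em> " if context else ""
            lines.append(f"  <dd>{label}{escape(definition)}</dd>")
        if see_also.get(term):
            refs = ", ".join(escape(t) for t in see_also[term])
            lines.append(f"  <dd>See also: {refs}</dd>")
    lines.append("</dl>")
    return "\n".join(lines)
```

Keeping contexts as plain labels is the simplest choice. Formalising them, for example as links or as separate vocabularies, is where it "can get quite complicated".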
>
> Pete's question about how the vocabularies are to be used is a good one.
> For example, will the people who use them have a technical background,
> or are they intended for the general public?
>
> On 03/12/2024 21:41, Paola Di Maio wrote:
> > @Chris, thanks for the offer. Yes, I expect some manual input to complete
> > the process, but with some training it could perhaps
> > become fully automated. Let me know how to proceed; any tool to
> > support the effort is welcome.
> >
> > @Peter Rivett <mailto:pete.rivett@federatedknowledge.com>
> >
> >     It's not clear to me in all this what format(s) existing glossaries/
> >     vocabularies in step 3 (do we have a list?)
> >
> > Milton pointed to two (ISO and another) in a related email, but we
> > could identify more. I entered the link in the wiki page.
> >
> >     are expected to be in, and what the required generated output format
> >     should be.
> >
> > A nice plain language to start with, with any encoding of choice; it could
> > be any ML.
> >
> >     And, for step 4, what degree of difference should lead to a new
> >     vocabulary entry.
> >
> > As defined by the user. That is: are you satisfied that the current AI
> > standards (as pointed out by Milton, and possibly to be expanded)
> > represent the AI KR domain correctly? I am not: current AI standard
> > vocabularies do not represent the AI KR domain adequately.
> > There may be KR terms already in existing AI standards that need to be
> > disambiguated/defined further;
> > that is where we can make an additional contribution.
> > The 1st contribution is to identify the terms that are overlooked in
> > existing AI standards; the 2nd contribution is to provide alternative KR
> > definitions if we are not satisfied with the existing ones.
> >
> >     And how the new entry should reference the existing one (e.g. as
> >     some sort of "similar" or specialized term).
> >
> > Good question. I think it could simply list the term as defined
> > in existing standards as an attribute
> > (previous definitions?). In fact, I may add another field to the entry
> > form, 'overlapping domain', as some AI KR terms could be defined elsewhere,
> > e.g. in medicine vs law.
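The entry fields discussed here, previous definitions from existing standards and overlapping domains, could be modelled as a small record type. This is a sketch only. The class and field names are hypothetical and simply mirror the form fields proposed in this thread.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class PriorDefinition:
    standard: str     # existing standard or glossary the definition comes from
    definition: str   # the definition as given there


@dataclass
class VocabularyEntry:
    term: str
    definition: str   # the CG's proposed KR definition
    source: str       # resource the term was taken from
    previous_definitions: List[PriorDefinition] = field(default_factory=list)
    overlapping_domains: List[str] = field(default_factory=list)  # e.g. "medicine", "law"

    def is_new_term(self) -> bool:
        """True if no existing standard has been recorded as defining this term."""
        return not self.previous_definitions

    def to_json(self) -> str:
        # asdict() recurses into the nested PriorDefinition records.
        return json.dumps(asdict(self), indent=2)
```

Keeping previous definitions as structured records, rather than free text, would make it possible to tell later which existing terms were reused and which were redefined.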
> >
> >     And, in the resultant ecosystem, how the whole family of
> >     vocabularies should be represented; both the existing ones that can
> >     be reused and the new differentiated terms.
> >
> > Uhm, that is a very big question. Ecosystems tend to sort themselves
> > out as long as they are published on the internet, but suggestions are
> > welcome.
> >
> >     It would also seem desirable to be able to indicate the specific
> >     terms in the existing vocabularies that have been deemed reusable.
> >
> > Perhaps think about a way of doing that and share it here?
> >
> >
> >     To answer my own question somewhat, there is the OMG Multiple
> >     Vocabulary Facility https://www.omg.org/spec/MVF that provides for
> >     different vocabularies (terms and definitions related to Communities
> >     and with mutual import relationships) with the terms mapped to their
> >     meaning as a concept.
> >
> > Thanks, that's great. Can you add it to the wiki? I can do it if I
> > remember.
> >
> >     However a lot depends on how we expect the set of vocabularies to be
> >     used and by whom (type/role of person or machine)?
> >
> > For now, I feel that I would be satisfied if, after about six
> > years of looking into AI KR, we could
> > point out with some efficiency and precision the concepts/terms not
> > yet covered in AI standards.
> > It would be nice to be able to put our finger on some open wounds with
> > clarity and precision.
> > I guess the end result would be used by both humans and machines in
> > whatever way other resources are used, in the same way that we use
> > any directory.
> > I am itching to identify some terms and concepts and invite everyone to
> > do the same.
> >
> > Ponder one or two terms close to your heart and mind.
> > I am starting with:
> > misrepresentation (justification: fake AI)
> > malicious confounding (justification: deliberate use of
> > misrepresentation to mislead)
> >
> >
> >     Pete
> >
> >     Pete Rivett (pete.rivett@federatedknowledge.com)
> >     Federated Knowledge, LLC (LEI 98450013F6D4AFE18E67)
> >     tel: +1-701-566-9534
> >     Schedule a meeting at https://calendly.com/rivettp
> >
> >
>  ------------------------------------------------------------------------
> >     *From:* Chris Harding <chris@lacibus.net>
> >     *Sent:* Tuesday, December 3, 2024 9:37 AM
> >     *To:* paoladimaio10@googlemail.com
> >     *Cc:* W3C AIKR CG <public-aikr@w3.org>
> >     *Subject:* Re: what AI tool can we use to generate the AI KR CG
> >     vocabulary
> >     Hi Paola
> >
> >     I am working on a commercial tool in this area. I can do 1 and 2:
> >     they are no problem with modern NLP technology but, as with anything
> >     produced by AI, need final human review. 3, 4 and 5 should be fairly
> >     straightforward also, but would need some work on the mechanism for
> >     importing existing glossaries/vocabularies.
> >
> >     On 03/12/2024 03:05, Paola Di Maio wrote:
> >      > Knowledgeable CG members
> >      >
> >      > Since the future is here, suggestions as to how to generate
> >      > a vocabulary would be great (preferably an open, free, online tool
> >      > or a Python script?)
> >      > Process:
> >      > 1. Select source of K (upload or point to source doc)
> >      > 2. Extract key terms and definitions
> >      > 3. Compare key terms with existing glossaries/vocabularies already
> >      > published (input URLs)
> >      > 4. IF not included in existing resources in 3 OR
> >      >          IF definition is different from the same term in existing
> >      > resource in 3
> >      > THEN
> >      > 5. Include term in a list (to be discussed, evaluated, refined)
> >      >
> >      > I'll definitely buy drinks if someone can do this!
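Steps 3 to 5 of this process can be sketched as follows. The sketch assumes the extracted terms and the existing glossaries have already been loaded into term-to-definition dictionaries; fetching them from URLs is left out. The definition comparison uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for a real semantic comparison. The 0.6 threshold is an arbitrary assumption. Choosing that threshold is exactly Pete's question of what degree of difference should lead to a new entry. The function name `candidate_terms` is hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def candidate_terms(extracted: Dict[str, str],
                    existing: Dict[str, str],
                    threshold: float = 0.6) -> List[Tuple[str, str]]:
    """Steps 3-5: return (term, reason) pairs for terms to put on the list.

    A term qualifies if it is absent from the existing glossaries (step 4, first
    test) or if its definition's character-level similarity to the existing
    definition falls below `threshold` (step 4, second test). Term matching is
    case-insensitive.
    """
    existing_norm = {t.strip().lower(): d for t, d in existing.items()}
    candidates: List[Tuple[str, str]] = []
    for term, definition in extracted.items():
        key = term.strip().lower()
        if key not in existing_norm:
            candidates.append((term, "not in existing resources"))
            continue
        similarity = SequenceMatcher(
            None, definition.lower(), existing_norm[key].lower()).ratio()
        if similarity < threshold:
            candidates.append((term, f"definition differs (similarity {similarity:.2f})"))
    return candidates
```

Character similarity only catches wording changes, so two differently worded definitions of the same concept would still be flagged. The result is the list "to be discussed, evaluated, refined", not a final decision.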
> >
> >     --
> >     Regards,
> >     Chris
> >     ++++
> >
> >     Chris Harding
> >     Chief Executive, Lacibus Ltd
> >
> >
>
>
>
Received on Thursday, 5 December 2024 03:31:16 UTC