- From: Paola Di Maio <paoladimaio10@gmail.com>
- Date: Wed, 4 Dec 2024 16:17:40 +0000
- To: Chris Harding <chris@lacibus.net>
- Cc: Peter Rivett <pete.rivett@federatedknowledge.com>, W3C AIKR CG <public-aikr@w3.org>
- Message-ID: <CAMXe=Sp1XwvPeG+c=bQF9NhNj2xab=ypes0Nz1wfi78120-02g@mail.gmail.com>
Chris, thank you.

It will be an interesting exercise, and I'd be interested to learn about the tool you are building/using. It sounds very powerful, but I would suggest in the first instance limiting the output to a handful of terms (can we cope with hundreds of terms?).

Is there a way to filter the output by frequency or another criterion? Is there a way to select the top 100 key terms, so that we do not get flooded, for example by parsing only the abstract/introduction section? Would that make sense?

I have deleted the spam entries in the previous form, and regenerated the form as follows with 4 of my favourite references:
https://forms.gle/DzUjwkP7sfQ91JTY7

I hope truth maintenance systems are in there (I have not checked).

The form links to the spreadsheet containing the input, but I have hidden the emails of the submitters:
https://docs.google.com/spreadsheets/d/10X73IqyfGxJ1VTrUtZghhAXTMQL38GjKJxSOi82gPbc/edit?usp=sharing

Although I have selected the setting to require input in all slots, the form seems to submit even with blank slots (baffled).

It should be enough fun/work just to start extracting some key terms from these resources and go through them.

Will the tool associate the key term/concept with a definition? Will the tool extract the terms from all the resources in batch, or one by one? (It would be important to have an indication of which resource each term comes from, preferably which page/chapter.) I'll add this to the vocabulary form as an additional attribute (source, page/chapter).

After we are satisfied that this works, we can ask CG members to pick their favourite KR resource and fill out the form. This will help us populate the list of sources (sorry this did not happen until now).

As far as who the users are going to be, I'd say ideally it should be both general users AND technical users, or somewhere in between (non binary?)
lol

Please take a look at the resource entry form and spreadsheet and let me know if they are OK.

On Wed, Dec 4, 2024 at 2:51 PM Chris Harding <chris@lacibus.net> wrote:

> Thanks Paola!
>
> If you can point me to the document or documents that the vocabulary
> should be generated from, I can create an initial draft. A collection of
> documents with overlapping material is a good starting point. It can be
> interesting to see which terms come up in which documents.
>
> It may also help if you can indicate what kinds of term the vocabulary
> should contain. For example, I'm collecting key terms for some reference
> architecture standards, and am finding that asking the LLM to find
> things like systems, components, designs etc. can give better results
> than just asking it for key terms.
>
> There are many possible output formats. The simplest is probably HTML
> dl/dt/dd. How far does the group want to formalise things like "see
> also" and allow for a term to have different definitions in different
> contexts? This can get quite complicated.
>
> Pete's question about how the vocabularies are to be used is a good one.
> For example, will the people who use them have a technical background,
> or are they intended for the general public?
>
> On 03/12/2024 21:41, Paola Di Maio wrote:
> > @chris thanks for the offer- yes i expect some manual input to complete
> > the process but maybe with some training..
> > could become fully automated.
> > let me know how to proceed; any tool to
> > support the effort is welcome
> >
> > @Peter Rivett
> >
> >     It's not clear to me in all this what format(s) existing glossaries/
> >     vocabularies in step 3 (do we have a list?)
> >
> > Milton pointed to two (ISO and another) in a related email, but we
> > could identify more. I entered the link in the wiki page.
> >
> >     are expected to be in, and what the required generated output format
> >     should be.
> >
> > a nice plain language to start with, with any encoding of choice, could
> > be any ML
> >
> >     And, for step 4, what degree of difference should lead to a new
> >     vocabulary entry.
> >
> > as defined by the user - that is: are you satisfied that the current AI
> > standards so far (as pointed out by Milton and possibly to be expanded)
> > represent the AI KR domain correctly? I am not: current AI standard
> > vocabularies do not represent the AI KR domain adequately.
> > There may be KR terms already in existing AI standards that need to be
> > disambiguated/defined further;
> > that is where we can make an additional contribution.
> > 1st contribution is to identify the terms that are overlooked in
> > existing AI standards; 2nd contribution is to provide alternative KR
> > definitions if we are not satisfied with the existing ones.
> >
> >     And how the new entry should reference the existing one (e.g. as
> >     some sort of "similar" or specialized term).
> >
> > good question. I think it could simply list the existing term as defined
> > in existing standards as an attribute
> > (previous definitions?). In fact, I may add another field to the entry
> > form, 'overlapping domain', as some AI KR terms could be defined elsewhere
> > (in medicine vs law, for example).
> >
> >     And, in the resultant ecosystem, how the whole family of
> >     vocabularies should be represented; both the existing ones that can
> >     be reused and the new differentiated terms.
> >
> > Uhm, that is a very big question. Ecosystems tend to sort themselves
> > out, so long as they are published on the internet, but suggestions
> > welcome.
> >
> >     It would also seem desirable to be able to indicate the specific
> >     terms in the existing vocabularies that have been deemed reusable.
> >
> > think about a way of doing that and share it here, perhaps
> >
> >     To answer my own question somewhat, there is the OMG Multiple
> >     Vocabulary Facility https://www.omg.org/spec/MVF
> >     that provides for different vocabularies
> >     (terms and definitions related to Communities and with mutual import
> >     relationships) with the terms mapped to their meaning as a concept.
> >
> > THANKS, that's great. Can you add it to the wiki? I can do it if I
> > remember it.
> >
> >     However a lot depends on how we expect the set of vocabularies to be
> >     used and by whom (type/role of person or machine)?
> >
> > For now, I feel that I would be satisfied if, after about six
> > years of looking into AI KR,
> > we could point out with some efficiency and precision the concepts/terms
> > not yet covered in AI standards.
> > Nice to be able to put our finger into some open wound with clarity and
> > precision.
> > I guess the end result would be used by both humans and machines in
> > whatever way other resources are used, in the same way that we use
> > any directory.
> > I am itching to identify some terms and concepts and invite everyone to
> > do the same.
> >
> > Ponder one or two terms close to your heart and mind.
> > I am starting with:
> > misrepresentation (justification is: fake AI)
> > malicious confounding (justification is: deliberate use of
> > misrepresentation to mislead)
> >
> >     Pete
> >
> >     Pete Rivett (pete.rivett@federatedknowledge.com)
> >     Federated Knowledge, LLC (LEI 98450013F6D4AFE18E67)
> >     tel: +1-701-566-9534
> >     Schedule a meeting at https://calendly.com/rivettp
> >
> > ------------------------------------------------------------------------
> > *From:* Chris Harding <chris@lacibus.net>
> > *Sent:* Tuesday, December 3, 2024 9:37 AM
> > *To:* paoladimaio10@googlemail.com
> > *Cc:* W3C AIKR CG <public-aikr@w3.org>
> > *Subject:* Re: what AI tool can we use to generate the AI KR CG
> > vocabulary
> > Hi Paola
> >
> > I am working on a commercial tool in this area. I can do 1 and 2: they
> > are no problem with modern NLP technology but, as with anything produced
> > by AI, need final human review. 3, 4 and 5 should be fairly
> > straightforward also, but would need some work on the mechanism for
> > importing existing glossaries/vocabularies.
> >
> > On 03/12/2024 03:05, Paola Di Maio wrote:
> > > Knowledgeable CG members
> > >
> > > Since the future is here, suggestions as to how to generate
> > > a vocabulary would be great (preferably open, free, online tool or
> > > python script?)
> > > Process
> > > 1. select source of K (upload or point to source doc)
> > > 2. extract key terms and definitions
> > > 3. compare key terms with existing glossaries/vocabularies already
> > >    published (input URLs)
> > > 4. if not included in existing resources in 3 OR
> > >    if definition is different from the same term in existing
> > >    resource in 3
> > >    THEN
> > > 5. include term in a list (to be discussed, evaluated, refined)
> > >
> > > I definitely buy drinks if someone can do it
> >
> > --
> > Regards,
> > Chris
> > ++++
> >
> > Chris Harding
> > Chief Executive, Lacibus Ltd
> >
>
> --
> Regards,
> Chris
> ++++
>
> Chris Harding
> Chief Executive, Lacibus Ltd
>
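
For illustration only, steps 3 to 5 of the process quoted above, together with the top-N frequency filter asked about at the top of this message, might be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the tool discussed in the thread: it assumes steps 1 and 2 have already produced (term, definition, source) entries, the names `Entry`, `top_terms` and `candidates` are hypothetical, and exact matching after normalisation is only a stand-in for whatever "different definition" test the group settles on.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One extracted term (hypothetical structure, not from the thread)."""
    term: str
    definition: str
    source: str  # resource and page/chapter the term was extracted from


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def top_terms(entries: list[Entry], limit: int) -> list[str]:
    """Return the `limit` most frequently extracted terms, in normalised form."""
    counts = Counter(normalise(e.term) for e in entries)
    return [term for term, _ in counts.most_common(limit)]


def candidates(
    entries: list[Entry],
    existing: dict[str, str],
    limit: int = 100,
) -> list[tuple[Entry, str]]:
    """Steps 3-5: keep entries that are new or defined differently.

    `existing` maps a normalised term to its definition in an already
    published glossary (assumed to have been loaded beforehand).
    """
    keep = set(top_terms(entries, limit))
    result = []
    for entry in entries:
        key = normalise(entry.term)
        if key not in keep:
            continue  # outside the top-N most frequent terms
        known = existing.get(key)
        if known is None:
            result.append((entry, "new term"))
        elif normalise(known) != normalise(entry.definition):
            result.append((entry, "different definition"))
    return result


if __name__ == "__main__":
    # Invented sample data, purely to show the shape of the input.
    sample = [
        Entry("Frame", "A data structure for stereotyped situations", "Doc A, p. 5"),
        Entry("Truth maintenance system", "Tracks justifications for beliefs", "Doc B, ch. 3"),
    ]
    glossary = {"frame": "A slot-and-filler record"}
    for entry, reason in candidates(sample, glossary):
        print(f"{entry.term} ({entry.source}): {reason}")
```

Keeping the source on every entry means the resulting list can show which resource, and which page or chapter, each candidate term came from when it is discussed and refined.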
Received on Wednesday, 4 December 2024 16:18:22 UTC