- From: ProjectParadigm-ICT-Program <metadataportals@yahoo.com>
- Date: Sun, 31 Dec 2023 17:38:04 +0000 (UTC)
- To: Dave Raggett <dsr@w3.org>
- Cc: W3C AIKR CG <public-aikr@w3.org>, "paoladimaio10@googlemail.com" <paoladimaio10@googlemail.com>
- Message-ID: <715244422.3890758.1704044284679@mail.yahoo.com>
Dear Dave,

I am aware of the foundation models. But there are linguistic issues that are simply not resolved by their use. If you are familiar with the plight of indigenous languages (e.g. in Canada, the USA, Central and South America), of creole and pidgin languages, and of the many languages spoken in Africa, Asia and the Pacific, you know the issue of ownership is a vital one. A quick look at Ethnologue (https://www.ethnologue.com), SIL (https://www.sil.org), the UNESCO efforts in the framework of the UN Decade of Indigenous Languages (2022-2032), and the dozens of initiatives that have sprung up in recent years to document, digitize and conserve oral traditions, verbal communications in various forms and written texts may help you appreciate the magnitude and scope of the problem. SIL specializes in providing computational linguistic tools for written languages, a key ingredient in the formal structuring of languages for (small) data sets and language translation.

Many of these language communities lack the human, financial and technological resources to participate in global AI development, or to provide adequate avenues for education in their native languages with the use of computational linguistic tools and generative AI such as LLMs. I speak from experience as a native speaker of a creole language called Papiamento, spoken mainly in Aruba, Bonaire and Curacao in the Caribbean and related to the Portuguese-derived creole language of the Cape Verde Islands (Cabo Verde).

I am currently finishing the proposal for a Global Linguistic Initiative for Knowledge Infrastructures, a personal project started decades ago, in 1994, after visiting the head office of the International Federation of Library Associations in The Hague in the Netherlands to discuss the setup of a global initiative for using computational linguistics to aid and promote the written use of indigenous languages. Mind you, this was before the advent of the commercial Internet in 1997. At that time the European Union had started a language-learning program called Mercator (https://www.mercator-research.eu), which was one of the main inspirations for setting up a global initiative: complementing Mercator with a program for the pidgin and creole languages spoken in the former colonies of European countries (mainly Spain, France, Portugal, the UK and the Netherlands).

Being a mathematician, computer scientist, software developer and global promoter of sustainable development through innovative use of ICT and other technologies via appropriate technology transfer, and keenly aware of the work of UNESCO and other UN bodies in these fields (e.g. FAO AIMS-AGROVOC, https://aims.fao.org/agrovoc, and the UNESCO World Atlas of Languages), I am principally motivated to participate in developing knowledge representation for AI from this personal perspective. I started looking into the use of semantic web technologies for this purpose from 2007 onwards, which explains my participation in the W3C mailing lists.

As a mathematician with a profound interest in formal modeling and its limitations, I have followed the literature on quantum physics, astronomy, computational biology and lately neuroscience and the cognitive sciences, philosophy, psychology and Buddhist philosophy (the Madhyamaka Middle Way) to better understand how we humans arrive at formulating and formalizing knowledge, and the roles of sensory perception, observation, rational reasoning and decision-making processes.
Artificial intelligence was just a budding field in the 1980s, with coding done mostly in Prolog, Lisp and Modula in my university days; at the time the use of AI was limited and I did not give it much thought. My interest in AI was sparked when I noticed how much of what I had been tracking through the years was surfacing in the debate on the development and use of AI. And it was the rise of generative AI using large language models that made me aware of the imminent convergence of current AI research and development with my areas of interest: knowledge representation and use, computational linguistics, and the limitations of formal modeling.

The resurgence of AI and its current technological proliferation are driven by both good intentions and corporate greed, as amply documented in the events surrounding the firing and rehiring of Sam Altman at OpenAI. Current developments at Meta and Google inspire little confidence as to how these linguistic challenges are going to be tackled and resolved. Fortunately librarians, the gatekeepers and guardians of human knowledge at large, offer some useful perspectives.

Suffice it to say, we are entering a challenging period of rapid technological change driven by, among other things, AI. Making advanced knowledge available and accessible to all humans in their native languages will be key to keeping this world equitable and fair and to providing equal opportunities to all (and saving the planet too). In this, knowledge representation for AI in all languages will be crucial.

Milton Ponson
GSM: +297 747 8280
PO Box 1154, Oranjestad
Aruba, Dutch Caribbean
Project Paradigm: Bringing the ICT tools for sustainable development to all stakeholders worldwide through collaborative research on applied mathematics, advanced modeling, software and standards development

On Sunday, December 31, 2023 at 07:08:43 AM AST, Dave Raggett <dsr@w3.org> wrote:

Hi Milton,

We're all aware of the challenges around copyright in the era of generative AI, along with recognising the need for remuneration for creative effort. I don't agree with your premise in respect to spoken languages, as we have seen progress in using foundation models to support transfer learning for less common languages with much smaller datasets. This will allow people to use the language of their choice, something the European Union is keen to support.
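To make the transfer-learning point concrete, here is a minimal sketch, assuming the Hugging Face transformers and datasets libraries: continue masked-language-model pretraining of a multilingual foundation model on a small monolingual corpus. The model choice, the hypothetical papiamento.txt corpus file and the hyperparameters are illustrative only.

    # Sketch: adapt a multilingual foundation model to a low-resource language
    # by continuing masked-language-model pretraining on a small text corpus.
    from datasets import load_dataset
    from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    model_name = "xlm-roberta-base"  # multilingual base model (illustrative choice)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)

    # A small corpus, one sentence per line (hypothetical file name).
    corpus = load_dataset("text", data_files={"train": "papiamento.txt"})
    tokenized = corpus.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
        batched=True, remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="xlmr-papiamento",
                               num_train_epochs=3,
                               per_device_train_batch_size=16),
        train_dataset=tokenized["train"],
        # Randomly masks tokens so the model learns to predict them in context.
        data_collator=DataCollatorForLanguageModeling(tokenizer,
                                                      mlm_probability=0.15),
    )
    trainer.train()  # adapts the pretrained representations to the new language

Because the base model already encodes cross-lingual structure from its pretraining, even a corpus of modest size can yield usable representations, which is presumably the kind of progress referred to above.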
On 31 Dec 2023, at 02:52, ProjectParadigm-ICT-Program <metadataportals@yahoo.com> wrote:

Dave,

You mentioned it being just a case of sufficient training data for agents to be able to deal with the issues I mentioned. You assume that all of this training data can and will be scraped, given, or otherwise made available of free will. This is an erroneous assumption: both the European Union AI Act and the growing number of linguistic stakeholder representative groups challenging the use of online data, on grounds of cultural, national, linguistic and historical ownership of oral, textual, graphical and audiovisual data and information, will make this training data increasingly bound to usage fees, thus de facto setting off a massive extinction of lesser-spoken languages in technology. Big AI Tech is starting to resemble Big Pharma, which favors putting money only into R&D that produces products for the biggest possible market segments in a one-size-fits-all format. But there are some hopeful signs that this will be addressed.

But we know from the current debate about English being predominant as the language of both science and science publishing that these linguistic issues will not be easily resolved. That is why large language models are likely to fade away as well, particularly since their energy and environmental footprints (e.g. water usage for cooling) are now also being questioned. You cannot simply look at just the software models being used and the underlying mathematical modeling. The technology assessment of AI is just starting to get off the ground and will soon also enter the discussion about regulation of AI.

Milton Ponson
GSM: +297 747 8280
PO Box 1154, Oranjestad
Aruba, Dutch Caribbean
Project Paradigm: Bringing the ICT tools for sustainable development to all stakeholders worldwide through collaborative research on applied mathematics, advanced modeling, software and standards development

On Saturday, December 30, 2023 at 02:46:08 PM AST, Dave Raggett <dsr@w3.org> wrote:

The good news is that the current concept of prompt engineering is likely to fade away as agents get better at understanding the context in which questions are asked, and hence what kinds of responses will be most useful. I am at an early stage of an investigation into how to give cognitive agents a memory of the past, present and future, along with continual learning and explicit sequential deliberative reasoning. This will enable agents to adapt to individual users and tasks, making them effective partners in human-machine collaboration.

On Netflix it is now commonplace to hear mixed-language dialogues. Generative AI will no doubt soon be able to handle this effectively, as it is mainly just a matter of sufficient training data.

One way to deal with hallucinations is the proposer-critic pattern, in which one agent critiques the output of another. This would start with deliberative reasoning and over time be "compiled" as the proposer learns from the critic's feedback.
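A minimal sketch of this proposer-critic loop, with a hypothetical call_llm function standing in for any chat-model endpoint (the prompts and the stopping rule are illustrative):

    # Proposer-critic pattern: one agent drafts an answer, a second agent
    # critiques it, and the draft is revised until the critic accepts it.

    def call_llm(prompt: str) -> str:
        """Hypothetical stand-in for a real chat-model call."""
        raise NotImplementedError("wire this to an actual LLM endpoint")

    def propose_and_critique(question: str, max_rounds: int = 3) -> str:
        draft = call_llm(f"Answer concisely:\n{question}")
        for _ in range(max_rounds):
            critique = call_llm(
                "You are a strict critic. List factual errors or unsupported "
                "claims in this answer, or reply OK if there are none.\n\n"
                f"Q: {question}\nA: {draft}")
            if critique.strip().upper().startswith("OK"):
                break  # the critic accepts the draft
            draft = call_llm(
                "Revise the answer to address the critique.\n\n"
                f"Q: {question}\nA: {draft}\nCritique: {critique}")
        return draft

The "compiling" described above would then amount to the proposer internalising the critic's corrections over time, so that fewer revision rounds are needed.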
On 30 Dec 2023, at 17:55, ProjectParadigm-ICT-Program <metadataportals@yahoo.com> wrote:

I take issue with the term "prompt engineering" because it somehow implies creating a "well-formed query" that "prompts" a "well-formed input format", leading to an output within the range of scope and intention of the well-formed query. But natural language is tricky, and as a polyglot I can assure you that you can make any chatbot hallucinate by language blending. I remember from my university days how, as a mathematician, I had conversations with philosophers and language students about this language blending, which is, in short, combining the common grammatical constructs of one language while switching to the idiomatic styles of another, changing tonality and, in some cases, word order, not unlike in poetry. The current literature on polyglots shows they have cognitive skills that help them better cope with bias and support rational thinking. Unfortunately, Big Internet and AI tech is monolinguistic and does not want to address these and other linguistic issues.

Prompt engineering is what we would normally consider part of human-computer interaction, and the vast body of scientific literature shows that between computational linguistics and generative AI using large language models lies a field of categories of statistical natural language modeling with inherent biases. We are still decades away from having a C3PO robot versed in all 7,000-plus human languages. Natural language is sensitive to multiple contexts, and IMHO the current state of the art in generative AI doesn't come close to dealing with this; hence the term "prompt engineering" is catchy but technically nonsense.

Milton Ponson
GSM: +297 747 8280
PO Box 1154, Oranjestad
Aruba, Dutch Caribbean
Project Paradigm: Bringing the ICT tools for sustainable development to all stakeholders worldwide through collaborative research on applied mathematics, advanced modeling, software and standards development

On Saturday, December 30, 2023 at 05:16:20 AM AST, Paola Di Maio <paola.dimaio@gmail.com> wrote:

We received fun intelligent (pseudo-intelligent?) generative demos on this list (by Dave R) that show output but do not describe the prompts. I asked about the prompt and received no reply (recursive empty prompt vector?). Prompt engineering is a thing (but it is not new). Good article: https://www.zdnet.com/article/how-to-write-better-chatgpt-prompts/ READ AND DISCUSS. There is, however, a new emphasis on generative AI and natural language that moves the field on from SQL and the like, which is interesting and, dare I say, important. I may be able to share some lecture notes. Happiest possible year given the sodden circumstances the world is in.

PDM
Received on Sunday, 31 December 2023 17:38:18 UTC