- From: Lorenzo Moriondo <tunedconsulting@gmail.com>
- Date: Thu, 17 Jul 2025 14:41:22 +0100
- To: public-webagents <public-webagents@w3.org>
- Message-ID: <CAKgLLmussnChRrBpaT2Tj2npudZjqGjRaRVGrST4xE_px2DkLQ@mail.gmail.com>
Hello, I will share here an update on what I have elaborated so far. I have been working on different questlines, mostly:

1. experimenting with and comparing different processes for structured generation of formal protocols
2. defining a basic framework for testing models on formal generation for BSPL
3. defining a viable incremental process to identify directions for developing specialised agents

Here are some insights and possible actionables for the next few weeks:

1. I have experimented with different context and prompt engineering techniques, passing prompts to the better-performing publicly available models (those that can run on a single machine with a GPU and 16 GB of RAM). The only model with performance consistently comparable to the commercial state of the art is llama3.1; in general, smaller models (llama3.2, smollm2) perform badly on formal definitions (a minimal one-shot prompt sketch follows the roadmap below).

2. I made available this repo with the initial setup for prompt generation and model comparison: https://github.com/Mec-iS/w3c-agents-features
The main focus is *feature engineering for structured generation*; BSPL is the initial testing protocol, but this can be extended to others. I have developed some training pairings and logging for one-shot translation. Log files can be loaded into a `streamlit` dataviz app to compare the quality of outputs among models; this can be extended to comparing quality among different prompts (a sketch of such a comparison app also follows below). I will add code and issues to this repo for this particular feature engineering task. In the `data/commercial_models` directory you can find examples of outputs from commercial models; we should analyse them and pick a reference NL definition, or a mix of characteristics from those, to establish the best possible way of putting protocols into natural language (please note that these have been generated using state-of-the-art reasoners and chain-of-thought implementations).

3. After this brief experimentation, I can try to design a roadmap towards a robust BSPL-to-NL (and back) translator:
  a. one-shot structured generation (see repo: uses out-of-the-box models, good portability and no cost for wide availability)
  b. chain graph (RAG and other integrations; this should improve on the results of a., but in a less portable way; a chain sketch is included below)
  c. Reinforcement Learning solutions (custom, opinionated solutions; probably more precise and robust, but less portable and available):
     1. naive RL implementation
     2. neurosymbolic RL implementation (using the `synlinks` library, for example; this should provide quite a solid solution, but the learning process is quite complex to optimise and portability is weakened)
  d. fine-tuning on BSPL (custom tuning of available models): this requires many more example translations and arbitrarily complex techniques that may hinder portability
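To make point 1. and roadmap step a. concrete, here is a minimal sketch of a training pairing plus a one-shot translation prompt. The Purchase protocol is the classic example from the BSPL literature; the prompt wording, the pairing layout, the `ollama` Python client and the llama3.1 tag are assumptions for illustration, not the exact code in the repo.

```python
# Minimal sketch of one-shot NL -> BSPL translation.
# Assumptions: ollama is running locally with llama3.1 pulled, and the
# `ollama` Python client is installed (pip install ollama). The pairing
# layout and prompt wording are illustrative, not the repo's actual schema.
import ollama

# Example pairing: the classic Purchase protocol from the BSPL literature,
# together with a plain natural-language description of the same interaction.
PURCHASE_BSPL = """\
Purchase {
  roles B, S
  parameters out ID key, out item, out price, out outcome

  B -> S: rfq[out ID, out item]
  S -> B: quote[in ID, in item, out price]
  B -> S: accept[in ID, in item, in price, out outcome]
  B -> S: reject[in ID, in item, in price, out outcome]
}
"""

PURCHASE_NL = (
    "A buyer B asks a seller S for a quote on an item, identified by ID. "
    "The seller replies with a price and the buyer either accepts or "
    "rejects it, which determines the outcome of the interaction."
)

SYSTEM = (
    "You translate natural-language descriptions of multi-agent interactions "
    "into BSPL protocols. Reply with a single BSPL protocol and nothing else."
)


def translate(description: str, model: str = "llama3.1") -> str:
    """One-shot translation: a single worked example, then the new request."""
    response = ollama.chat(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": PURCHASE_NL},
            {"role": "assistant", "content": PURCHASE_BSPL},
            {"role": "user", "content": description},
        ],
    )
    return response["message"]["content"]


if __name__ == "__main__":
    print(translate(
        "A customer asks a courier to deliver a parcel; the courier either "
        "confirms a pickup time or declines the job."
    ))
```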
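Similarly, a hedged sketch of the kind of `streamlit` comparison app mentioned in point 2. It assumes one JSON log per run with `model`, `prompt` and `output` fields; these field names and the `logs/` location are hypothetical and would need to match the repo's actual log format.

```python
# Minimal sketch of a streamlit viewer for comparing model outputs.
# Run with: streamlit run compare_logs.py
# Assumes one JSON file per run in ./logs with hypothetical fields
# {"model": ..., "prompt": ..., "output": ...}.
import json
from pathlib import Path

import streamlit as st

LOG_DIR = Path("logs")

st.title("BSPL one-shot translation: model comparison")

# Load every JSON log found in the log directory.
runs = [json.loads(p.read_text()) for p in LOG_DIR.glob("*.json")]
models = sorted({run["model"] for run in runs})

model_a = st.selectbox("Model A", models, index=0)
model_b = st.selectbox("Model B", models, index=min(1, len(models) - 1))

# Show the two selected models side by side.
col_a, col_b = st.columns(2)
for col, model in ((col_a, model_a), (col_b, model_b)):
    with col:
        st.subheader(model)
        for run in (r for r in runs if r["model"] == model):
            st.caption(run["prompt"])
            st.code(run["output"], language=None)
```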
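For roadmap step b., a rough sketch of how the same translation step could sit inside a chain, so that retrieval (RAG) or other integrations can be composed in front of the model call later. The package names (`langchain-core`, `langchain-ollama`) line up with the LangChain test bed mentioned in my previous message quoted below, but the exact composition here is an assumption, not a working RAG pipeline.

```python
# Sketch of roadmap step b.: the one-shot translation wrapped in a small
# LangChain chain, so a retriever or other steps can be slotted in later.
# Assumes langchain-core and langchain-ollama are installed and ollama is
# serving llama3.1 locally.
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama import ChatOllama

prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You translate natural-language interaction descriptions into BSPL. "
     "Reply with a single BSPL protocol and nothing else."),
    ("human", "{description}"),
])

llm = ChatOllama(model="llama3.1", temperature=0)

# Pipeline: prompt -> model -> plain string. A retriever returning reference
# protocols could later be composed in front of the prompt.
chain = prompt | llm | StrOutputParser()

if __name__ == "__main__":
    print(chain.invoke({
        "description": "A buyer asks a seller for a quote on an item; the "
                       "seller replies with a price and the buyer accepts "
                       "or rejects it."
    }))
```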
So we can extract some features we need in order to provide "approved" specialised agents:

* availability/ease of access: easily downloadable, defining "approved" model repositories (ollama, huggingface, ...)
* openness: components should be open to analysis and interpretable (open weights, open source, open datasets, etc.)
* portability: the agent should provide comparable performance with the selected models
* ergonomics: a framework to test and analyse agent behaviour and eventually deploy agents via library packaging

Some extras:

* waiting for the "Alps" LLM model to be made available by the Swiss Supercomputing Centre; it may be a strong publicly available model
* this paper about structured generation: https://arxiv.org/pdf/2410.18146

Cheers,

On Fri, 11 Jul 2025 at 15:28, Lorenzo Moriondo <tunedconsulting@gmail.com> wrote:
> Hello,
>
> I have developed a simple test bed based on LangChain and GraphChain to
> test context engineering and answers' quality for a potential "translator"
> from natural language to a formal protocol and vice versa. I have used only
> publicly available and non-commercial models.
> You can see some logs at
> https://gist.github.com/Mec-iS/e6647e8287414b77867f5aa66491ca26 with a
> brief explanation in the first comment.
>
> I am keeping the code repository private for now as it is partial (but
> quite straightforward to use). If you want access to the code to replicate
> it on your local machine, write me your GitHub handle and I will share it.
>
> Have a nice weekend,
>
> --
> ¤ acM ¤
> Lorenzo Moriondo
> @lorenzogotuned
> https://www.linkedin.com/in/lorenzomoriondo
> https://github.com/Mec-iS

--
¤ acM ¤
Lorenzo Moriondo
@lorenzogotuned
https://www.linkedin.com/in/lorenzomoriondo
https://github.com/Mec-iS
Received on Thursday, 17 July 2025 13:41:40 UTC