Re: Testing harness for a potential BSPL/NL "protocol translator agent"

Hello,

Here is an update on what I have worked out so far.

I have been working along three lines, mostly:
1. experimenting with and comparing different processes for structured
generation of formal protocols
2. defining a basic framework for testing models on formal generation for
BSPL
3. defining a viable incremental process for identifying ways to develop
specialised agents
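
As an illustration of the kind of check a basic testing framework for
formal generation could start from, here is a minimal sketch. It is not a
BSPL parser and not code from the repo: it only assumes the common textual
form with a `roles` line and `Sender -> Receiver: name[...]` message
lines, and the function name `check_roles` is hypothetical.

```python
import re

# Assumed surface syntax: one "roles A, B" line and message lines such as
# "A -> B: name[...]". Generated output that deviates is simply flagged.
ROLE_LINE = re.compile(r"^\s*roles?\s+(.+)$", re.MULTILINE)
MESSAGE_LINE = re.compile(
    r"^\s*(\w+)\s*->\s*(\w+)\s*:\s*(\w+)\s*\[", re.MULTILINE
)


def check_roles(bspl_text: str) -> list[str]:
    """Return a list of problems; an empty list means the check passed."""
    match = ROLE_LINE.search(bspl_text)
    if match is None:
        return ["no 'roles' line found"]
    roles = {r.strip() for r in match.group(1).split(",") if r.strip()}
    problems = []
    for sender, receiver, name in MESSAGE_LINE.findall(bspl_text):
        for role in (sender, receiver):
            if role not in roles:
                problems.append(
                    f"message '{name}' uses undeclared role '{role}'"
                )
    return problems


# Illustrative protocol text, written for this example only.
SAMPLE = """Purchase {
  roles B, S
  parameters out ID key, out item
  B -> S: rfq[out ID, out item]
}"""

if __name__ == "__main__":
    print(check_roles(SAMPLE) or "ok")
```

A check like this only catches structural slips in a model's output; it
says nothing about whether the protocol is semantically correct.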

Here are some insights and possible action items for the next few weeks:
1. I have experimented with different context and prompt engineering
techniques, passing prompts to the better-performing publicly available
models (those that can run on a single machine with a GPU and 16 GB of
RAM). The only model whose performance is consistently comparable to
commercial SOTA is llama3.1; in general, smaller models (llama3.2,
smollm2) seem to perform badly on formal definitions.
2. I have made available this repo with the initial setup for prompt
generation and model comparison:
https://github.com/Mec-iS/w3c-agents-features
The main focus is *feature engineering for structured generation*. BSPL is
the initial test protocol, but this can be extended to others. I have
developed some training pairs and logging for one-shot translation. Log
files can be loaded into a `streamlit` dataviz app to compare the quality
of outputs among models; this can be extended to comparing quality among
different prompts. I will add code and issues in this repo for this
particular feature engineering task. In the `data/commercial_models`
directory you can find examples of outputs from commercial models. We
should analyse them and pick a reference NL definition, or a mix of
characteristics from them, to establish the best possible way of putting
protocols into natural language (please note that these have been
generated using SOTA reasoners and chain-of-thought implementations).
3. After this brief round of experiments, I can try to design a roadmap
towards a robust BSPL-to-NL (and back) translator:
    a. one-shot structured generation (see repo: uses out-of-the-box
models, so good portability and wide availability at no cost)
    b. chain graph (RAG and other integrations; this should improve on the
results of a., but in a less portable way)
    c. Reinforcement Learning solutions (custom, opinionated solutions;
probably more precise and robust, but less portable and less available):
         1. naive RL implementation
         2. neurosymbolic RL implementation (using the `synlinks` library,
for example; this should provide a quite solid solution, but the learning
process is quite complex to optimise and portability is weakened)
    d. fine-tuning on BSPL (custom tuning of available models): this
requires many more example translations and arbitrarily complex techniques
that may hinder portability
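
To make step a. concrete, here is a minimal sketch of how a one-shot
prompt for BSPL-to-NL translation could be assembled from a single
training pair. It is not the prompt format used in the repo: the `Pair`
class, the `build_one_shot_prompt` function, the section labels and the
example protocol are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """One hypothetical training pair: a protocol and its description."""
    bspl: str
    nl: str


# Illustrative pair, written for this example only.
EXAMPLE = Pair(
    bspl="""Purchase {
  roles B, S
  parameters out ID key, out item
  B -> S: rfq[out ID, out item]
}""",
    nl=(
        "In the Purchase protocol, the buyer B sends the seller S a "
        "request for quote (rfq) that introduces an identifier and an item."
    ),
)


def build_one_shot_prompt(example: Pair, target_bspl: str) -> str:
    """Build a prompt with exactly one worked example before the target."""
    sections = [
        "Translate the BSPL protocol into plain English. "
        "Describe every role, parameter and message.",
        "### Example protocol\n" + example.bspl.strip(),
        "### Example description\n" + example.nl.strip(),
        "### Protocol\n" + target_bspl.strip(),
        # Left open so that the model continues with the description.
        "### Description",
    ]
    return "\n\n".join(sections)


if __name__ == "__main__":
    print(build_one_shot_prompt(EXAMPLE, "Ping {\n  roles A, B\n}"))
```

The resulting string can be passed unchanged to any locally hosted model,
which is what keeps this step portable; the reverse direction (NL to BSPL)
would swap the two fields of the pair.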

From this we can extract some features that we need in order to provide
"approved" specialised agents:
* availability/ease of access: easily downloadable; this means defining
"approved" model repositories (ollama, huggingface, ...)
* openness: components should be open to analysis and interpretable (open
weights, open source, open dataset, etc.)
* portability: an agent should provide comparable performance across the
selected models
* ergonomics: a framework to test and analyse agent behaviour and
eventually deploy agents via library packaging
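
The portability feature above could be turned into a measurable check
along these lines. This is only a sketch under stated assumptions: each
model's outputs are assumed to have already been scored on a 0 to 1
quality scale, and the function name `portability_report`, the model names
and the `tolerance` threshold are hypothetical.

```python
from statistics import mean


def portability_report(
    scores: dict[str, list[float]], tolerance: float = 0.1
) -> dict:
    """Compare average quality per model and report the spread.

    `scores` maps a model name to the quality scores of its outputs.
    The agent counts as portable when the gap between the best and the
    worst average stays within `tolerance`.
    """
    averages = {m: mean(vals) for m, vals in scores.items() if vals}
    if not averages:
        raise ValueError("no scores to compare")
    spread = max(averages.values()) - min(averages.values())
    return {
        "averages": averages,
        "spread": spread,
        "portable": spread <= tolerance,
    }


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    report = portability_report(
        {"model-a": [0.5, 0.5], "model-b": [0.75, 0.75]}
    )
    print(report)
```

How the individual scores are produced (human rating, a reference
comparison, or a structural check) is left open here.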

Some extras:
* Waiting for the "Alps" LLM to be made available by the Swiss
Supercomputing Centre; it may be a strong publicly available model
* this paper about structured generation: https://arxiv.org/pdf/2410.18146
Cheers,

On Fri, 11 Jul 2025 at 15:28, Lorenzo Moriondo <tunedconsulting@gmail.com>
wrote:

> Hello,
>
> I have developed a simple test bed based on LangChain and GraphChain to
> test context engineering and answers' quality for a potential "translator"
> from natural language to a formal protocol and vice versa. I have used only
> publicly available and non-commercial models.
> You can see some logs at
> https://gist.github.com/Mec-iS/e6647e8287414b77867f5aa66491ca26 with a
> brief explanation in the first comment.
>
> I am keeping the code repository private for now as it is partial (but
> quite straightforward to use). If you want access to the code to replicate
> on your local machine, write me your Github handler and I will share it.
>
> Have a nice weekend,
>
> --
> ¤ acM ¤
> Lorenzo
> Moriondo
> @lorenzogotuned
> https://www.linkedin.com/in/lorenzomoriondo
> https://github.com/Mec-iS
>


-- 
¤ acM ¤
Lorenzo
Moriondo
@lorenzogotuned
https://www.linkedin.com/in/lorenzomoriondo
https://github.com/Mec-iS

Received on Thursday, 17 July 2025 13:41:40 UTC