Re: how to evaluate, improve, and learn from AI-generated code

Thanks, Mike, for playing with the tools.
I think I replied via private email/Slack.

Re AICELS: there was a silly bug that generated an A grade for empty input; see the note below.

I am now playing with Vercel deployments, and will likely retire AICELS
on Hugging Face.

Please continue to give feedback! I may not have time to test and evaluate
it myself, and it's always good to get other people to do so.

https://aicels.vercel.app/

And please also try:

https://codesafetycheckr2026.vercel.app/

NOTE:
Too many bugs to catch. The problem was:
On Sat, Apr 4, 2026 at 1:00 AM Paola Di Maio <paola.dimaio@gmail.com> wrote:
>
> Dear Mike
> thanks again for testing AICELS
> The bug, or rather the flaw in the teaching app was:

Empty input gets an A because:
- ast.parse("") succeeds (an empty string is valid Python)
- exec("", {}) succeeds (nothing to execute, nothing to fail)
- the quality checks find zero functions, zero classes, zero anything, so
  zero issues
- zero issues = grade A
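Both silent successes can be reproduced directly in a REPL. This is a minimal illustration of the failure mode, not the app's own code:

```python
import ast

# An empty string is syntactically valid Python: parsing yields a
# Module node containing no statements at all.
tree = ast.parse("")
print(len(tree.body))  # 0 -- nothing for quality checks to flag

# Executing the empty string also succeeds: there is nothing to
# execute, so there is nothing that can fail.
exec("", {})
```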

It's a straightforward bug: the evaluate_code_for_gradio function
never checks whether the input is empty or whitespace-only before
proceeding.
The fix is simple -- add an early return at the top of that function.
    FIXED: see https://aicels.vercel.app/
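A sketch of that early return. The real evaluate_code_for_gradio signature, grading pipeline, and return format are assumptions here; only the guard at the top is the point:

```python
def evaluate_code_for_gradio(source: str) -> str:
    # Guard first: reject empty or whitespace-only input before the
    # parse/exec/quality checks can all "succeed" vacuously.
    if not source or not source.strip():
        return "No code submitted -- please paste some Python to evaluate."
    # ... the existing ast.parse / exec / quality-check pipeline would
    # continue here; "A" stands in for whatever grade it computes ...
    return "A"
```

With the guard in place, empty and whitespace-only submissions get the error message instead of a vacuous grade A.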



On Mon, Mar 30, 2026 at 11:35 PM Mike Gifford
<mike.gifford@civicactions.com> wrote:
>
> Thanks for this Paola,
>
> I picked some out from my AI generated collection:
> https://github.com/mgifford?tab=repositories&q=&type=&language=python&sort=
>
> The forks may not have been AI-generated, but if the code was mine, it certainly was.
>
> I have a few scripts that aren't stand-alone (those fail).  I've had a few B's. One script, I think, might have been too long. My best guess as to why this one didn't run:
> https://github.com/mgifford/wayback-extractor/blob/main/wayback_extractor.py
>
> Aside: Is it possible to execute the script checker on run? No code gives me an A. Also, is it possible to just put in a URL?
>
> I took your guidance and put it into a project that was getting a B (generated with Gemini then updated with PR 113 below):
> https://github.com/mgifford/sam_gov_md/blob/main/PYTHON_GUIDANCE.md
>
> I gave this prompt:
> https://github.com/mgifford/sam_gov_md/issues/110
>
> Which Copilot turned into this PR:
> https://github.com/mgifford/sam_gov_md/pull/111
>
> Which didn't get the results I had hoped for (I still got a B) so I tried again with:
> https://github.com/mgifford/sam_gov_md/issues/112
>
> Which produced:
> https://github.com/mgifford/sam_gov_md/pull/113
>
> This page now gets an A:
> https://github.com/mgifford/sam_gov_md/blob/main/scripts/regenerate_markdown_with_attachments.py
>
> Does the script still work? I think so... I'd need someone else to tell me if it is good, but this script is a good step in the right direction.
>
> I have been playing with:
>   https://mgifford.github.io/ACCESSIBILITY.md/
>   https://github.com/mgifford/accessibility-skills
>
> Sometimes it seems to work as I expect.
>
> Mike
>
>
> On Sun, Mar 29, 2026 at 4:49 PM Paola Di Maio <paola.dimaio@gmail.com> wrote:
>>
>> During the breakout, these questions came up (from Roy or Gaowei, maybe):
>>
>> How to improve software generated by coding agents
>>
>> If properly briefed/prompted/supervised, coding agents can produce excellent code.
>> Mike G said the programming and open-source communities feel threatened by this competition.
>>
>> As a teacher, my concern is that humans will no longer have the motivation to learn how to code and will over-rely on agents, losing control over systems. But to stay on top of machines, highly skilled humans will continue to be essential; their role may shift to system designers, testers, and managers.
>>
>> With that in mind, I created AICELS, an app that:
>> a) evaluates Python code quality; b) gives advice on how to improve the code based on known good practices, serving as a learning tool for both machines and humans.
>>
>>   The catch is that the app was written by a coding agent, and I still don't like coding myself at all; but I am a software/systems engineer and very much enjoy being in this loop.
>>
>> It would be awesome if someone could try it out
>>
>> 1. It evaluates Python code against standard evaluation criteria (I did only a couple of tests; it seems to work).
>> 2. It should give guidance on how to improve poor code.
>>
>> I would be most grateful if the learned among you would test it, give feedback, and help to expand it.
>> Please help to evaluate it.
>>
>> It is currently deployed in two environments, and it will be extended to other languages:
>> https://huggingface.co/spaces/STARBORN/AICELS
>> https://colab.research.google.com/drive/1La9F34HOv_j3cy3zwq5Pz_X0NewV46nT?usp=sharing
>>
>>
>>
>>
>>
>>
>>
>> On Sun, Mar 29, 2026 at 12:42 PM Paola Di Maio <paola.dimaio@gmail.com> wrote:
>>>
>>> Dom, and everyone
>>> Thanks for raising these important issues -
>>>
>>> I also cannot participate in meetings very often (regrets), but I am working on related topics.
>>>
>>> I therefore share two DRAFT technical notes that aim to capture some of the issues and are open for editing/contribution.
>>> I'd be interested to know whether people agree, disagree, or feel otherwise.
>>>
>>> https://w3c-cg.github.io/aikr/TNAI/machine-consumable-specs.html
>>> I was wondering what the difference is between machine-readable specs and specs written for AI, and I attempt to make the distinction in section 5.
>>>
>>> AND
>>> https://w3c-cg.github.io/aikr/TN-S/sustainability-ai-assistants.html
>>>
>>> Apologies in advance for some broken links in these drafts; I am fixing them soon.
>>> I will also add a reference there to the meeting organised by Dom and to the minutes.
>>>
>>> PDM
>>>
>>> On Fri, Mar 27, 2026 at 5:00 PM Fabien Gandon <fabien.gandon@inria.fr> wrote:
>>>>
>>>>
>>>> Thank you Dom.
>>>> It would be great to have you at the following meeting.
>>>> Best,
>>>>
>>>>
>>>>
>>>>
>>>> ----- Original message -----
>>>> > From: "Dominique Hazaël-Massieux" <dom@w3.org>
>>>> > To: "WebAI Interest Group at W3C" <public-webai@w3.org>
>>>> > Sent: Wednesday 25 March 2026 17:58:46
>>>> > Subject: Re: Invitation: W3C Breakout session: AI-generated software and Web standardization
>>>>
>>>> > For those interested, the minutes of the breakout are available at:
>>>> >   https://www.w3.org/2026/03/25-ai-gen-stds-minutes.html - with the
>>>> > slides I presented as an intro to the topic at
>>>> > https://www.w3.org/2026/Talks/dhm-ai-software/
>>>> >
>>>> > I can't make it to the next scheduled IG meeting, but if there is
>>>> > interest, I could present some of the points that were raised there and
>>>> > open questions at the following meeting.
>>>> >
>>>> > Dom
>>>> >
>>>> > On 25/03/2026 at 11:07, Roy Ruoxi Ran wrote:
>>>> >> Dear all,
>>>> >>
>>>> >> I would like to draw your attention to an upcoming W3C Breakout session
>>>> >> from Dom that may be of interest to the Web & AI Interest Group *today*:
>>>> >>
>>>> >>
>>>> >>   - W3C Breakout session: AI-generated software and Web standardization
>>>> >>
>>>> >> - Event page: https://www.w3.org/events/meetings/d3eeea4f-e4dc-470d-9d4a-6a33a8fdcf8f/
>>>> >>
>>>> >> This talk explores how AI is increasingly shaping the way software
>>>> >> systems are designed and used, in particular, the shift toward AI agents
>>>> >> acting on behalf of users and the implications this has on Web architecture.
>>>> >>
>>>> >> Everyone who is interested is welcome to join the session and share
>>>> >> your perspectives.
>>>> >>
>>>> >> Other W3C breakouts are listed at: https://www.w3.org/calendar/breakouts-day-2026/grid/
>>>> >>
>>>> >> Thank you and Best Regards,
>>>> >>
>>>> >> Roy Ruoxi Ran, 冉若曦, W3C
>>>> >>
>>>> >>
>>>> >>
>>>>
>
>
> --
>
> Mike Gifford, Open Standards & Practices Lead, CivicActions
> Digital Services Coalition Board Member
> Drupal Core Accessibility Maintainer, IAAP CPWA Certified
> https://CivicActions.com  |  https://accessibility.civicactions.com
> https://mastodon.social/@mgifford  |  http://linkedin.com/in/mgifford

Received on Saturday, 18 April 2026 06:28:01 UTC