Re: diffusion tech

There are plenty of resources for learning how to work with neural networks, but very little on manipulation of latent semantics, so a primer would be premature.

A likely way forward will be to train a secondary network to generate detailed descriptions of an image composition from the latent semantics, and then to use that in reverse to apply transformations, e.g. to change the colour of a given disc from green to blue, or to add a yellow square. One complication is that the latent semantics are spread across layers. There is a loose similarity to natural language translation, which involves a stack of layers that start from syntactic details, transition to semantics and back to syntactic details, before emitting the word tokens for the target language.
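
To make that concrete, here is a minimal sketch in PyTorch (all class, function and tensor names are my own illustrative assumptions, not an existing API): a small "describer" head is trained to map latent features to description tokens, and an edit is applied by optimising the latents until the describer emits the target description.

# Hypothetical sketch: a small "description head" over a frozen generator's
# latent features, used in reverse to apply an edit. All names here are
# illustrative assumptions.
import torch
import torch.nn as nn

class LatentDescriber(nn.Module):
    """Maps latent features (batch, layers, latent_dim) to description tokens."""
    def __init__(self, latent_dim=512, vocab_size=10000, max_len=32):
        super().__init__()
        self.proj = nn.Linear(latent_dim, 256)
        self.head = nn.Linear(256, vocab_size * max_len)
        self.vocab_size, self.max_len = vocab_size, max_len

    def forward(self, latents):
        h = torch.relu(self.proj(latents.mean(dim=1)))  # pool across layers
        return self.head(h).view(-1, self.max_len, self.vocab_size)

def edit_latents(latents, describer, target_tokens, steps=100, lr=0.1):
    """Gradient-based edit: nudge the latents until the describer emits the
    target description, e.g. 'blue disc' instead of 'green disc'."""
    latents = latents.clone().requires_grad_(True)
    opt = torch.optim.Adam([latents], lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        logits = describer(latents)
        loss = loss_fn(logits.flatten(0, 1), target_tokens.flatten())
        opt.zero_grad()
        loss.backward()
        opt.step()
    return latents.detach()

In practice the describer would be trained against ground-truth captions for generated images, and the frozen generator would re-render the image from the edited latents.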

One of the challenges of working with neural networks is the mathematics, so it is not for the faint-hearted!

> On 25 Dec 2022, at 12:40, Paola Di Maio <paoladimaio10@gmail.com> wrote:
> 
> Thank you very much Dave
> it will be very interesting to learn more in depth about the topic
> Having read a few PhD theses, I suspect they are not the easiest of reads 
> But maybe, we could put together a primer, with the help of others on this list
> on the topic. I'd like to do something along those lines.
> PDM
> 
> On Sun, Dec 25, 2022 at 7:52 PM Dave Raggett <dsr@w3.org> wrote:
>> 
>>> On 25 Dec 2022, at 01:45, Paola Di Maio <paola.dimaio@gmail.com> wrote:
>>> 
>>> Good read, 
>>> would be nice to frame diffusion in the context of KR
>>> 
>>> https://techcrunch.com/2022/12/22/a-brief-history-of-diffusion-the-tech-at-the-heart-of-modern-image-generating-ai/
>> 
>> 
>> That article doesn’t really explain how image generators actually work, as “de-noising” is little more than a buzzword.  Image generators learn to decompose images at different levels of abstraction: at a lower level, using Gabor-like filters to model textures, and at a higher level, by knowing something about typical dogs and cats.  Text prompts are mapped to concepts and combined with noise to stochastically generate details in an iterative process that diffuses constraints across the image. This works in the space of latent semantics, making design choices all the way, before a final stage that takes the latent semantics as instructions for generating the image pixels.
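>> 
>> To make the iterative loop concrete, here is a toy sketch in Python (the function names and the crude update rule are illustrative assumptions, not any real library's API): a prompt embedding conditions a denoiser that repeatedly refines noisy latents, and a decoder maps the final latents to pixels.
>> 
>> # Toy sketch of latent diffusion sampling; names are illustrative.
>> import torch
>> 
>> def generate(prompt_embedding, denoiser, decoder, steps=50,
>>              shape=(1, 4, 64, 64)):
>>     latents = torch.randn(shape)                 # start from pure noise
>>     for t in reversed(range(steps)):             # iterative refinement
>>         noise_estimate = denoiser(latents, t, prompt_embedding)
>>         latents = latents - noise_estimate / steps  # crude update rule
>>     return decoder(latents)                     # latents -> image pixels
>> 
>> Real samplers use carefully derived noise schedules and update rules in place of the crude step above, but the structure is the same: conditioned, stepwise refinement in latent space, with pixel generation deferred to the very end.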
>> 
>> I recommend reading Steven Derby’s Ph.D. thesis, as he has done extensive work on determining what knowledge is exposed at different layers in neural networks; see:
>> 
>>  https://pure.qub.ac.uk/en/studentTheses/interpretable-semantic-representations-from-neural-language-model
>> 
>> I am hoping to start work next year on ways to directly manipulate latent semantics in artificial neural networks.  In principle, this should pave the way for artists to work collaboratively with image generators, allowing an artist to make suggestions that refine a composition in an iterative creative process.
>> 
>> A bigger challenge is to introduce richer knowledge, e.g. to ensure that image compositions embody causal constraints, and that people have four fingers and a thumb on each hand! How can we combine multiple sources of knowledge and ways of reasoning to support that? This is likely to require a paradigm shift towards sequential reasoning and continuous learning, bringing self-awareness along the way!
>> 
>> Dave Raggett <dsr@w3.org>

Dave Raggett <dsr@w3.org>

Received on Monday, 26 December 2022 12:00:15 UTC