Re: diffusion tech

> On 25 Dec 2022, at 01:45, Paola Di Maio <paola.dimaio@gmail.com> wrote:
> 
> Good read, 
> would be nice to frame diffusion in the context of KR
> 
> https://techcrunch.com/2022/12/22/a-brief-history-of-diffusion-the-tech-at-the-heart-of-modern-image-generating-ai/


That article doesn’t really explain how image generators work: “de-noising” is little more than a buzzword. Image generators learn how to decompose images at different levels of abstraction, e.g. at a lower level, using Gabor filters for modelling textures, and at a higher level, by knowing something about typical dogs and cats. Text prompts are mapped to concepts and combined with noise to stochastically generate details. This is an iterative process that diffuses constraints across the image, working in the space of latent semantics and making design choices all the way. A final stage then takes the latent semantics as instructions to generate the image pixels.
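
As a rough illustration of that loop, here is a minimal toy sketch in Python. It is not how any real image generator is implemented. The names embed_prompt, denoise_step, generate_latent and decode are all hypothetical. The "denoiser" simply pulls the latent toward the prompt vector, whereas a real model uses a trained network to predict the noise to remove. The "decoder" is a simple scaling, not a learned one. It only shows the shape of the process: prompt to concept vector, start from noise, refine iteratively in latent space, then decode to pixels.

```python
import random

def embed_prompt(prompt, dim=8):
    """Hypothetical stand-in for a text encoder: each word is turned into a
    deterministic pseudo-random vector, and the word vectors are summed."""
    vec = [0.0] * dim
    for word in prompt.lower().split():
        rng = random.Random(word)  # seeding with the word makes this repeatable
        for i in range(dim):
            vec[i] += rng.uniform(-1.0, 1.0)
    return vec

def denoise_step(latent, concept, strength):
    """Toy denoiser (assumption): move the latent a fraction of the way toward
    the concept vector. A real model would use a trained neural network here."""
    return [z + strength * (c - z) for z, c in zip(latent, concept)]

def generate_latent(prompt, steps=50, dim=8, seed=0):
    """Start from pure noise and refine it iteratively in latent space."""
    rng = random.Random(seed)
    concept = embed_prompt(prompt, dim)
    latent = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in range(steps):
        # The injected noise shrinks linearly and is zero on the last step.
        noise_scale = 1.0 - (t + 1) / steps
        latent = denoise_step(latent, concept, strength=0.2)
        latent = [z + 0.1 * noise_scale * rng.gauss(0.0, 1.0) for z in latent]
    return latent, concept

def decode(latent):
    """Toy decoder (assumption): map each latent value to a pixel in 0..255."""
    return [max(0, min(255, int(round(127.5 + 64 * z)))) for z in latent]

if __name__ == "__main__":
    latent, concept = generate_latent("a dog on a beach")
    print(decode(latent))
```

Changing the seed changes the noise injected along the way, which is where the stochastic variation in the details comes from in this toy. Changing the prompt changes the target the latent is drawn toward.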

I recommend reading Steven Derby’s PhD thesis, as he has done extensive work on determining what knowledge is exposed at different layers in neural networks, see:

 https://pure.qub.ac.uk/en/studentTheses/interpretable-semantic-representations-from-neural-language-model

I am hoping to start work next year on ways to directly manipulate latent semantics in artificial neural networks. In principle, this should pave the way to enabling artists to work collaboratively with image generators, allowing the artist to make suggestions to refine a composition in an iterative creative process.

A bigger challenge is to introduce richer knowledge, e.g. to ensure that image compositions embody causal constraints, and that people have four fingers and a thumb on each hand! How can we combine multiple sources of knowledge and ways of reasoning to support that? This is likely to require a paradigm shift that introduces sequential reasoning and continuous learning, and will introduce self-awareness along the way!

Dave Raggett <dsr@w3.org>

Received on Sunday, 25 December 2022 11:52:30 UTC