Re: diffusion tech

Thank you, Dave.
This is where we have different views.

Diffusion is one thing, and latent semantics is another:
you can study and apply them together, but you can also learn about them separately.

Since, as you point out, the media article explains what diffusion is in
relation to the discussions on this list and elsewhere (for example, re
AI-generated 'art'), a primer can simply summarize existing knowledge
(for example, from the new PhD theses you point to) in an easily readable
format.

Luckily, we do not have to agree on what goes into a primer; it only means
we may have to work separately on different versions of it :-)

We can always compare notes if and when we get to talk in person.
Until then, thanks for the pointers, and happy reading.


On Mon, Dec 26, 2022 at 8:00 PM Dave Raggett <dsr@w3.org> wrote:

> There are plenty of resources for learning how to work with neural
> networks, but very little on manipulation of latent semantics, so a primer
> would be premature.
>
> A likely way forward will be to train a secondary network to generate
> detailed descriptions of an image composition from the latent semantics,
> and to then use that in reverse to apply transformations, e.g. to change
> the colour of a given disc from green to blue and to add a yellow square.
> One complication is that the latent semantics are spread across layers.
> There is a loose similarity to natural language translation, which
> involves a stack of layers that start from syntactic details, transition to
> semantics, and then back to syntactic details, before emitting the word
> tokens for the target language.
>
> One of the challenges for working with neural networks is the mathematics,
> so it is not for the faint-hearted!
>
> On 25 Dec 2022, at 12:40, Paola Di Maio <paoladimaio10@gmail.com> wrote:
>
> Thank you very much, Dave.
> It will be very interesting to learn more in depth about the topic.
> Having read a few PhD theses, I suspect they are not the easiest of reads.
> But maybe we could put together a primer on the topic, with the help of
> others on this list. I'd like to do something along those lines.
> PDM
>
> On Sun, Dec 25, 2022 at 7:52 PM Dave Raggett <dsr@w3.org> wrote:
>
>>
>> On 25 Dec 2022, at 01:45, Paola Di Maio <paola.dimaio@gmail.com> wrote:
>>
>> Good read,
>> would be nice to frame diffusion in the context of KR
>>
>>
>> https://techcrunch.com/2022/12/22/a-brief-history-of-diffusion-the-tech-at-the-heart-of-modern-image-generating-ai/
>>
>>
>> That article doesn’t really explain how image generators actually work,
>> as “de-noising” is little more than a buzzword.  Image generators learn how
>> to decompose images at different levels of abstraction, e.g. at a lower
>> level, using Gabor filters for modelling textures, and at a higher level by
>> knowing something about typical dogs and cats.  Text prompts are mapped to
>> concepts and combined with noise to stochastically generate details in an
>> iterative process that diffuses constraints across the image, working in
>> the space of latent semantics, making design choices all the way, before a
>> final stage which takes latent semantics as instructions to generate the
>> image pixels.
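
[The iterative de-noising process described above can be sketched in a few
lines. This is only a toy illustration of the reverse-diffusion loop shared
by diffusion-based generators, not any particular model: `toy_denoiser` is a
hypothetical stand-in for the trained network that predicts the noise present
in the sample at each step, and the step sizes are made up so the loop runs.]

```python
import numpy as np

def toy_denoiser(x, t):
    # Hypothetical stand-in for a trained network that predicts the
    # noise present in x at diffusion step t; here it simply shrinks x
    # toward zero so the sketch is runnable.
    return 0.1 * x

def reverse_diffusion(shape=(8, 8), steps=50, seed=0):
    """Iteratively refine pure noise into a sample: the core loop
    of diffusion-based image generation."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)          # start from pure noise
    for t in reversed(range(steps)):
        predicted_noise = toy_denoiser(x, t)
        x = x - predicted_noise             # remove a little noise
        if t > 0:                           # re-inject smaller noise
            x = x + 0.01 * rng.standard_normal(shape)
    return x

sample = reverse_diffusion()
print(sample.shape)
```

[In a real model the denoiser is additionally conditioned on the text
prompt's embedding, which is how prompts steer the process in latent space.]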
>>
>> I recommend reading Steven Derby’s Ph.D thesis as he has done extensive
>> work on determining what knowledge is exposed at different layers in neural
>> networks, see:
>>
>>
>> https://pure.qub.ac.uk/en/studentTheses/interpretable-semantic-representations-from-neural-language-model
>>
>> I am hoping to start work next year on ways to directly manipulate latent
>> semantics in artificial neural networks.  In principle, this should pave
>> the way to enabling artists to work collaboratively with image generators,
>> allowing the artist to make suggestions to refine a composition in an
>> iterative creative process.
>>
>> A bigger challenge is to introduce richer knowledge, e.g. to ensure that
>> image compositions embody causal constraints, and that people have four
>> fingers and a thumb on each hand! How can we combine multiple sources of
>> knowledge and ways of reasoning to support that? This is likely to require
>> a paradigm shift that introduces sequential reasoning and continuous
>> learning, and will introduce self-awareness along the way!
>>
>> Dave Raggett <dsr@w3.org>
>>
>>
>>
>>
> Dave Raggett <dsr@w3.org>
>
>
>
>

Received on Monday, 26 December 2022 12:08:37 UTC