Re: Text to Image and human cognition

On 20 Oct 2022, at 18:44, Adeel <aahmad1811@gmail.com> wrote:

> The so-called stable diffusion is not that stable; it is at most near-stable, as it uses continuous random sampling within the variational autoencoder's latent space.
> As you mention, they are not perfect, as they can produce blurry and unrealistic outputs from the probability distribution shaped by the loss function.


The noise is needed to provide variation, but the resulting image reflects a hierarchy of choices based upon the training set of images.  Unfortunately, whilst the implicit knowledge is very good for faces, it is impoverished when it comes to fingers, as is evident in the following example:
[image: generated example showing malformed fingers]
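To make the role of the noise concrete, here is a minimal sketch in Python/numpy of a reverse-diffusion-style sampling loop over a toy latent space. It is purely illustrative and not the actual Stable Diffusion sampler: the toy_denoiser and its "template" are invented stand-ins for the trained network. The point is simply that different random seeds yield different latents, which is why the same prompt produces varied images.

    import numpy as np

    def toy_denoiser(z, t):
        # Stand-in for the trained denoising network: it nudges the latent
        # towards a fixed "template" vector, more strongly as t approaches 0.
        template = np.linspace(-1.0, 1.0, z.size)
        return z - (1.0 - t) * template   # crude estimate of the noise in z

    def sample(steps=50, dim=8, seed=None):
        rng = np.random.default_rng(seed)
        z = rng.standard_normal(dim)       # start from pure Gaussian noise in latent space
        for i in reversed(range(1, steps + 1)):
            t = i / steps
            eps = toy_denoiser(z, t)       # estimated noise at this step
            z = z - eps / steps            # move a little towards the data manifold
            if i > 1:
                z += 0.05 * rng.standard_normal(dim)  # fresh noise keeps sampling stochastic
        return z                           # a decoder would turn this latent into pixels

    # Different seeds give different latents, hence different images for the same prompt.
    print(sample(seed=0))
    print(sample(seed=1))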
Unlike systems based upon deep learning, people are excellent at learning from a few examples, which makes it easier for us to interpret images even when they are fuzzy.  This calls for a qualitatively different approach to machine learning, one that mimics human reasoning and embodies 3D models of form and movement, as well as causal models of function and behaviour.  We need different kinds of artificial neural networks for this.

One way to get there is to use symbolic graphs + metadata to explore how plausible reasoning can be supplemented by metacognition and different forms of learning.  We can then see whether this scales to learning from large corpora, or whether, to do so, we need to switch to neural networks that implement distributed versions of the symbolic algorithms.
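As a simplistic illustration of what "symbolic graphs + metadata" with a touch of metacognition might look like, here is a small Python sketch. The relations, strength values and the inheritance rule are invented for the example rather than taken from any particular notation: it chains kind-of links, combines per-link strength metadata, and wraps the answer in a metacognitive check that reports how much to trust the conclusion.

    from dataclasses import dataclass

    @dataclass
    class Link:
        subject: str
        relation: str
        object: str
        strength: float   # metadata: how typical/strong the association is (0..1)

    # A tiny symbolic graph with per-link metadata (names and values are illustrative only).
    graph = [
        Link("robin", "kind-of", "bird", 0.99),
        Link("bird", "capable-of", "flight", 0.85),    # most, but not all, birds fly
        Link("penguin", "kind-of", "bird", 0.99),
        Link("penguin", "capable-of", "flight", 0.01),
    ]

    def plausible(subject, relation, obj):
        # Prefer direct evidence; otherwise inherit through kind-of links,
        # combining strengths multiplicatively along the chain.
        direct = [l.strength for l in graph
                  if l.subject == subject and l.relation == relation and l.object == obj]
        if direct:
            return max(direct)
        inherited = [l.strength * plausible(l.object, relation, obj)
                     for l in graph if l.subject == subject and l.relation == "kind-of"]
        return max(inherited, default=0.0)

    def answer(subject, relation, obj, threshold=0.5):
        # Metacognitive wrapper: report the conclusion together with how much to trust it.
        p = plausible(subject, relation, obj)
        verdict = "plausible" if p >= threshold else "doubtful"
        return f"{subject} {relation} {obj}: {verdict} (estimated strength {p:.2f})"

    print(answer("robin", "capable-of", "flight"))     # plausible (about 0.84)
    print(answer("penguin", "capable-of", "flight"))   # doubtful (0.01)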

Dave Raggett <dsr@w3.org>

Received on Friday, 21 October 2022 09:18:20 UTC