Re: Text to Image and human cognition

Love your work...

Yet I think it's vitally important to distinguish between human actors and
their wilful acts (even if foolish), versus active artificial agents; and
I have no good framework for how to do so.

I do have a few ideas, but I don't know how to protect the sanctity of
them well enough.

There are big challenges for communities like the W3C, and I don't know
how they'll be addressed.

I think much of what you're speaking of is valid.  I also think open
standards are essential for ensuring support for a better future. But that
doesn't mean it's all OK now.

Nor will it be, if we're unable to address foundational problems while
some people go about marketing different sorts of internet protocols as an
intended form of "the web", making money without much support for triage
(e.g. HTTPA), and most often without any practical knowledge at all about
what they're selling.  Meanwhile, the "former team" I depended upon (both
in work and in life - given that W3C commitments shifted me to working at
night, as I'm Australian) is gone; and perhaps the linked behaviours have
had the effect of excluding people in Asia generally?  That's a sizable
population, and their human rights are not protected merely by the
issuance of a public key.

Now therefore, whilst probably out of scope, I'd really like some help in
seeking to ensure that the "wallets" proposed to "define our identity - as
human beings" give people the capacity to issue values statements (perhaps
like Open Badges?), such as the UDHR.  That way, when engaging in contract
law (perhaps with immoral entities / suppliers?), they're able to assert
rights (to rulers) that affect whether or not those contracts are in good
standing or void (perhaps it'll be like Bitcoin for lawyers, I don't
know).

So the CogAI relevance is about forming agents that can support these
sorts of "human centric" considerations about our future - as a
species.

The comment that "it may not be fair, but you should use it anyway" isn't
good enough.  Perhaps it's a resourcing issue that can be addressed,
subject to ideology - or to principles of philosophical engineering.

Cheers,

Tim H. (aka Timo).

On Thu, 20 Oct 2022, 2:56 am Dave Raggett, <dsr@w3.org> wrote:

> Current text to image generators are mind-blowing in their capabilities to
> mimic a vast range of photos and artwork, seemingly by magic. The release
> of Stable Diffusion has made it practical to run text to image generators
> on the latest laptops, but you can also experiment with it online at:
>
> https://huggingface.co/spaces/stabilityai/stable-diffusion
>
> It was trained on a large dataset of image+text pairs scraped from the
> Internet using the HTML IMG element and its ALT attribute for the text
> descriptions.  In essence, text prompts are first mapped to a language
> embedding based upon GPT. This is further transformed into a latent model
> for images and combined with noise. The model is then diffused and denoised
> in a series of steps that fills out the details, using the prior knowledge
> from the dataset, and finally decoded to create the pixels in the resulting
> image.
>
> My vague understanding of the training process is that it involves a
> generative/adversarial approach that tries to predict whether an image is
> machine or human generated, and whether a text prompt matches or doesn’t
> match a given image. The resulting model is about 4GB in size, which is
> surprisingly small given the huge breadth of images covered.
>
> Stable Diffusion is great when it comes to backgrounds, and for close ups
> of faces, but has a tendency to make bizarre errors with hands, fingers and
> arms, as well as failing to provide sufficient details for faces for
> figures that are not the main focus of the composition.  Animals also often
> come out weirdly, so generating aesthetically pleasing images is a matter
> of good luck and a good prompt; see:
>
> https://www.unite.ai/three-challenges-ahead-for-stable-diffusion/
>
> This is perhaps unsurprising given the neural network architecture.  There
> is ongoing work on extracting 3D models from small sets of 2D images. In
> principle, this should extend to inferring likely 3D models from single
> images of human faces and bodies; however, this will also require the
> generator to pay extra attention to things that people are especially
> attuned to. Existing image to image software can already remove noise,
> increase image resolution, colourise monochrome images, render faces in
> changed orientations, and make people look younger or older than in the
> original image.
>
> I anticipate that future text to image generators will include a rich
> grasp of everyday knowledge and support collaborative dialogues for
> creating and refining artworks as an iterative process. Commercial artists
> will become experts at doing this, combining the computer's imagination
> with the human creative spark and intuitive understanding of emotions, etc.
>
> I am now wondering about how to combine artificial neural networks with
> human-like reasoning and learning. This involves combining everyday
> knowledge with working memory, and providing a means to support sequential
> cognition in terms of sequences of inference steps, rather than simple
> associations. Humans learn a lot from thinking about explanations for what
> they observe, so in principle, we should be able to mimic that, and enable
> computers to learn effectively from understanding texts and videos.
>
> This raises questions about how to design artificial neural networks to
> replicate plausible reasoning, e.g. how to support variables, queues, and
> sets, as well as how to mimic multiple inference strategies and
> metacognition for controlling them. Current neural networks are designed
> for single purposes, rather than general purpose cognition, so some fresh
> ideas are likely to be needed.
>
> p.s. an open question is whether an extended form of copyright is needed,
> given that text to image generators are very good at mimicking the style of
> popular artists rather than copying their artworks. In principle, I can see
> a rationale for artists and photographers, etc. having to give their
> explicit permission for AI based systems to be trained using their creative
> works.
>
> Dave Raggett <dsr@w3.org>

Received on Thursday, 20 October 2022 17:26:31 UTC