[voiceinteraction] minutes April 24, 2024

https://www.w3.org/2024/04/24-voiceinteraction-minutes.html
and below as text

Note that next time we'll continue this discussion and talk about the provider selection strategy and how to chain everything
together.

   [1]W3C

      [1] https://www.w3.org/

                             - DRAFT -
                           Voice Interaction

24 April 2024

   [2]Agenda. [3]IRC log.

      [2] https://lists.w3.org/Archives/Public/public-voiceinteraction/2024Apr/0010.html
      [3] https://www.w3.org/2024/04/24-voiceinteraction-irc

Attendees

   Present
          debbie, dirk, gerard, hugues

   Regrets
          -

   Chair
          debbie

   Scribe
          ddahl

Contents

    1. [4]reference implementation

Meeting minutes

  reference implementation

   [5]https://github.com/w3c/voiceinteraction/tree/master/source/
   w3cipa

      [5] https://github.com/w3c/voiceinteraction/tree/master/source/w3cipa

   dirk: reviewing the reference implementation
   . it supports ChatGPT and Mistral
   . the framework is mainly headers
   . there is a component that accesses ChatGPT, plus a demo
   program

   dirk reviews SOURCE.md

   input listener for modality inputs
   . in this case it just selects the first one
   . input goes to the ModalityManager
   . modality components can be added as you like
   . startInput and handleOutput
   . this is part of the framework, so Royalty Free
   . the modality type is a free string so it stays extensible
   . only text is implemented in the reference implementation
   . some modality components could be both input and output
   . a single instance that knows all listeners and that all
   modality components know about
   . looking at one example of a modality component, textModality
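
   A minimal sketch of the flow described above, using the names
   from the minutes (ModalityManager, startInput, handleOutput,
   free-string modality types); the signatures here are assumptions
   for illustration, not the actual w3cipa headers:

      #include <memory>
      #include <string>
      #include <vector>

      // The modality type is a free string so new modalities can be
      // added without changing the framework; only "text" is
      // implemented in the reference implementation.
      using ModalityType = std::string;

      // A modality component may act as input, output, or both.
      class ModalityComponent {
      public:
          virtual ~ModalityComponent() = default;
          virtual ModalityType getModality() const = 0;
          virtual void startInput() {}                          // input side
          virtual void handleOutput(const std::string& text) {} // output side
      };

      // A single instance that knows all modality components and
      // broadcasts startInput and handleOutput to them.
      class ModalityManager {
      public:
          // modality components can be added as you like
          void addModalityComponent(std::shared_ptr<ModalityComponent> c) {
              components.push_back(std::move(c));
          }
          void startInput() {
              for (auto& c : components) c->startInput();
          }
          void handleOutput(const std::string& text) {
              for (auto& c : components) c->handleOutput(text);
          }
      private:
          std::vector<std::shared_ptr<ModalityComponent>> components;
      };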

   debbie: can there be more than one InputModalityComponent?

   dirk: in theory, yes
   . we might have scaling issues with multiple text inputs, for
   example

   debbie: take "first" out of name
   "TakeFirstInputModalityComponent" to make it more general

   dirk: moving on to the DialogLayer and the IPA Service
   . the IPA Service covers both local IPAs and anything else we
   have
   . the ReferenceIPAService consumes data from the Client
   . it could serve multiple clients, or local as well as other
   IPA services
   . no DialogManager is in place yet
   . if there were one, the IPA service would send the input to
   it and then forward the output back to the client
   . next, the ExternalIPA/Provider Selection Service
   . the Provider Selection Service for now only knows about
   ChatGPT
   . an IPA provider supports input from different modalities
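
   A rough sketch of the path just described, with hypothetical
   minimal interfaces (the real w3cipa classes are richer), showing
   where a DialogManager would sit once one exists:

      #include <string>

      // For now the Provider Selection Service only knows ChatGPT.
      struct ProviderSelectionService {
          std::string process(const std::string& input) {
              return "ChatGPT answer to: " + input; // placeholder
          }
      };

      struct Client {
          void handleOutput(const std::string& output) {
              // render the answer to the user
          }
      };

      // Consumes data from the client and forwards the answer back;
      // it could also serve multiple clients.
      class ReferenceIPAService {
      public:
          explicit ReferenceIPAService(ProviderSelectionService& p)
              : providers(p) {}

          void processInput(Client& client, const std::string& input) {
              // No DialogManager is in place yet. With one, the input
              // would first be sent there, and the IPA service would
              // then forward its output back to the client.
              std::string output = providers.process(input);
              client.handleOutput(output);
          }

      private:
          ProviderSelectionService& providers;
      };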

   debbie: should we standardize on defined modality types, e.g.
   "voice" vs. "speech"?

   dirk: would like to talk about ProviderSelectionStrategy and
   how components are glued together

   debbie: we can talk more in the next call
   . could we list the parts of the architecture that aren't
   implemented yet?

   dirk: that might make sense

   debbie: could there be a UML diagram?

   dirk: there could be more diagrams
   . could link from code to specification

   dirk: next time talk about the provider selection strategy and
   how to chain everything together

   debbie: will try running it

   dirk: shows the demo running with ChatGPT

   gerard: which version of Mixtral do you use?
   . open source version

   hugues: the next version will not be open source

   gerard: the approach is a Mixture of Experts

   dirk: what happens if we ask both at the same time?
   . we would receive both answers

   gerard: could use an LLM to summarize
   . that's what Mixtral is using with the Mixture of Experts
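
   A sketch of the parallel-query idea discussed here; both provider
   calls and the summarize step are placeholders, the latter standing
   in for a real summarization request to an LLM:

      #include <future>
      #include <string>
      #include <vector>

      // Placeholder for a summarization step, e.g. another LLM call
      // that condenses several answers into one reply.
      std::string summarize(const std::vector<std::string>& answers) {
          std::string combined;
          for (const auto& a : answers) combined += a + "\n";
          return combined;
      }

      // Ask ChatGPT and Mistral at the same time and receive both.
      std::string askAll(const std::string& input) {
          auto chatgpt = std::async(std::launch::async,
              [&] { return "ChatGPT: " + input; });  // placeholder call
          auto mistral = std::async(std::launch::async,
              [&] { return "Mistral: " + input; });  // placeholder call
          return summarize({chatgpt.get(), mistral.get()});
      }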


    Minutes manually created (not a transcript), formatted by
    [6]scribe.perl version 221 (Fri Jul 21 14:01:30 2023 UTC).

      [6] https://w3c.github.io/scribe2/scribedoc.html

Received on Wednesday, 24 April 2024 17:34:40 UTC