- From: Dirk Schnelle-Walka <dirk@switch-consulting.de>
- Date: Mon, 22 Apr 2024 17:26:36 +0200
- To: public-voiceinteraction@w3.org
- Message-ID: <c5b11dab-cc91-4be1-b70b-53079c0602b4@switch-consulting.de>
Dear all,
So far I have authored the code and it seems to be working, but I would
be interested in checking whether it is good enough through a code
review. So, if anybody is interested in this, please let me know. There
is no need to prepare anything; I will explain everything in a joint
review session.
I created a new section "Demo Code Walkthrough" at
https://github.com/w3c/voiceinteraction/blob/master/source/SOURCE.md to
provide some more details about the demo program. I copied it here for
your convenience.
-----
Demo Code Walkthrough
<https://github.com/w3c/voiceinteraction/blob/master/source/SOURCE.md#demo-code-walkthrough>
The current demo aims at interacting with ChatGPT. As a first step, you
will need to provide a valid developer key to communicate with ChatGPT.
Configuring Keys
<https://github.com/w3c/voiceinteraction/blob/master/source/SOURCE.md#configuring-keys>
As of now, everything is hard-coded: you will need to insert your
OpenAI developer key in the file
w3c/voiceinteraction/ipa/reference/external/ipa/chatgpt/chatgptadapter.cpp.
Replace OPENAI-DEVELOPER-KEY with your actual key.
Take care not to commit while this key is in the source code.
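For orientation, the spot to edit looks roughly like the following. This
is a hypothetical sketch; the actual variable name and declaration in
chatgptadapter.cpp may differ, only the OPENAI-DEVELOPER-KEY placeholder
is taken from the walkthrough.

    // Hypothetical sketch of the key declaration in chatgptadapter.cpp.
    // Replace the placeholder literal with your actual OpenAI developer key.
    const std::string key = "OPENAI-DEVELOPER-KEY";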
Main Program
<https://github.com/w3c/voiceinteraction/blob/master/source/SOURCE.md#main-program>
The main program starts with creating all the needed components per
layer as described in Intelligent Personal Assistant Interfaces
<https://w3c.github.io/voiceinteraction/voice%20interaction%20drafts/paInterfaces/paInterfaces.htm>.
All components are created as shared instances, as they can potentially
be re-used in the employed processing chain.
Client Layer
<https://github.com/w3c/voiceinteraction/blob/master/source/SOURCE.md#client-layer>
On the client side, we mainly need the correct modality components (text
via console for now), a modality manager modalityManager to handle all
known modalities, and a component to select which input to forward to
the IPA. In this case, we simply select the first input that reaches us,
via inputListener.
    std::shared_ptr<client::ModalityManager> modalityManager =
        std::make_shared<client::ModalityManager>();
    std::shared_ptr<::reference::client::ConsoleTextModalityComponent> console =
        std::make_shared<::reference::client::ConsoleTextModalityComponent>();
    modalityManager->addModalityComponent(console);
    std::shared_ptr<::reference::client::TakeFirstInputModalityComponentListener> inputListener =
        std::make_shared<::reference::client::TakeFirstInputModalityComponentListener>();
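To illustrate the take-first behavior, here is a minimal sketch of such
a listener. The class and member names below are hypothetical; the
actual TakeFirstInputModalityComponentListener in the repository may be
structured differently.

    // Minimal sketch: forward the first input that arrives for a turn and
    // drop any later inputs from other modalities for the same turn.
    #include <memory>
    #include <mutex>

    struct IPAData { /* payload, e.g. the recognized text */ };

    class TakeFirstListenerSketch {
    public:
        // Called by every modality component that produced input.
        void onInput(const std::shared_ptr<IPAData>& data) {
            std::lock_guard<std::mutex> lock(mutex);
            if (taken) {
                return; // a faster modality already won this turn
            }
            taken = true;
            forward(data); // hand the winning input to the next processor
        }

        // Reset before the next user turn.
        void nextTurn() {
            std::lock_guard<std::mutex> lock(mutex);
            taken = false;
        }

    private:
        void forward(const std::shared_ptr<IPAData>&) { /* pass on in the chain */ }
        bool taken = false;
        std::mutex mutex;
    };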
Dialog Layer
<https://github.com/w3c/voiceinteraction/blob/master/source/SOURCE.md#dialog-layer>
So far, we do not have an implementation of a dialog manager. However,
the IPA service ipaService is used to consume incoming calls from the
clients and provide the corresponding replies. For now, it will also
convert an error, e.g. that ChatGPT cannot be reached, into a user
reply. Later, this will be taken care of by the dialog manager.
    std::shared_ptr<::reference::dialog::ReferenceIPAService> ipaService =
        std::make_shared<::reference::dialog::ReferenceIPAService>();
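As an illustration of the error handling mentioned above, converting a
provider error into a user reply could look like this minimal sketch;
the function is hypothetical and not part of the repository.

    #include <string>

    // Hypothetical sketch: until a dialog manager exists, errors such as
    // "ChatGPT cannot be reached" become a plain text reply to the user.
    std::string replyForError(const std::string& error) {
        return "Sorry, I cannot answer right now (" + error + ").";
    }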
External IPA / Services Layer
<https://github.com/w3c/voiceinteraction/blob/master/source/SOURCE.md#external-ipa--services-layer>
Here, we create an instance of an IPAProvider to communicate with
ChatGPT. This instance chatGPT is added to the list of known IPA
providers in the registry. The providerSelectionStrategy is used by
the ProviderRegistry to select those IPA providers that are suited to
handle the current request. In this case, we select all those that have
a matching modality, i.e. text.
    std::shared_ptr<::reference::external::providerselectionservice::ModalityMatchingProviderSelectionStrategy> providerSelectionStrategy =
        std::make_shared<::reference::external::providerselectionservice::ModalityMatchingProviderSelectionStrategy>();
    std::shared_ptr<ProviderRegistry> registry =
        std::make_shared<ProviderRegistry>(providerSelectionStrategy);
    std::shared_ptr<IPAProvider> chatGPT =
        std::make_shared<::reference::external::ipa::chatgpt::ChatGPTAdapter>();
    registry->addIPAProvider(chatGPT);
    std::shared_ptr<ProviderSelectionService> providerSelectionService =
        std::make_shared<ProviderSelectionService>(registry);
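To make the selection step concrete, the following minimal sketch
filters providers by the modality of the current request, which is what
the walkthrough describes for text. The types and the function are
hypothetical stand-ins for ModalityMatchingProviderSelectionStrategy.

    #include <algorithm>
    #include <string>
    #include <vector>

    // Hypothetical stand-in for an IPA provider with its supported modalities.
    struct ProviderSketch {
        std::string name;
        std::vector<std::string> modalities; // e.g. { "text" }
    };

    // Keep every registered provider that supports the requested modality.
    std::vector<ProviderSketch> selectByModality(
            const std::vector<ProviderSketch>& providers,
            const std::string& requestModality) {
        std::vector<ProviderSketch> matching;
        for (const ProviderSketch& provider : providers) {
            bool supports = std::find(provider.modalities.begin(),
                                      provider.modalities.end(),
                                      requestModality)
                            != provider.modalities.end();
            if (supports) {
                matching.push_back(provider);
            }
        }
        return matching;
    }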
Create a Processing Chain
<https://github.com/w3c/voiceinteraction/blob/master/source/SOURCE.md#create-a-processing-chain>
Following Intelligent Personal Assistant Interfaces
<https://w3c.github.io/voiceinteraction/voice%20interaction%20drafts/paInterfaces/paInterfaces.htm>
we then tie the needed components together.
    modalityManager >> inputListener >> ipaService >>
        providerSelectionService >> ipaService >> modalityManager;
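Note that ipaService appears twice in the chain: once on the request
path from the client to the provider selection service, and once on the
reply path back to the modality manager. One way to realize such
chaining is an overloaded operator>> that registers the right-hand
component as a listener of the left-hand one and returns it; the
following is only a sketch under that assumption, the actual operator
in the repository may differ.

    #include <memory>
    #include <vector>

    // Hypothetical base class for all chainable processing components.
    class IPADataProcessor {
    public:
        virtual ~IPADataProcessor() = default;
        virtual void processIPAData(std::shared_ptr<void> data) = 0;
        void addListener(const std::shared_ptr<IPADataProcessor>& next) {
            listeners.push_back(next);
        }
    protected:
        // Forward produced data to all registered listeners.
        void notifyListeners(std::shared_ptr<void> data) {
            for (auto& listener : listeners) {
                listener->processIPAData(data);
            }
        }
    private:
        std::vector<std::shared_ptr<IPADataProcessor>> listeners;
    };

    // Returning the sink lets chains like a >> b >> c read left to right.
    std::shared_ptr<IPADataProcessor> operator>>(
            const std::shared_ptr<IPADataProcessor>& source,
            const std::shared_ptr<IPADataProcessor>& sink) {
        source->addListener(sink);
        return sink;
    }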
Start
<https://github.com/w3c/voiceinteraction/blob/master/source/SOURCE.md#start>
Finally, we need to start capturing input and start processing in the IPA:

    modalityManager->startInput();
    inputListener->processIPAData(nullptr);
Demo Output
<https://github.com/w3c/voiceinteraction/blob/master/source/SOURCE.md#demo-output>
When running the program w3cipademo we may see the following on the screen:

    User: What is the voice interaction community group?
    System: The Voice Interaction Community Group (VoiceIG) is a group
    under the World Wide Web Consortium (W3C) that focuses on promoting
    and enabling the use of voice technology on the web. This community
    group aims to facilitate discussions, share best practices, and
    collaborate on standards and guidelines related to voice interactions
    on the web. The group is open to anyone interested in voice
    technology, including developers, designers, researchers, and other
    stakeholders in the industry.
-----
Dirk
Received on Monday, 22 April 2024 15:26:26 UTC