- From: Dirk Schnelle-Walka <dirk@switch-consulting.de>
- Date: Mon, 22 Apr 2024 17:26:36 +0200
- To: public-voiceinteraction@w3.org
- Message-ID: <c5b11dab-cc91-4be1-b70b-53079c0602b4@switch-consulting.de>
Dear all,
So far I have authored the code and it seems to be working, but I would
be interested in checking whether it is good enough through a code
review. So, if anybody is interested in this, please let me know. There
is no need to prepare anything; I will explain everything in a joint
review session.
I created a new section "Demo Code Walkthrough" at
https://github.com/w3c/voiceinteraction/blob/master/source/SOURCE.md to
provide some more details about the demo program. I copied it here for
your convenience.
-----
Demo Code Walkthrough
<https://github.com/w3c/voiceinteraction/blob/master/source/SOURCE.md#demo-code-walkthrough>
The current demo aims at interacting with ChatGPT. As a first step, you
will need to provide a valid developer key to communicate with ChatGPT.
Configuring Keys
<https://github.com/w3c/voiceinteraction/blob/master/source/SOURCE.md#configuring-keys>
As of now, everything is hard-coded: you will need to insert your
OpenAI developer key in the file
w3c/voiceinteraction/ipa/reference/external/ipa/chatgpt/chatgptadapter.cpp.
Replace OPENAI-DEVELOPER-KEY with your actual key.
Take care not to commit while this key is in the source code.
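For orientation, the spot to edit looks roughly like the following. This
is a hypothetical sketch; the actual variable name and declaration in
chatgptadapter.cpp may differ, only the OPENAI-DEVELOPER-KEY placeholder
is taken from the walkthrough.

    // Hypothetical sketch of the key declaration in chatgptadapter.cpp.
    // Replace the placeholder literal with your actual OpenAI developer key.
    const std::string key = "OPENAI-DEVELOPER-KEY";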
Main Program
<https://github.com/w3c/voiceinteraction/blob/master/source/SOURCE.md#main-program>
The main program starts with creating all the needed components per
layer as described in Intelligent Personal Assistant Interfaces
<https://w3c.github.io/voiceinteraction/voice%20interaction%20drafts/paInterfaces/paInterfaces.htm>.
All components are created as shared instances, as they can potentially
be re-used in the employed processing chain.
Client Layer
<https://github.com/w3c/voiceinteraction/blob/master/source/SOURCE.md#client-layer>
On the client side, we mainly need the correct modality components (text
via console for now), a modality manager modalityManager to handle all
known modalities, and a component to select which input to forward to
the IPA. In this case, we simply select the first input that reaches us,
via inputListener.
    std::shared_ptr<client::ModalityManager> modalityManager =
        std::make_shared<client::ModalityManager>();
    std::shared_ptr<::reference::client::ConsoleTextModalityComponent> console =
        std::make_shared<::reference::client::ConsoleTextModalityComponent>();
    modalityManager->addModalityComponent(console);
    std::shared_ptr<::reference::client::TakeFirstInputModalityComponentListener> inputListener =
        std::make_shared<::reference::client::TakeFirstInputModalityComponentListener>();
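To illustrate the take-first behavior, here is a minimal sketch of such
a listener. The class and member names below are hypothetical; the
actual TakeFirstInputModalityComponentListener in the repository may be
structured differently.

    // Minimal sketch: forward the first input that arrives for a turn and
    // drop any later inputs from other modalities for the same turn.
    #include <memory>
    #include <mutex>

    struct IPAData { /* payload, e.g. the recognized text */ };

    class TakeFirstListenerSketch {
    public:
        // Called by every modality component that produced input.
        void onInput(const std::shared_ptr<IPAData>& data) {
            std::lock_guard<std::mutex> lock(mutex);
            if (taken) {
                return; // a faster modality already won this turn
            }
            taken = true;
            forward(data); // hand the winning input to the next processor
        }

        // Reset before the next user turn.
        void nextTurn() {
            std::lock_guard<std::mutex> lock(mutex);
            taken = false;
        }

    private:
        void forward(const std::shared_ptr<IPAData>&) { /* pass on in the chain */ }
        bool taken = false;
        std::mutex mutex;
    };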
Dialog Layer
<https://github.com/w3c/voiceinteraction/blob/master/source/SOURCE.md#dialog-layer>
So far, we do not have an implementation of a dialog manager. However,
the IPA service ipaService is used to consume incoming calls from the
clients and provide the corresponding replies. For now, it will also
convert an error, e.g. that ChatGPT cannot be reached, into a user
reply. Later, this will be taken care of by the dialog manager.
    std::shared_ptr<::reference::dialog::ReferenceIPAService> ipaService =
        std::make_shared<::reference::dialog::ReferenceIPAService>();
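As an illustration of the error handling mentioned above, converting a
provider error into a user reply could look like this minimal sketch;
the function is hypothetical and not part of the repository.

    #include <string>

    // Hypothetical sketch: until a dialog manager exists, errors such as
    // "ChatGPT cannot be reached" become a plain text reply to the user.
    std::string replyForError(const std::string& error) {
        return "Sorry, I cannot answer right now (" + error + ").";
    }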
External IPA / Services Layer
<https://github.com/w3c/voiceinteraction/blob/master/source/SOURCE.md#external-ipa--services-layer>
Here, we create an instance of an IPAProvider to communicate with
ChatGPT. This instance chatGPT is added to the list of known IPA
providers in the registry. The providerSelectionStrategy is used by
the ProviderRegistry to select those IPA providers that are suited to
handle the current request. In this case, we select all those that have
a matching modality, i.e. text.
    std::shared_ptr<::reference::external::providerselectionservice::ModalityMatchingProviderSelectionStrategy> providerSelectionStrategy =
        std::make_shared<::reference::external::providerselectionservice::ModalityMatchingProviderSelectionStrategy>();
    std::shared_ptr<ProviderRegistry> registry =
        std::make_shared<ProviderRegistry>(providerSelectionStrategy);
    std::shared_ptr<IPAProvider> chatGPT =
        std::make_shared<::reference::external::ipa::chatgpt::ChatGPTAdapter>();
    registry->addIPAProvider(chatGPT);
    std::shared_ptr<ProviderSelectionService> providerSelectionService =
        std::make_shared<ProviderSelectionService>(registry);
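To make the selection step concrete, the following minimal sketch
filters providers by the modality of the current request, which is what
the walkthrough describes for text. The types and the function are
hypothetical stand-ins for ModalityMatchingProviderSelectionStrategy.

    #include <algorithm>
    #include <string>
    #include <vector>

    // Hypothetical stand-in for an IPA provider with its supported modalities.
    struct ProviderSketch {
        std::string name;
        std::vector<std::string> modalities; // e.g. { "text" }
    };

    // Keep every registered provider that supports the requested modality.
    std::vector<ProviderSketch> selectByModality(
            const std::vector<ProviderSketch>& providers,
            const std::string& requestModality) {
        std::vector<ProviderSketch> matching;
        for (const ProviderSketch& provider : providers) {
            bool supports = std::find(provider.modalities.begin(),
                                      provider.modalities.end(),
                                      requestModality)
                            != provider.modalities.end();
            if (supports) {
                matching.push_back(provider);
            }
        }
        return matching;
    }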
Create a Processing Chain
<https://github.com/w3c/voiceinteraction/blob/master/source/SOURCE.md#create-a-processing-chain>
Following Intelligent Personal Assistant Interfaces
<https://w3c.github.io/voiceinteraction/voice%20interaction%20drafts/paInterfaces/paInterfaces.htm>
we then tie the needed components together.
    modalityManager >> inputListener >> ipaService >>
        providerSelectionService >> ipaService >> modalityManager;
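Note that ipaService appears twice in the chain: once on the request
path from the client to the provider selection service, and once on the
reply path back to the modality manager. One way to realize such
chaining is an overloaded operator>> that registers the right-hand
component as a listener of the left-hand one and returns it; the
following is only a sketch under that assumption, the actual operator
in the repository may differ.

    #include <memory>
    #include <vector>

    // Hypothetical base class for all chainable processing components.
    class IPADataProcessor {
    public:
        virtual ~IPADataProcessor() = default;
        virtual void processIPAData(std::shared_ptr<void> data) = 0;
        void addListener(const std::shared_ptr<IPADataProcessor>& next) {
            listeners.push_back(next);
        }
    protected:
        // Forward produced data to all registered listeners.
        void notifyListeners(std::shared_ptr<void> data) {
            for (auto& listener : listeners) {
                listener->processIPAData(data);
            }
        }
    private:
        std::vector<std::shared_ptr<IPADataProcessor>> listeners;
    };

    // Returning the sink lets chains like a >> b >> c read left to right.
    std::shared_ptr<IPADataProcessor> operator>>(
            const std::shared_ptr<IPADataProcessor>& source,
            const std::shared_ptr<IPADataProcessor>& sink) {
        source->addListener(sink);
        return sink;
    }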
Start
<https://github.com/w3c/voiceinteraction/blob/master/source/SOURCE.md#start>
Finally, we need to start capturing input and start processing in the IPA:

    modalityManager->startInput();
    inputListener->processIPAData(nullptr);
Demo Output
<https://github.com/w3c/voiceinteraction/blob/master/source/SOURCE.md#demo-output>
When running the program w3cipademo we may see the following on the screen:

    User: What is the voice interaction community group?
    System: The Voice Interaction Community Group (VoiceIG) is a group
    under the World Wide Web Consortium (W3C) that focuses on promoting
    and enabling the use of voice technology on the web. This community
    group aims to facilitate discussions, share best practices, and
    collaborate on standards and guidelines related to voice interactions
    on the web. The group is open to anyone interested in voice
    technology, including developers, designers, researchers, and other
    stakeholders in the industry.
-----
Dirk
Received on Monday, 22 April 2024 15:26:26 UTC