[voiceinteraction] updates to implied requirements document

From our last call on December 14

Implied Architecture Requirements
December 14, 2022

1
1.	Intelligent Personal Assistants (IPAs) MUST be able to provide general-purpose information
2.	Specialized virtual assistants MUST be able to provide enterprise-specific information
3.	Specialized virtual assistants MAY be able to provide non-enterprise-specific information
4.	IPAs SHOULD be able to perform transactions
5.	Specialized assistants MUST be able to interoperate with general IPAs
6.	IPAs SHOULD be able to execute operations in a user's environment
7.	IPAs MUST be able to interact with users through voice or text (language?) or both.

2.1.1
1.	IPAs MUST be able to transfer a partially completed task to another IPA

3.0 Architecture

1.	The architecture SHOULD support question answering and information retrieval applications
2.	The architecture SHOULD support executing local services to accomplish tasks
3.	The architecture SHOULD support executing remote services to accomplish tasks
4.	The architecture MUST support dynamically adding local and remote services or knowledge sources.
5.	It MUST be possible to forward requests from one IPA to another with the same architecture
6.	It MUST be possible to forward requests or partial requests from one IPA to another with the same architecture, omitting the client layer
7.	IPA extensions MAY be selected from a standardized marketplace
8.	IPAs MAY include a Client layer
9.	IPAs MUST include a Dialog layer
10.	IPAs MAY include an API/Data layer
11.	Components MAY be shifted to other layers as needed (need to clarify with Dirk)
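
As a non-normative illustration of the layering rules above (optional Client and API/Data layers, a mandatory Dialog layer, and client-less IPA-to-IPA forwarding), here is a minimal Python sketch; every class, field, and function name is invented for illustration only.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical layer types; names are illustrative, not normative.
@dataclass
class ClientLayer:          # MAY be present
    modalities: list = field(default_factory=list)

@dataclass
class DialogLayer:          # MUST be present
    components: list = field(default_factory=list)

@dataclass
class ApiDataLayer:         # MAY be present
    providers: list = field(default_factory=list)

@dataclass
class IPA:
    dialog: DialogLayer                      # mandatory layer
    client: Optional[ClientLayer] = None     # optional layer
    api_data: Optional[ApiDataLayer] = None  # optional layer

def forward_request(source: IPA, target: IPA, request: dict) -> dict:
    """Forward a request IPA-to-IPA, omitting the client layer.

    Only dialog-level payload crosses the boundary; no client
    modalities travel with the forwarded request.
    """
    return {"payload": request.get("payload"), "origin": "peer-ipa"}
```

The point of the sketch is only that an IPA object remains valid with `client` and `api_data` absent, which is what client-less forwarding relies on.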

3.1 
1.	The Client layer MAY include a microphone
2.	The Client layer MAY include a means for text input
3.	The Client layer MAY include a speaker
4.	The Client layer MAY include a display
5.	Additional (non-speech) output modalities MAY be employed to render output or to capture input

3.1.3

1.	The IPA Client MUST allow activation and deactivation by means of a Client Activation Strategy.
2.	As an extension, IPA Clients MAY also capture text input and produce text output.
3.	As an extension, IPA Clients MAY also capture input from various modality recognizers.
4.	As an extension, IPA Clients MAY also capture contextual information, e.g., location, time, environmental sounds, or other inputs that they obtain from Local Data Providers.
5.	As an extension, an IPA Client MAY also receive commands to be executed locally in the Local Services.
6.	As an extension, an IPA Client MAY also receive multimodal output to be rendered by a respective modality synthesizer.
7.	IPA Clients MAY reference a session identifier.
8.	Accessibility requirements are still to be discussed

3.2.2.1
1.	The IPA Client MUST be activated with a Client Activation Strategy
2.	The Client Activation Strategy MAY be push-to-talk
3.	The Client Activation Strategy MAY be hotword
4.	The Client Activation Strategy MAY be triggered by an interpreted text string (either from audio or text)
5.	The Client Activation Strategy MAY be a change in environment
6.	The Client Activation Strategy MAY be triggered by a script or environmental condition
7.	The Client Activation Strategy MAY be a different strategy not enumerated here
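
The activation strategies enumerated above could be modeled as interchangeable predicates behind a single interface. A minimal Python sketch, with all names and event shapes invented for illustration:

```python
from typing import Callable

# An activation strategy maps an input event to an activate/ignore decision.
ActivationStrategy = Callable[[dict], bool]

def push_to_talk(event: dict) -> bool:
    return event.get("button_pressed", False)

def hotword(event: dict) -> bool:
    return event.get("hotword_detected", False)

def interpreted_text(event: dict) -> bool:
    # Triggered by an interpreted text string (from audio or text input).
    text = event.get("text", "")
    return text.strip().lower().startswith("computer")

class IPAClient:
    """The client is activated and deactivated via its configured strategy."""
    def __init__(self, strategy: ActivationStrategy):
        self.strategy = strategy
        self.active = False

    def on_event(self, event: dict) -> bool:
        if self.strategy(event):
            self.active = True
        return self.active
```

Because the strategy is just a callable, strategies "not enumerated here" (item 7) plug in without changing the client.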

3.2.2.2
1.	The IPA Client MUST include a Local Service Registry
2.	The Local Service Registry MUST maintain a list of Local Services
3.	The Local Service Registry MUST maintain a list of Local Data Providers
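
A minimal sketch of such a registry, assuming nothing beyond the three requirements above (all names invented):

```python
class LocalServiceRegistry:
    """Tracks Local Services and Local Data Providers known to the IPA Client."""

    def __init__(self):
        self._services: dict = {}
        self._data_providers: dict = {}

    def register_service(self, name: str, service: object) -> None:
        self._services[name] = service

    def register_data_provider(self, name: str, provider: object) -> None:
        self._data_providers[name] = provider

    def services(self) -> list:
        """The maintained list of Local Services."""
        return sorted(self._services)

    def data_providers(self) -> list:
        """The maintained list of Local Data Providers."""
        return sorted(self._data_providers)
```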

3.2 Dialog Layer

3.2.1 IPA Service
1.	The IPA Client SHOULD forward audio data and metadata (if any) to the IPA Service
2.	The IPA Client MAY forward audio data and metadata (if any) to the Dialog Manager
3.	The IPA Service MUST forward audio data and metadata (if any) to the Dialog Manager
4.	The IPA Service MUST forward audio data and metadata (if any) to the Local IPA
5.	The IPA Service MUST forward text data and metadata (if any) to the Dialog Manager
6.	The IPA Service MUST forward text data and metadata (if any) to the Local IPA
7.	The IPA Service MUST forward multimodal data and metadata (if any) to the Dialog Manager
8.	The IPA Service MUST forward multimodal data and metadata (if any) to the Local IPA

9.	The IPA Service MUST forward audio output from the TTS to the IPA Client
10.	The IPA Service MUST forward multimodal output from the Dialog Manager to the modality renderers
11.	The IPA Service MUST forward text output from the NLG to the IPA Client

3.2.2 ASR
1.	The ASR MUST generate one or more recognition hypotheses from voice input that it receives from the IPA Service
2.	The ASR MAY associate recognition hypotheses with confidence scores
3.	The ASR MUST forward the recognition hypotheses to the NLU
4.	The ASR MAY update the History with the recognition hypotheses
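
To illustrate the n-best output and optional confidence scores described above, a toy Python sketch; the canned results stand in for real decoding, and all names are invented:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RecognitionHypothesis:
    text: str
    confidence: Optional[float] = None  # confidence scores are optional (MAY)

class ASR:
    """Sketch only: a real ASR decodes audio; this one returns a canned n-best list."""
    def recognize(self, audio: bytes) -> list:
        # Placeholder decoding: an n-best list, highest confidence first.
        return [
            RecognitionHypothesis("turn on the lights", 0.92),
            RecognitionHypothesis("turn off the lights", 0.05),
        ]

def forward_to_nlu(hypotheses, nlu):
    """The ASR MUST forward its hypotheses to the NLU."""
    return nlu(hypotheses)
```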

3.2.3 NLU

1.	The NLU MUST extract semantic interpretations from text strings (either from audio or text)
2.	The NLU MAY extract multiple interpretations from input text strings (either from audio or text)
3.	The NLU MUST be able to interpret input Core Intent Sets
4.	The NLU MUST be able to interpret spoken activation strategies that require interpretation, if they exist
5.	The NLU MAY make use of the Core Data Provider to access local or internal data or access external services. (revisit Core Data Provider, are we still using that?)
6.	The NLU MAY make use of the Context to check for complementary information such as information in the history or knowledge
7.	The NLU MUST forward the semantic interpretation of the input to the Dialog Manager
8.	The NLU MAY associate statistical confidences with interpretations
9.	The NLU MAY extract emotion, intention, or sentiment from text strings (either from audio or text)
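
A toy sketch of the interpretation structure implied above (multiple interpretations per input, optional confidence and sentiment); the keyword matching is purely illustrative and every name is invented:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Interpretation:
    intent: str
    slots: dict = field(default_factory=dict)
    confidence: Optional[float] = None   # MAY carry a statistical confidence
    sentiment: Optional[str] = None      # MAY carry emotion/sentiment

class NLU:
    """Toy rule-based interpreter standing in for a real NLU component."""
    def interpret(self, text: str) -> list:
        t = text.lower()
        if "weather" in t:
            # A single hypothesis here; a real NLU may return several.
            return [Interpretation("get_weather", {"where": "here"}, 0.9)]
        return [Interpretation("unknown", confidence=0.1)]
```

Whatever the internals, the output of `interpret` is what gets forwarded to the Dialog Manager (item 7).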


3.2.4 Dialog Manager
1.	The Dialog Manager MUST recognize when the user goals are changed
2.	The Dialog Manager SHOULD confirm when the user goals are changed
3.	The Dialog Manager MUST consider ongoing workflows that must not be interrupted
4.	The Dialog Manager MAY update the History with dialog moves
5.	The Dialog Manager MUST determine the Dialog that is best suited to serve the current user input
6.	The Dialog Manager MUST receive the next dialog move as output from the selected Dialog or the IPA Service

7.	?? The Dialog Manager makes use of the NLG to generate audio data to be rendered on the IPA Client
a.	This should be "generate text", I think

8.	The Dialog Manager MAY provide commands to be executed by the IPA Client or the External Services
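
One way to picture the goal-change detection, Dialog selection, and optional History updates above is a small dispatcher; this sketch uses an invented intent-keyed lookup rather than any real selection algorithm, and all names are illustrative:

```python
from typing import Optional

class DialogManager:
    """Routes each interpretation to the best-suited Dialog (selection is invented)."""

    def __init__(self, dialogs: dict):
        self.dialogs = dialogs          # name -> handler(interpretation) -> dialog move
        self.current_goal: Optional[str] = None
        self.history: list = []         # MAY update the History with dialog moves

    def handle(self, interpretation: dict) -> dict:
        intent = interpretation["intent"]
        # MUST recognize when the user's goal changes.
        goal_changed = self.current_goal is not None and intent != self.current_goal
        self.current_goal = intent
        # MUST determine the Dialog best suited to serve the input.
        handler = self.dialogs.get(intent, self.dialogs["fallback"])
        move = handler(interpretation)
        move["goal_changed"] = goal_changed   # SHOULD confirm goal changes
        self.history.append(move)
        return move
```

A real manager would also weigh ongoing workflows that must not be interrupted (item 3) before switching goals; that check is omitted here.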

3.2.5 Context
1.	The Context MAY make use of the Local Service Registry to include external knowledge from Local Data Providers
2.	The Context MAY make use of the Provider Selection Service to include external knowledge from Data Providers
3.	The Context MAY provide external knowledge temporarily to the Knowledge Graph to be considered in reasoning.

3.2.5.1 History
1.	The Dialog History MAY store the past dialog events per user.


3.3 APIs/Data Layer

2.	The Provider Selection Service MAY receive input from the Dialog Manager to query data from Data Providers.
3.	The Provider Selection Service MAY receive input from the Dialog Manager to execute External Services.
4.	If the Provider Selection Service is called with a preselected identifier of an IPA provider, it MUST use the preselected provider
5.	If the Provider Selection Service is not called with a preselected identifier of an IPA provider, the Provider Selection Service MUST follow a Provider Selection Strategy to determine those IPA Providers that are best suited to answer the request.
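
The preselected-provider rule and the fallback to a Provider Selection Strategy could be sketched as follows; the score-based strategy is one invented example, and the provider list shape is assumed for illustration:

```python
from typing import Optional

def select_providers(request: dict, providers: list,
                     preselected: Optional[str] = None) -> list:
    """Return the IPA providers suited to answer `request`.

    `providers` is a list of (name, score_fn) pairs; score_fn rates a
    request for that provider. Both the pair shape and the scoring
    strategy are illustrative assumptions, not part of the requirements.
    """
    if preselected is not None:
        # MUST use the preselected provider when one is given.
        return [name for name, _ in providers if name == preselected]
    # Otherwise MUST follow a Provider Selection Strategy; here, best score first.
    scored = [(score(request), name) for name, score in providers]
    scored.sort(reverse=True)
    return [name for s, name in scored if s > 0]
```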

Received on Tuesday, 10 January 2023 19:36:54 UTC