[voiceinteraction] updates to implied requirements

For discussion tomorrow.
Changes that we added during the November 2 call are starred.
We'll resume discussing at Section 3.1.3, number 1

Implied Architecture Requirements
November 16, 2022

1.	Intelligent Personal Assistants (IPA's) MUST be able to provide general purpose information
2.	Specialized virtual assistants MUST be able to provide enterprise-specific information
3.	Specialized virtual assistants MAY be able to provide non-enterprise-specific information
4.	IPA's SHOULD be able to perform transactions
5.	Specialized assistants MUST be able to interoperate with general IPA's
6.	IPA's SHOULD be able to execute operations in a user's environment
7.	IPA's MUST be able to interact with users through voice or text (language?) or both.

2.1.1
1.	IPA's MUST be able to transfer a partially completed task to another IPA

3.0 Architecture

1.	The architecture SHOULD support question answering and information retrieval applications
2.	The architecture SHOULD support executing local services to accomplish tasks
3.	The architecture SHOULD support executing remote services to accomplish tasks
4.	The architecture MUST support dynamically adding local and remote services or knowledge sources.
5.	It MUST be possible to forward requests from one IPA to another with the same architecture
6.	It MUST be possible to forward requests or partial requests from one IPA to another with the same architecture, omitting the Client layer
7.	IPA extensions MAY be selected from a standardized marketplace 
8.	* IPA's MAY include a Client layer (change from MUST)
9.	IPA's MUST include a Dialog layer
10.	IPA's MAY include an API/Data layer
11.	Components MAY be shifted to other layers as needed (need to clarify with Dirk?)

3.1 
1.	The Client layer MAY include a microphone
2.	The Client layer MAY include a means for text input
3.	The Client layer MAY include a speaker
4.	The Client layer MAY include a display
5.	Additional (non-speech) modalities MAY be employed to render output or to capture input

3.1.3

1.	*The IPA Client MUST allow activation and deactivation by means of a Client Activation Strategy. (added "deactivation")
2.	As an extension IPA Clients MAY also capture input via text and output text.
3.	As an extension IPA Clients MAY also capture input from a specific modality recognizer.
4.	As an extension IPA Clients MAY also capture contextual information, e.g. location, that they obtain from Local Data Providers.
5.	As an extension an IPA Client MAY also receive commands to be executed locally in the Local Services.
6.	As an extension an IPA Client MAY also receive multimodal output to be rendered by a respective modality synthesizer.
7.	IPA Clients MAY reference a session identifier.

3.2.2.1
1.	The IPA Client MUST be activated with a Client Activation Strategy
2.	The Client Activation Strategy MAY be push-to-talk
3.	The Client Activation Strategy MAY be hotword
4.	The Client Activation Strategy MAY be a change in environment
5.	The Client Activation Strategy MAY be a different strategy not enumerated here
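
As a discussion aid, here is a minimal Python sketch of how a Client Activation Strategy could be modeled so that the same IPA Client supports both activation and deactivation. The class names, method names, and event dictionaries are all assumptions made for illustration; none of them come from the requirements text.

```python
from abc import ABC, abstractmethod


class ClientActivationStrategy(ABC):
    """Hypothetical interface: decides when the IPA Client starts or stops listening."""

    @abstractmethod
    def should_activate(self, event: dict) -> bool: ...

    @abstractmethod
    def should_deactivate(self, event: dict) -> bool: ...


class PushToTalk(ClientActivationStrategy):
    """Active while a (hypothetical) button is held down."""

    def should_activate(self, event: dict) -> bool:
        return event.get("type") == "button_down"

    def should_deactivate(self, event: dict) -> bool:
        return event.get("type") == "button_up"


class Hotword(ClientActivationStrategy):
    """Activates when the hotword is heard; deactivates on an assumed silence timeout."""

    def __init__(self, hotword: str):
        self.hotword = hotword.lower()

    def should_activate(self, event: dict) -> bool:
        return (event.get("type") == "speech"
                and self.hotword in event.get("text", "").lower())

    def should_deactivate(self, event: dict) -> bool:
        return event.get("type") == "silence_timeout"


class IPAClient:
    """Toy client that delegates activation decisions to its strategy."""

    def __init__(self, strategy: ClientActivationStrategy):
        self.strategy = strategy
        self.active = False

    def handle(self, event: dict) -> bool:
        # Only consult the transition that applies to the current state.
        if not self.active and self.strategy.should_activate(event):
            self.active = True
        elif self.active and self.strategy.should_deactivate(event):
            self.active = False
        return self.active
```

A strategy "not enumerated here" would simply be another subclass, which is one reason to keep the interface this small.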

3.2.2.2
1.	The IPA Client MUST include a Local Service Registry
2.	The Local Service Registry MUST maintain a list of Local Services
3.	The Local Service Registry MUST maintain a list of Local Data Providers
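
The two "maintain a list" requirements above could be read as a registry holding two separate name-to-handler maps. The sketch below assumes that reading; `LocalServiceRegistry` is the term from the requirements, but every method name and the callable-based representation are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class LocalServiceRegistry:
    """Hypothetical registry: keeps Local Services and Local Data Providers apart."""

    services: Dict[str, Callable[..., object]] = field(default_factory=dict)
    data_providers: Dict[str, Callable[[], object]] = field(default_factory=dict)

    def register_service(self, name: str, handler: Callable[..., object]) -> None:
        self.services[name] = handler

    def register_data_provider(self, name: str, provider: Callable[[], object]) -> None:
        self.data_providers[name] = provider

    def list_services(self) -> List[str]:
        # Sorted so the listing is stable regardless of registration order.
        return sorted(self.services)

    def list_data_providers(self) -> List[str]:
        return sorted(self.data_providers)
```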

3.2 Dialog Layer

3.2.1 IPA Service
1.	The IPA Client SHOULD forward audio data and metadata (if any) to the IPA Service
2.	The IPA Client MAY forward audio data and metadata (if any) to the Dialog Manager
3.	The IPA Service MUST forward audio data and metadata (if any) to the Dialog Manager
4.	The IPA Service MUST forward audio data and metadata (if any) to the Local IPA
5.	The IPA Service MUST forward text data and metadata (if any) to the Dialog Manager
6.	The IPA Service MUST forward text data and metadata (if any) to the Local IPA
7.	The IPA Service MUST forward multimodal data and metadata (if any) to the Dialog Manager
8.	The IPA Service MUST forward multimodal data and metadata (if any) to the Local IPA

9.	The IPA Service MUST forward audio output from the TTS to the IPA Client
10.	The IPA Service MUST forward multimodal output from the Dialog Manager to the modality renderers
11.	The IPA Service MUST forward text output from the NLG to the IPA Client

3.2.2 ASR
The ASR MUST generate one or more recognition hypotheses from voice input that it receives from the IPA Service
The ASR MAY associate recognition hypotheses with confidence scores
The ASR MUST forward the recognition hypotheses to the NLU
The ASR MAY update the History with the recognition hypotheses
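
To make the "one or more hypotheses, optionally with confidence scores" wording concrete, here is a small sketch of an n-best list in which the score may be absent. The `RecognitionHypothesis` type and the `rank_hypotheses` helper are assumptions for illustration, not part of any ASR API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RecognitionHypothesis:
    text: str
    confidence: Optional[float] = None  # MAY be absent, per the requirement above


def rank_hypotheses(hyps: List[RecognitionHypothesis]) -> List[RecognitionHypothesis]:
    """Order scored hypotheses by descending confidence; unscored ones go last.

    sorted() is stable, so unscored hypotheses keep their original relative order.
    """
    return sorted(hyps, key=lambda h: (h.confidence is None, -(h.confidence or 0.0)))
```

Whether an unscored hypothesis should rank below a low-scored one is a design choice that the requirements leave open; this sketch just picks one option.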

3.2.3 NLU

The NLU MUST extract interpretations from text strings
The NLU MUST be able to interpret Core Intent Sets
The NLU MAY make use of the Core Data Provider to access local or internal data or access external services.
The NLU MAY make use of the Context to check for complementary information
The NLU MUST forward the semantic input to the Dialog Manager
The NLU MAY generate multiple interpretations from input text strings
The NLU MAY associate confidences with interpretations

3.2.4 Dialog Manager
The Dialog Manager MUST fill in all known slots before prompting the user for additional slots
The Dialog Manager MUST select the best suited input from the available input alternatives for further processing
The Dialog Manager MUST expect that the user may switch the goals at any time
The Dialog Manager MUST consider ongoing workflows that must not be interrupted
The Dialog Manager MAY update the History with dialog moves
The Dialog Manager MUST determine the Dialog that is best suited to serve the current user input
The Dialog Manager MUST receive the next dialog move as output from the selected Dialog or the IPA Service
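
The first requirement in this list (fill in all known slots before prompting) can be illustrated with a short sketch. It assumes slots are plain string keys and that known values can come from either the current interpretation or the Context; the function names and the shape of the returned dialog move are hypothetical.

```python
from typing import Dict, List, Tuple


def fill_known_slots(required: List[str],
                     interpretation: Dict[str, str],
                     context: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Return (filled, missing), preferring the current input over the Context."""
    filled: Dict[str, str] = {}
    missing: List[str] = []
    for slot in required:
        if slot in interpretation:
            filled[slot] = interpretation[slot]
        elif slot in context:
            filled[slot] = context[slot]
        else:
            missing.append(slot)
    return filled, missing


def next_dialog_move(required: List[str],
                     interpretation: Dict[str, str],
                     context: Dict[str, str]) -> dict:
    """Prompt only after every slot that is already known has been filled."""
    filled, missing = fill_known_slots(required, interpretation, context)
    if missing:
        return {"move": "prompt", "slot": missing[0], "filled": filled}
    return {"move": "execute", "filled": filled}
```

For example, with required slots `city` and `date`, an utterance that supplies only the city leads to a prompt for the date unless the Context already holds one.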

??The Dialog Manager makes use of the NLG to generate audio data to be rendered on the IPA Client
	This should be "generate text" I think

The Dialog Manager MAY provide commands to be executed by the IPA Client or the External Services

3.2.5 Context
The Context MAY make use of the Local Service Registry to include external knowledge from Local Data Providers
The Context MAY make use of the Provider Selection Service to include external knowledge from Data Providers
The Context MAY provide external knowledge temporarily to the Knowledge Graph to be considered in reasoning.

3.2.5.1 History
The Dialog History MAY store the past dialog events per user.


3.3 API's/Data Layer

The Provider Selection Service MAY receive input from the Dialog Manager to query data from Data Providers.
The Provider Selection Service MAY receive input from the Dialog Manager to execute External Services.
If the Provider Selection Service is called with a preselected identifier of an IPA provider, it MUST use the preselected provider
If the Provider Selection Service is not called with a preselected identifier of an IPA provider, the Provider Selection Service MUST follow a Provider Selection Strategy to determine those IPA Providers that are best suited to answer the request.
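
The two conditional requirements above amount to a simple branch: a preselected identifier wins outright, otherwise a Provider Selection Strategy decides. The sketch below assumes providers are described by a dictionary of declared intents and that a strategy is a plain function; both are illustrative choices, as are all names.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical shapes: a request is a dict with an "intent" key, and each
# provider is described by a dict listing the intents it declares.
Strategy = Callable[[dict, Dict[str, dict]], List[str]]


def by_declared_intent(request: dict, providers: Dict[str, dict]) -> List[str]:
    """Example strategy: keep providers that declare the requested intent."""
    return [pid for pid, meta in providers.items()
            if request["intent"] in meta["intents"]]


def select_providers(request: dict,
                     providers: Dict[str, dict],
                     strategy: Strategy,
                     preselected_id: Optional[str] = None) -> List[str]:
    if preselected_id is not None:
        # A preselected provider MUST be used; the strategy is not consulted.
        if preselected_id not in providers:
            raise KeyError(f"unknown provider: {preselected_id}")
        return [preselected_id]
    return strategy(request, providers)
```

How to handle a preselected identifier that is unknown is not covered by the requirements; raising an error here is only one possible behavior.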

Received on Tuesday, 15 November 2022 15:38:09 UTC