[voiceinteraction] updated implied requirements for discussion on November 30 call

Implied Architecture Requirements
November 30, 2022

1
1. Intelligent Personal Assistants (IPA's) MUST be able to provide general purpose information
2. Specialized virtual assistants MUST be able to provide enterprise-specific information
3. Specialized virtual assistants MAY be able to provide non-enterprise-specific information
4. IPA's SHOULD be able to perform transactions
5. Specialized assistants MUST be able to interoperate with general IPA's
6. IPA's SHOULD be able to execute operations in a user's environment
7. IPA's MUST be able to interact with users through voice or text (language?) or both.

2.1.1
1. IPA's MUST be able to transfer a partially completed task to another IPA

3.0 Architecture

1. The architecture SHOULD support question answering and information retrieval applications
2. The architecture SHOULD support executing local services to accomplish tasks
3. The architecture SHOULD support executing remote services to accomplish tasks
4. The architecture MUST support dynamically adding local and remote services or knowledge sources.
5. It MUST be possible to forward requests from one IPA to another with the same architecture
6. It MUST be possible to forward requests or partial requests from one IPA to another with the same architecture, omitting the client layer
7. IPA extensions MAY be selected from a standardized marketplace 
8. IPA's MAY include a Client layer
9. IPA's MUST include a Dialog layer
10. IPA's MAY include an API/Data layer
11. Components MAY be shifted to other layers as needed (need to clarify with Dirk)?

3.1 
1. The Client layer MAY include a microphone
2. The Client layer MAY include a means for text input
3. The Client layer MAY include a speaker
4. The Client layer MAY include a display
5. Additional (non-speech) output modalities MAY be employed to render output or to capture input

3.1.3

1. The IPA Client MUST allow activation and deactivation by means of a Client Activation Strategy.
2. As an extension IPA Clients MAY also capture input via text and output text.
3. As an extension IPA Clients MAY also capture input from various modality recognizers.
4. As an extension IPA Clients MAY also capture contextual information, e.g., location, time, environmental sounds or other inputs that it obtains from Local Data Providers.
5. As an extension an IPA Client MAY also receive commands to be executed locally in the Local Services.
6. As an extension an IPA Client MAY also receive multimodal output to be rendered by a respective modality synthesizer.
7. IPA Clients MAY reference a session identifier.
8. Accessibility to be discussed

3.2.2.1
1. The IPA Client MUST be activated with a Client Activation Strategy
2. The Client Activation Strategy MAY be push-to-talk
3. The Client Activation Strategy MAY be hotword
4. The Client Activation Strategy MAY be a change in environment
5. The Client Activation Strategy MAY be triggered by a script or environmental condition
6. The Client Activation Strategy MAY be a different strategy not enumerated here
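
One way to read the activation requirements above is as a pluggable strategy object that the IPA Client consults. The following is only a minimal sketch under that assumption; the class names, the should_activate method and the event dictionary shape are hypothetical and do not come from the architecture document.

```python
from abc import ABC, abstractmethod


class ClientActivationStrategy(ABC):
    """Hypothetical interface: decides whether an event activates the client."""

    @abstractmethod
    def should_activate(self, event: dict) -> bool:
        ...


class PushToTalk(ClientActivationStrategy):
    def should_activate(self, event: dict) -> bool:
        # Assumed event shape: {"type": "button_pressed"}
        return event.get("type") == "button_pressed"


class Hotword(ClientActivationStrategy):
    def __init__(self, hotword: str) -> None:
        self.hotword = hotword.lower()

    def should_activate(self, event: dict) -> bool:
        # Assumed event shape: {"type": "transcript", "text": "..."}
        return (event.get("type") == "transcript"
                and self.hotword in event.get("text", "").lower())


class IPAClient:
    """The client MUST be activated through some strategy (requirement 1)."""

    def __init__(self, strategy: ClientActivationStrategy) -> None:
        self.strategy = strategy
        self.active = False

    def handle_event(self, event: dict) -> None:
        # Only the configured strategy can switch the client on.
        if not self.active and self.strategy.should_activate(event):
            self.active = True

    def deactivate(self) -> None:
        self.active = False
```

Because requirement 6 leaves the set of strategies open, a new strategy would only need to subclass the hypothetical ClientActivationStrategy; the client itself stays unchanged.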

3.2.2.2
1. The IPA Client MUST include a Local Service Registry
2. The Local Service Registry MUST maintain a list of Local Services
3. The Local Service Registry MUST maintain a list of Local Data Providers
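
As an illustration of the two lists the Local Service Registry has to maintain, here is a small sketch. It assumes that services and data providers can be represented as named callables; the method names are hypothetical and not taken from the architecture document.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class LocalServiceRegistry:
    """Hypothetical registry holding Local Services and Local Data Providers."""

    services: Dict[str, Callable[..., object]] = field(default_factory=dict)
    data_providers: Dict[str, Callable[[], object]] = field(default_factory=dict)

    def register_service(self, name: str, handler: Callable[..., object]) -> None:
        self.services[name] = handler

    def register_data_provider(self, name: str, source: Callable[[], object]) -> None:
        self.data_providers[name] = source

    def list_services(self) -> List[str]:
        # Sorted so that the listing is stable regardless of registration order.
        return sorted(self.services)

    def list_data_providers(self) -> List[str]:
        return sorted(self.data_providers)
```

For example, a client could register a "set_volume" service and a "location" data provider at start-up and later hand the registry to the Context component.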

3.2 Dialog Layer

3.2.1 IPA Service
1. The IPA Client SHOULD forward audio data and metadata (if any) to the IPA Service
2. The IPA Client MAY forward audio data and metadata (if any) to the Dialog Manager
3. The IPA Service MUST forward audio data and metadata (if any) to the Dialog Manager
4. The IPA Service MUST forward audio data and metadata (if any) to the Local IPA
5. The IPA Service MUST forward text data and metadata (if any) to the Dialog Manager
6. The IPA Service MUST forward text data and metadata (if any) to the Local IPA
7. The IPA Service MUST forward multimodal data and metadata (if any) to the Dialog Manager
8. The IPA Service MUST forward multimodal data and metadata (if any) to the Local IPA

9. The IPA Service MUST forward audio output from the TTS to the IPA Client
10. The IPA Service MUST forward multimodal output from the Dialog Manager to the modality renderers
11. The IPA Service MUST forward text output from the NLG to the IPA Client

3.2.2 ASR
1. The ASR MUST generate one or more recognition hypotheses from voice input that it receives from the IPA Service
2. The ASR MAY associate recognition hypotheses with confidence scores
3. The ASR MUST forward the recognition hypotheses to the NLU
4. The ASR MAY update the History with the recognition hypotheses
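
The ASR requirements above could be sketched as follows. This is an assumption-laden illustration, not the group's design: the RecognitionHypothesis type, the ranking rule (scored hypotheses first, highest confidence first) and the representation of the NLU as a callable and the History as a list are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RecognitionHypothesis:
    text: str
    # Confidence is optional because the ASR only MAY provide it.
    confidence: Optional[float] = None


def rank_hypotheses(hyps: List[RecognitionHypothesis]) -> List[RecognitionHypothesis]:
    """Scored hypotheses by descending confidence, then unscored ones in input order."""
    scored = [h for h in hyps if h.confidence is not None]
    unscored = [h for h in hyps if h.confidence is None]
    return sorted(scored, key=lambda h: h.confidence, reverse=True) + unscored


def forward_to_nlu(hyps: List[RecognitionHypothesis],
                   nlu: Callable[[List[RecognitionHypothesis]], object],
                   history: Optional[list] = None) -> object:
    # Requirement 1: there must be one or more hypotheses.
    if not hyps:
        raise ValueError("the ASR must produce at least one hypothesis")
    ranked = rank_hypotheses(hyps)
    # Requirement 4: updating the History is optional.
    if history is not None:
        history.extend(ranked)
    # Requirement 3: the hypotheses are always forwarded to the NLU.
    return nlu(ranked)
```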

3.2.3 NLU

1. The NLU MUST extract interpretations from text strings
2. The NLU MUST be able to interpret Core Intent Sets
3. The NLU MAY make use of the Core Data Provider to access local or internal data or access external services.
4. The NLU MAY make use of the Context to check for complementary information
5. The NLU MUST forward the semantic input to the Dialog Manager
6. The NLU MAY generate multiple interpretations from input text strings
7. The NLU MAY associate confidences with interpretations

3.2.4 Dialog Manager
1. The Dialog Manager MUST fill in all known slots before prompting the user for additional slots
2. The Dialog Manager MUST select the best suited input from the available input alternatives for further processing
3. The Dialog Manager MUST expect that the user may switch the goals at any time
4. The Dialog Manager MUST consider ongoing workflows that must not be interrupted
5. The Dialog Manager MAY update the History with dialog moves
6. The Dialog Manager MUST determine the Dialog that is best suited to serve the current user input
7. The Dialog Manager MUST receive the next dialog move as output from the selected Dialog or the IPA Service

8. ??The Dialog Manager makes use of the NLG to generate audio data to be rendered on the IPA Client
a. This should be "generate text" I think

9. The Dialog Manager MAY provide commands to be executed by the IPA Client or the External Services
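
Requirement 1 of the Dialog Manager (fill all known slots before prompting) can be illustrated with a short sketch. It assumes that slots are plain string keys, that known values come first from the current interpretation and then from the Context, and that a dialog move is a dictionary; all of these are hypothetical simplifications.

```python
from typing import Dict, List


def fill_slots(required: List[str],
               interpretation: Dict[str, str],
               context: Dict[str, str]) -> Dict[str, str]:
    """Fill every slot whose value is already known, preferring the user's input."""
    filled: Dict[str, str] = {}
    for slot in required:
        if slot in interpretation:
            filled[slot] = interpretation[slot]
        elif slot in context:
            filled[slot] = context[slot]
    return filled


def next_dialog_move(required: List[str],
                     interpretation: Dict[str, str],
                     context: Dict[str, str]) -> dict:
    filled = fill_slots(required, interpretation, context)
    missing = [slot for slot in required if slot not in filled]
    if missing:
        # Prompt only after all known slots have been filled.
        return {"move": "prompt", "slot": missing[0], "filled": filled}
    return {"move": "execute", "filled": filled}
```

With required slots "city" and "date", an interpretation that supplies the city and a Context that supplies the date would lead directly to an "execute" move, so the user is never asked for information the system already has.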

3.2.5 Context
1. The Context MAY make use of the Local Service Registry to include external knowledge from Local Data Providers
2. The Context MAY make use of the Provider Selection Service to include external knowledge from Data Providers
3. The Context MAY provide external knowledge temporarily to the Knowledge Graph to be considered in reasoning.

3.2.5.1 History
1. The Dialog History MAY store the past dialog events per user.


3.3 API's/Data Layer

2. The Provider Selection Service MAY receive input from the Dialog Manager to query data from Data Providers.
3. The Provider Selection Service MAY receive input from the Dialog Manager to execute External Services.
4. If the Provider Selection Service is called with a preselected identifier of an IPA provider, it MUST use the preselected provider
5. If the Provider Selection Service is not called with a preselected identifier of an IPA provider, the Provider Selection Service MUST follow a Provider Selection Strategy to determine those IPA Providers that are best suited to answer the request.
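
The two rules for the Provider Selection Service (use the preselected provider if one is given, otherwise apply a Provider Selection Strategy) might look like this in code. The sketch assumes providers are described by keyword lists and uses a toy keyword-matching strategy; both are hypothetical, as is raising an error for an unknown preselected identifier, which the requirements do not address.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical provider description: identifier -> keywords it can handle.
Providers = Dict[str, List[str]]
Strategy = Callable[[str, Providers], List[str]]


def keyword_strategy(request: str, providers: Providers) -> List[str]:
    """Toy strategy: pick providers with at least one keyword in the request."""
    words = set(request.lower().split())
    return [pid for pid, keywords in providers.items() if words & set(keywords)]


def select_providers(request: str,
                     providers: Providers,
                     strategy: Strategy,
                     preselected: Optional[str] = None) -> List[str]:
    if preselected is not None:
        # Assumption: an unknown identifier is treated as an error.
        if preselected not in providers:
            raise KeyError(f"unknown provider: {preselected}")
        # A preselected provider MUST be used; the strategy is bypassed.
        return [preselected]
    # Otherwise the strategy determines the best suited providers.
    return strategy(request, providers)
```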

Received on Tuesday, 29 November 2022 21:20:26 UTC