Re: Comments on Pronunciation Technical Approach from Alan Reeve on 2021-03-10 (public-pronunciation@w3.org from March 2021)

From: Alan Reeve <alan.reeve@cambiumassessment.com>
Date: Wed, 10 Mar 2021 17:02:42 +0000
To: Paul Grenier <pgrenier@gmail.com>, "public-pronunciation@w3.org" <public-pronunciation@w3.org>
Message-ID: <648bfa95a2cd41478bf9e4d9f943c768@cambiumassessment.com>
Here's my feedback after working through https://w3c.github.io/pronunciation/technical-approach/#data-sssml-voice :


- Thanks, Alan



3.1.1 - A say-as type for measurements seems to be absent, e.g. <span data-ssml-say-as='measurement'>3m</span> should read as '3 meters' (or perhaps the m should be marked up by itself). [We are having issues with one vendor auto-reading such text as a measurement when most do not.] I see SSML supports this, naming it 'unit' (and not 'measurement'). Seems like we should just mirror all of the SSML options for that?

  Typo 'deatil' instead of 'detail'

  Example has say-as 'digits' but digits isn't in the list of Values?

3.1.4 - Typo extra s in sssml

  How does data-ssml-voice-name match a voice pack name? Does it have to be exact? It seems like some vendors are changing voice pack names all the time. Perhaps we should allow for substring matching e.g. 'David' would pick up 'Microsoft David' or 'Microsoft David (English)'.
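A minimal sketch of the substring matching suggested above, in Python. The function name `match_voice` and the list of installed voices are hypothetical, and preferring an exact match over a substring match is an assumption, not something the draft specifies:

```python
def match_voice(requested, installed):
    """Return the installed voice name that best matches `requested`.

    Assumed policy (not from the draft): a case-insensitive exact match
    wins; otherwise the first installed name containing the requested
    text is used; None means nothing matched.
    """
    wanted = requested.strip().casefold()
    if not wanted:
        return None
    # First pass: exact, case-insensitive match.
    for name in installed:
        if name.casefold() == wanted:
            return name
    # Second pass: substring match, e.g. 'David' in 'Microsoft David (English)'.
    for name in installed:
        if wanted in name.casefold():
            return name
    return None


# Hypothetical voice pack names for illustration only.
voices = ["Microsoft Zira", "Microsoft David (English)"]
print(match_voice("David", voices))
```

Whether the first substring hit is the right tie-breaker when several voice packs match is exactly the kind of detail the spec would need to pin down.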

  Seems like there should be a data-ssml-voice-culture='es-MX' type attribute for when we don't care which voice is used, other than ensuring it is appropriate for the culture, unless the existing lang attribute will suffice (in which case perhaps that should be explicitly stated as the way to do that).

3.1.6 We should be clear about what the valid time suffixes are, e.g. s=seconds and ms=milliseconds are valid but h (hours) isn't.
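A minimal sketch of the kind of check this implies, in Python. It assumes a time value is a non-negative decimal number followed by `s` or `ms` (the form SSML uses for break times); the helper name `parse_time_ms` is hypothetical:

```python
import re

# Assumed grammar: non-negative decimal number + "s" or "ms". Anything
# else (for example "2h") is rejected rather than guessed at.
_TIME_RE = re.compile(r"(\d+(?:\.\d+)?|\.\d+)(ms|s)")


def parse_time_ms(value):
    """Return the duration in milliseconds, or None if the value is invalid."""
    match = _TIME_RE.fullmatch(value.strip())
    if match is None:
        return None
    number = float(match.group(1))
    unit = match.group(2)
    return number if unit == "ms" else number * 1000.0


for sample in ("500ms", "3s", "2h"):
    print(sample, parse_time_ms(sample))
```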

3.1.8 I am unclear as to what data-ssml-audio-fetchint means... fetch interval?? Actually I am unclear as to what valid values for many of these would be. (I was able to find more info in the SSML docs as to what I gather the intent of many of these is)

4 In Example 1, shouldn't the members of the JSON {"prosody":{"rate":"slow";"pitch":"low"}} be separated by a comma and not a semi-colon?

Also, what happens if multiple attributes are used in one data-ssml, e.g. data-ssml = '{"audio":{"src":"/soundlibrary/wood/hits/hits_11"}, "break":{"time":"500ms"}}' ? That's valid JSON, but is it valid for our purposes?

If so then is there precedence? The same could be asked with the multi-attribute approach e.g. if I use say-as and audio on the same span then what do I hear? Both? Audio (if available) first?



There is quite a bit of redundancy in the explanations of what the single-attribute and multi-attribute versions do... instead, perhaps have the single-attribute version focus on providing a clear mapping and leave the detailed doc in the multi-attribute version? Ideally it would be nice if the mapping was 1:1, but I see that is not the case due to a few exceptions.

I see a reference to QTI's use of SSML... might it not be worthwhile to note that web speech has also adopted SSML as its standard for marked-up TTS?

This statement concerns me: "Implementers must decide how to handle malformed JSON." Seems like that should be part of this spec, e.g. if the JSON isn't valid then it should be ignored... no attempt at a hybrid interpretation should be made, otherwise we risk vendor incompatibilities.
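A minimal sketch of the "ignore it if it isn't valid" rule, in Python. The helper name `parse_data_ssml` is hypothetical, and treating a non-object value (a bare string or array) as invalid is an assumption rather than something the draft states:

```python
import json


def parse_data_ssml(raw):
    """Parse a data-ssml attribute value.

    Returns the parsed object, or None when the value should be ignored.
    No repair or partial interpretation is attempted.
    """
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        return None
    # Assumption: only a JSON object is meaningful here.
    return value if isinstance(value, dict) else None


# The semi-colon form quoted from Example 1 is rejected outright...
print(parse_data_ssml('{"prosody":{"rate":"slow";"pitch":"low"}}'))
# ...while the comma-separated form parses.
print(parse_data_ssml('{"prosody":{"rate":"slow","pitch":"low"}}'))
```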

Perhaps a section on mapping to/from SSML would be helpful, especially for implementers that will take this markup, convert it to SSML, and then feed it on to, say, a web speech engine that fully supports SSML.
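A minimal sketch of what such a mapping could look like for the single-attribute form, in Python. The function name `to_ssml` is hypothetical; nesting the elements in key order is an assumption (the precedence question above is still open); and the `version` and namespace attributes a complete `<speak>` element carries are left out for brevity:

```python
import xml.etree.ElementTree as ET


def to_ssml(text, ssml):
    """Wrap `text` in one SSML element per top-level key of `ssml`.

    `ssml` is the already-parsed data-ssml object, for example
    {"prosody": {"rate": "slow", "pitch": "low"}}. Elements are nested
    in key order, which is an assumed policy, not a specified one.
    """
    root = ET.Element("speak")
    parent = root
    for element_name, attributes in ssml.items():
        attrs = {key: str(value) for key, value in attributes.items()}
        parent = ET.SubElement(parent, element_name, attrs)
    # The text goes inside the innermost element.
    parent.text = text
    return ET.tostring(root, encoding="unicode")


print(to_ssml("Once upon a midnight dreary", {"prosody": {"rate": "slow", "pitch": "low"}}))
```

Using an XML library rather than string concatenation means attribute values and text are escaped for free.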


________________________________
From: Paul Grenier <pgrenier@gmail.com>
Sent: Wednesday, March 10, 2021 9:32:28 AM
To: public-pronunciation@w3.org
Subject: Re: Comments on Pronunciation Technical Approach

Notes from team (may need some rewrite/clarification before becoming github issues)


  *   Abstract should be about the document, not the TF, and more detail about why the doc exists.

No objections. Will correct.

  *   In the intro, the paragraph beginning "This proposal represents a decision point..." is not clear, it seems to jump subject a lot.

Need to rewrite 3rd para. into a hanger and list or table/matrix.

  *   I'm looking for, somewhere in the intro and possibly this para, a clear indication that there are two approaches being considered, but that at this point neither is explicitly on the path to Recommendation; we actively want feedback on them in order to choose. I think this is more than a paragraph.

Follow the matrix with a call to action requesting comments, to help move one of these approaches to technical recommendation. Feedback from implementors and authors (can or can't implement).

  *   The section "Background on Pronunciation" has only three sentences. That is not enough to justify a section. I think the section is needed, so it should have more content. While there are links, it would be important to summarize the content of the links so a reader can a) be introduced without having to follow links and b) understand why they might want to follow those links for more detail.

hanger and a list here. TLDR each doc.

  *   I find myself completely disoriented starting to read the section Multi-attribute Approach. I can't tell what it is, and how it relates to the introductory material I've already read. Partly the Intro needs expansion, but partly this section needs more than a paragraph at the top.

Link to the section from the intro. Repeat some of the intro in this section to help re-iterate/contextualize.

  *   Jumping straight into a code example also threw me off. Not having read the section yet, I couldn't tell what parts of the example to focus on, nor what I might learn from it prior to reading the rest of the section.

More exposition about the code example(s) to help the reader understand the significance and techniques. Maybe a pros/cons and summary of distinctions following both technical descriptions. Not just a visual presentation; it affects aural presentation. Show the stanza. Compare TTS and a natural reading of the text. (need human dramatic performance) https://www.gutenberg.org/files/23901/mp3/23901-01.mp3


  *   Most of the sections describing specific attributes, or whatever the subsections are in the single-attribute approach, need content explaining what the feature does. Saying they exist and a value template is not enough information for people to understand their role in the model.

maybe visually diagram the SSML and/or table of what the functions do. link to SSML 1.1 specification.

  *   There should be a pros / cons section, either one in each approach, or a section following them comparing them and giving pros and cons as we understand them.

borrow heavily from previous documents (explainer).

--
Paul Grenier
[github]<https://github.com/AutoSponge>[twitter]<https://twitter.com/AutoSponge>[linkedin]<http://www.linkedin.com/in/pgrenier>


On Wed, Mar 10, 2021 at 9:03 AM Roy Ran <ran@w3.org> wrote:

FYI, Comments from Michael.


-------- Forwarded Message --------
Subject:        Comments on Pronunciation Technical Approach
Resent-Date:    Mon, 08 Mar 2021 22:22:21 +0000
Resent-From:    group-apa-chairs@w3.org
Date:           Mon, 8 Mar 2021 17:22:18 -0500
From:           Michael Cooper <cooper@w3.org>
To:             group-apa-chairs@w3.org <group-apa-chairs@w3.org>



  *   Abstract should be about the document, not the TF, and more detail about why the doc exists.
  *   In the intro, the paragraph beginning "This proposal represents a decision point..." is not clear, it seems to jump subject a lot.
  *   I'm looking for, somewhere in the intro and possibly this para, a clear indication that there are two approaches being considered, but that at this point neither is explicitly on the path to Recommendation; we actively want feedback on them in order to choose. I think this is more than a paragraph.
  *   The section "Background on Pronunciation" has only three sentences. That is not enough to justify a section. I think the section is needed, so it should have more content. While there are links, it would be important to summarize the content of the links so a reader can a) be introduced without having to follow links and b) understand why they might want to follow those links for more detail.
  *   I find myself completely disoriented starting to read the section Multi-attribute Approach. I can't tell what it is, and how it relates to the introductory material I've already read. Partly the Intro needs expansion, but partly this section needs more than a paragraph at the top.
  *   Jumping straight into a code example also threw me off. Not having read the section yet, I couldn't tell what parts of the example to focus on, nor what I might learn from it prior to reading the rest of the section.
  *   Most of the sections describing specific attributes, or whatever the subsections are in the single-attribute approach, need content explaining what the feature does. Saying they exist and a value template is not enough information for people to understand their role in the model.
  *   There should be a pros / cons section, either one in each approach, or a section following them comparing them and giving pros and cons as we understand them.

Michael
Received on Wednesday, 10 March 2021 17:03:01 UTC