- From: HINARD Edouard FTRD/DIH/LAN <edouard.hinard@rd.francetelecom.com>
- Date: Wed, 18 Feb 2004 15:11:26 +0100
- To: <www-voice@w3.org>, "Baggia Paolo" <Paolo.Baggia@LOQUENDO.COM>, "Daniel Burnett (E-mail)" <burnett@nuance.com>, "Jim Larson (E-mail)" <jim@larson-tech.com>
- Cc: "DIH-IPS-VMI" <l-dih-ips-vmi@rd.francetelecom.com>, "PETIT Jean-Pierre FTRD/DIH/LAN" <jpp.petit@francetelecom.com>, "GAVIGNET Frederic FTRD/DIH/LAN" <frederic.gavignet@francetelecom.com>
- Message-ID: <8AA97249241F7148BE6D3D8B93D83F5AFE7920@ftrdmel2.rd.francetelecom.fr>
As a global telecommunications carrier, the France Telecom group believes that the SSML 1.0 Candidate Recommendation establishes a comprehensive solution facilitating the contribution to projects with technologies that will be part of many people's daily life in the near future. Committed to customer care, responsibility and innovation, France Telecom is therefore happy to contribute to this Recommendation by submitting the following SSML 1.0 Implementation Report and to support the activities of the W3C Voice Browser working group.

The intrinsic behaviour of the France Telecom Research & Development SSML synthesizer is to process SSML input completely even when an error occurs; the embedded XML parser validates XML input against neither the synthesis schema nor the SSML DTD. This is why a special testers' version (consisting of a front-end validator performing DTD validation and a standard SSML synthesizer) has been used. This special version also inserts extra samples in the speech signal when a marker is encountered, in order to trigger an audio event, audible by testers, at the exact position where the marker occurs.

The FTR&D implementation was run on the official test set ssml-ir-20040119.zip with the following modifications applied to manifest.xml:

* The addition of a dep element for TA#51: <dep uri="51/turca.wav" type="audio/x-wav"/>
* The renaming of the 15 uri of the form "test-prosody-XXX-%25-NN.txml", which do not correspond to any .txml filenames in the ZIP archive, to "test-prosody-XXX-%-NN.txml" (without the substring "25")

and with the following modifications applied to .txml files:

* 79/79.txml: the removal of the xml:lang attribute from the speak element
* 91/91.txml and 92/92.txml: the replacement of sentence by s
* 27/27.txml: the replacement of paragraph by p
* 301/test-prosody-rate-comp-115.txml: the addition of xml:lang="en-US" to speak
* 298/ta_298.txml and 299/ta_299.txml: the modification of the generated .ssml in order to obtain UTF-8 characters, instead of UTF-16 characters, for the phoneme string "θɪŋ".

Some instructions do not conform to the SSML recommendation. We suggest the following modifications to these assertions:

* #4 unknown format attribute: The assertion specifies: "When the value for the format attribute is unknown or unsupported by a processor, it must render the contained text as if no format value were specified". But the recommendation adds: ", and should render it using the interpret-as value that is specified." Thus test and reference should not sound identical. Suggestion: use a Multiple_Pair_Comp test. The SSML test remains the same. The SSML reference equals the test without the format attribute.
* #297 vendor-defined alphabet attribute: As is, this test is identical to test #20. Instruction suggestion: for this test to pass, the alphabet attribute should be a valid vendor-defined alphabet of the form "x-organization" or "x-organization-alphabet".
* #139 <voice xml:lang="language-not-available">The cat jumped over the moon.</voice>: The instruction is not in line with the SSML recommendation, which says about errors: "Results are undefined.
A conforming synthesis processor may detect and report an error and may recover from it."
* #225 The cat jumped over the moon.<prosody contour="(0%,+20Hz)">The cat jumped over the moon.</prosody>: The first sentence is pronounced in a normal way, the second with a constant pitch from beginning (0%) to end (the end value is copied from the nearest pitch target, which is 0%). The pitch target (+20Hz) is a relative target ("relative to the pitch value just before the contained text"). In our case, the last pitch value is the pitch at the end of the last vowel of the word "moon", that is, 64Hz. The second sentence is then pronounced with a constant pitch of 84Hz, which a human listener hears as lower than the first sentence, where the pitch varies between 64Hz and 132Hz. Instruction suggestion: "The second repetition of the sentence should have a constant pitch 20Hz above last voiced pitch frame of first sentence"
* 269, 283, 284, 285, 287 and 288 Units are case sensitive: Same remark as for TA#139. If it is an error, results are undefined.

Furthermore we propose to improve two assertions.

* #223 time positions less than 0% are ignored: The following contour: "(-10%,-20Hz) (0%,+20.Hz) (10%,+30%) (40%,+10Hz)" is equivalent to: "(0%,+20.Hz) (10%,+30%) (40%,+10Hz)" even for a non-conformant SSML processor which interpolates between -10% and 0%. At the least, the 0% target should be removed from this contour in order to observe whether SSML processors try to interpolate between -10% and +10%, which they shouldn't. Instead, they should copy the nearest pitch target, that is, the one at 10%. To make the test more obvious for testers, the contour could be like this: "(-10%,50Hz)(100%,200Hz)". If an SSML processor interpolates between 50Hz and 200Hz (which it shouldn't) instead of using a constant pitch of 200Hz, it will be clearly audible.
* #224 time positions greater than 100% are ignored: The tested contour is: "(0%,+20.Hz) (10%,+30%) (40%,10Hz) (120%,-50.0Hz)". If a non-conformant SSML processor interpolates between 40% and 120%, the difference may well be inaudible to testers. Indeed, if the pitch value just before the contained text is 60Hz, then the -50Hz target equals 10Hz (60-50=10) and the test sounds identical to the reference. The suggestion is to use the following contour: "(0%,200Hz)(110%,50Hz)". If an SSML processor interpolates between 200Hz and 50Hz (which it shouldn't) instead of using a constant pitch of 200Hz, it will be clearly audible.

Yours Faithfully,
Edouard Hinard
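
The contour remarks on #223 and #224 can be illustrated with a minimal Python sketch. It is not the FTR&D implementation: the function names parse_contour and pitch_at are hypothetical, only absolute Hz targets are handled, and linear interpolation between targets is an assumption of the sketch. Targets whose time position falls outside 0% to 100% are dropped, and the nearest remaining target is copied to the edges.

```python
import re

# Matches pairs such as "(0%,200Hz)"; only absolute Hz targets are handled here.
_PAIR = re.compile(r"\(\s*([+-]?\d+(?:\.\d*)?)%\s*,\s*(\d+(?:\.\d*)?)Hz\s*\)")


def parse_contour(contour):
    """Return sorted (position_percent, pitch_hz) pairs, dropping positions outside 0..100."""
    pairs = [(float(p), float(hz)) for p, hz in _PAIR.findall(contour)]
    return sorted((p, hz) for p, hz in pairs if 0.0 <= p <= 100.0)


def pitch_at(targets, position):
    """Pitch in Hz at a position (percent); the edges copy the nearest remaining target."""
    if not targets:
        raise ValueError("no usable pitch target")
    if position <= targets[0][0]:
        return targets[0][1]
    if position >= targets[-1][0]:
        return targets[-1][1]
    for (p0, hz0), (p1, hz1) in zip(targets, targets[1:]):
        if p0 <= position <= p1:
            # Linear interpolation is an assumption of this sketch.
            return hz0 + (hz1 - hz0) * (position - p0) / (p1 - p0)


# The two contours suggested for #223 and #224.
for contour in ("(-10%,50Hz)(100%,200Hz)", "(0%,200Hz)(110%,50Hz)"):
    targets = parse_contour(contour)
    print(contour, [pitch_at(targets, pos) for pos in (0, 50, 100)])
```

For both suggested contours only one target survives the filtering, so the sketch yields a constant 200Hz at every position, which is the behaviour the proposed tests are meant to make audible.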
Attachments
- text/xml attachment: francetelecom-ssml10-ir-results.xml
Received on Wednesday, 18 February 2004 09:13:49 UTC