
pronunciation rules for types of numbers

From: Nick Levinson <nick_levinson@yahoo.com>
Date: Sun, 20 Sep 2009 15:06:46 -0700 (PDT)
Message-ID: <938211.30638.qm@web33505.mail.mud.yahoo.com>
To: www-voice@w3.org
We need a way to define a class of words too numerous to list individually and to assign a set of pronunciation rules to that class. This is for PLS 1.0 (Pronunciation Lexicon Specification), <http://www.w3.org/TR/pronunciation-lexicon/>.

Numbers are an example. The same range of numbers might serve as years or as quantities, and the two are pronounced differently. Within each use, a number might also be pronounced formally or semiformally, and the readings still need to be correctly distinguished. E.g.:
/eighteen twenty-two/: year semiformal
/eighteen hundred and twenty-two/: year formal
/eighteen hundred twenty-two/: quantity semiformal
/one thousand eight hundred twenty-two/: quantity formal
/eighteen hundred and twenty-two/: incorrect for quantity
/one thousand eight hundred and twenty-two/: incorrect for quantity
/one comma eight two two/: proofreader's style
/one eight two two/: radio style, likely in air traffic control and the military
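To make the four non-radio readings above concrete, here is a minimal sketch, assuming a four-digit number in the range 1000-9999. The function names and the (role, formality) keys are hypothetical illustrations, not PLS 1.0 attribute values. Note that neither quantity reading uses "and".

```python
# Sketch only: four readings of a four-digit number (1000-9999).
# The (role, formality) keys are hypothetical, not PLS 1.0 values.

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def two_digits(n: int) -> str:
    """Spell 0-99, hyphenating compounds such as 'twenty-two'."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + (f"-{ONES[ones]}" if ones else "")


def year_semiformal(n: int) -> str:
    high, low = divmod(n, 100)  # 1822 -> (18, 22)
    if low == 0:
        return f"{two_digits(high)} hundred"
    if low < 10:
        return f"{two_digits(high)} oh {ONES[low]}"  # 1805 -> eighteen oh five
    return f"{two_digits(high)} {two_digits(low)}"


def year_formal(n: int) -> str:
    high, low = divmod(n, 100)
    base = f"{two_digits(high)} hundred"
    return f"{base} and {two_digits(low)}" if low else base


def quantity_semiformal(n: int) -> str:
    high, low = divmod(n, 100)
    base = f"{two_digits(high)} hundred"  # no "and" for quantities
    return f"{base} {two_digits(low)}" if low else base


def quantity_formal(n: int) -> str:
    thousands, rest = divmod(n, 1000)
    hundreds, last_two = divmod(rest, 100)
    words = [ONES[thousands], "thousand"]
    if hundreds:
        words += [ONES[hundreds], "hundred"]
    if last_two:
        words.append(two_digits(last_two))
    return " ".join(words)


READERS = {
    ("year", "semiformal"): year_semiformal,
    ("year", "formal"): year_formal,
    ("quantity", "semiformal"): quantity_semiformal,
    ("quantity", "formal"): quantity_formal,
}


def read_number(n: int, role: str, formality: str) -> str:
    if not 1000 <= n <= 9999:
        raise ValueError("this sketch only handles 1000-9999")
    return READERS[(role, formality)](n)
```

For example, read_number(1822, "year", "semiformal") gives "eighteen twenty-two". Round thousands such as 2000, and the common "two thousand nine" style for recent years, are not special-cased here.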

I'm not sure whether the lexeme element's role attribute would solve this problem. If role is too specific, I propose a new attribute, perhaps number-role or super-role, the latter supporting more values, such as "number-year" and "number-quantity".

A range of numbers should be definable by specifying the smallest value, the increment (e.g., 1, for integers), and the largest value. That would determine whether a number could be a year: if the choice is between year and quantity and the number doesn't fit the specification for a year, then it must be a quantity.
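A minimal sketch of such a range specification, assuming exact rational arithmetic so that non-integer increments work. The NumberRange class, the super_role function, and the 1000-2100 year range are hypothetical choices for illustration; only the "number-year" and "number-quantity" values come from the proposal above.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class NumberRange:
    """Hypothetical range spec: smallest value, increment, largest value."""
    smallest: Fraction
    increment: Fraction
    largest: Fraction

    def contains(self, x) -> bool:
        x = Fraction(x)
        if self.increment <= 0:
            raise ValueError("increment must be positive")
        if not self.smallest <= x <= self.largest:
            return False
        # x must fall on the grid smallest, smallest + increment, ...
        return (x - self.smallest) % self.increment == 0


# Assumed year range, for illustration only: whole numbers 1000-2100.
YEARS = NumberRange(Fraction(1000), Fraction(1), Fraction(2100))


def super_role(x, years: NumberRange = YEARS) -> str:
    """Pick "number-year" or "number-quantity" (hypothetical helper).

    If the choice is only between year and quantity, anything that does
    not fit the year specification must be a quantity.
    """
    return "number-year" if years.contains(x) else "number-quantity"
```

So 1822 classifies as a possible year, while 1822.5 and 5000 fall outside the assumed year specification and become quantities.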

I assume role can already handle formality. If not, I propose an attribute: formality="" with values "formal", "semiformal", and "informal", and probably others.

Quantities are not supposed to be pronounced with "and" except where it separates the whole-number part from a fractional or decimal part, e.g., /six and three fourths/.

It's technically incorrect to read "33" aloud as /thirty-three/ if the "33" is not in base 10. Even when the base needn't be read aloud because the context makes it clear, a nondecimal-base "33" should be read as /three three/.
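A minimal sketch of the digit-by-digit rule, assuming the base is already known from context. The read_nondecimal function is hypothetical; it uses Python's built-in int(text, base) only to check that every digit is valid in the given base, and it reads digits above nine (as in hexadecimal) by their letter names.

```python
DIGIT_NAMES = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def read_nondecimal(numeral: str, base: int) -> str:
    """Read a numeral in a base other than 10 digit by digit (hypothetical helper).

    "33" in base 8 is read "three three", not "thirty-three".
    """
    if not 2 <= base <= 36 or base == 10:
        raise ValueError("base must be 2-36 and not 10")
    if not numeral.isalnum():
        raise ValueError("numeral must contain only digits and letters")
    int(numeral, base)  # raises ValueError if a digit is invalid in this base
    words = []
    for ch in numeral.lower():
        # Digits above nine (a, b, c, ... in hexadecimal) are read by letter name.
        words.append(DIGIT_NAMES[int(ch)] if ch.isdigit() else ch)
    return " ".join(words)
```

For example, read_nondecimal("33", 8) gives "three three", and read_nondecimal("33", 2) is rejected because 3 is not a binary digit.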

Other number pronunciation ambiguities exist. E.g.:
/one point two/
/one and two tenths/
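Both readings can be generated from the same written decimal. A minimal sketch, assuming up to three digits on each side of the point; decimal_readings and words_under_1000 are hypothetical helpers. The second reading uses "and" only at the decimal point, per the rule stated earlier.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
PLACE_NAMES = {1: "tenth", 2: "hundredth", 3: "thousandth"}


def words_under_1000(n: int) -> str:
    """Spell 0-999 in quantity style (no "and" after "hundred")."""
    hundreds, rest = divmod(n, 100)
    parts = [ONES[hundreds], "hundred"] if hundreds else []
    if rest or not hundreds:
        if rest < 20:
            parts.append(ONES[rest])
        else:
            tens, ones = divmod(rest, 10)
            parts.append(TENS[tens] + (f"-{ONES[ones]}" if ones else ""))
    return " ".join(parts)


def decimal_readings(text: str) -> tuple[str, str]:
    """Return the 'point' reading and the 'and ... tenths' reading."""
    m = re.fullmatch(r"([0-9]{1,3})\.([0-9]{1,3})", text)
    if not m:
        raise ValueError("sketch handles up to three digits on each side")
    whole, frac = m.groups()
    whole_words = words_under_1000(int(whole))
    point = f"{whole_words} point " + " ".join(ONES[int(d)] for d in frac)
    numerator = int(frac)
    place = PLACE_NAMES[len(frac)] + ("" if numerator == 1 else "s")
    fraction = f"{whole_words} and {words_under_1000(numerator)} {place}"
    return point, fraction
```

For example, decimal_readings("1.2") returns ("one point two", "one and two tenths"). Choosing between them is exactly the kind of decision a role or formality attribute would have to drive.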

Or consider 102.204/16:
/one oh two dot two oh four slash sixteen/: IPv4 address range (abbreviated CIDR notation)
/one hundred two and two hundred four thousandths divided by sixteen/: arithmetic
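A minimal sketch of the address-range reading, assuming the usual shorthand in which missing trailing octets are zero (so 102.204/16 means 102.204.0.0/16). Python's standard ipaddress module needs all four octets, so the hypothetical expand_prefix helper pads the address before validating it. read_ip_prefix is also hypothetical: it reads address digits one at a time with "oh" for zero and the prefix length as a whole number.

```python
import ipaddress

SMALL = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
# Address digits are read one at a time, with "oh" for zero.
ADDRESS_DIGITS = ["oh"] + SMALL[1:10]


def prefix_length_words(n: int) -> str:
    """Spell an IPv4 prefix length (0-32)."""
    if n < 20:
        return SMALL[n]
    tens, ones = divmod(n, 10)
    word = {2: "twenty", 3: "thirty"}[tens]
    return word + (f"-{SMALL[ones]}" if ones else "")


def expand_prefix(text: str) -> ipaddress.IPv4Network:
    """Pad an abbreviated prefix such as '102.204/16' to four octets and validate it."""
    address, sep, length = text.partition("/")
    octets = address.split(".")
    if not sep or not 1 <= len(octets) <= 4:
        raise ValueError(f"not an IPv4 prefix: {text!r}")
    octets += ["0"] * (4 - len(octets))
    return ipaddress.IPv4Network(".".join(octets) + "/" + length)


def read_ip_prefix(text: str) -> str:
    network = expand_prefix(text)  # raises ValueError if text is not a valid prefix
    address = text.partition("/")[0]
    words = ["dot" if ch == "." else ADDRESS_DIGITS[int(ch)] for ch in address]
    return " ".join(words) + " slash " + prefix_length_words(network.prefixlen)
```

read_ip_prefix("102.204/16") yields "one oh two dot two oh four slash sixteen". The arithmetic reading needs the same characters to be parsed as a decimal and a divisor instead, so the two cannot be told apart from the text alone.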

Or U.S. ZIP Codes. E.g., for New York, N.Y.:

10001
/one triple oh one/
/one thousand one/
(I don't think I've ever heard anyone say /ten thousand one/ for that ZIP Code, and I live in the city.)

10036
/one double oh thirty-six/
/one double oh three six/
/one oh oh three six/
/one oh oh thirty-six/

Or Brooklyn:
11201
/one twelve oh one/
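Two of the ZIP Code styles above are mechanical enough to sketch: plain digit by digit, and digit by digit with runs of two or three identical digits spoken as "double" or "triple". This is a minimal sketch; read_zip and its style names are hypothetical. Mixed groupings such as /one twelve oh one/ or /one double oh thirty-six/ depend on local habit and are not generated here.

```python
from itertools import groupby

# "oh" for zero, as in the ZIP Code readings above.
DIGITS = ["oh", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]
RUN_WORDS = {2: "double", 3: "triple"}


def read_zip(zip_code: str, style: str = "digits") -> str:
    """Read a five-digit ZIP Code in one of two hypothetical styles."""
    if len(zip_code) != 5 or any(c not in "0123456789" for c in zip_code):
        raise ValueError("expected a five-digit ZIP Code")
    if style == "digits":
        return " ".join(DIGITS[int(c)] for c in zip_code)
    if style == "runs":
        words = []
        for digit, run in groupby(zip_code):
            count = len(list(run))
            name = DIGITS[int(digit)]
            if count in RUN_WORDS:
                words.append(f"{RUN_WORDS[count]} {name}")
            else:
                words.extend([name] * count)  # runs of 1, 4, or 5 are read out
        return " ".join(words)
    raise ValueError(f"unknown style: {style!r}")
```

For example, read_zip("10001", "runs") gives "one triple oh one" and read_zip("10036") gives "one oh oh three six".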

Or 6-2:
/six two/: e.g., Class 6-2 in an elementary or middle school
/six dash two/
/six to two/: sports score
/six hyphen two/: proofreader's style
/from six to two/: e.g., an 8-hour work shift
/six minus two/
/six less two/
/six take away two/: for children as the audience
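One way to picture how a role-like attribute would select among these readings is a table keyed by context. A minimal sketch; the role names in TEMPLATES and the read_pair function are hypothetical and are not values defined by PLS 1.0.

```python
import re

WORDS = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]

# Hypothetical context roles; PLS 1.0 does not define these values.
TEMPLATES = {
    "class-section": "{a} {b}",
    "dash": "{a} dash {b}",
    "score": "{a} to {b}",
    "proofreading": "{a} hyphen {b}",
    "time-span": "from {a} to {b}",
    "subtraction": "{a} minus {b}",
    "subtraction-informal": "{a} less {b}",
    "subtraction-children": "{a} take away {b}",
}


def read_pair(text: str, role: str) -> str:
    """Read a form like '6-2' according to a context role."""
    m = re.fullmatch(r"([0-9]{1,2})-([0-9]{1,2})", text)
    if not m:
        raise ValueError(f"expected a form like '6-2': {text!r}")
    a, b = (int(g) for g in m.groups())
    if a >= len(WORDS) or b >= len(WORDS):
        raise ValueError("sketch only spells 0-19")
    return TEMPLATES[role].format(a=WORDS[a], b=WORDS[b])
```

For example, read_pair("6-2", "score") gives "six to two" and read_pair("6-2", "time-span") gives "from six to two".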

Or more complicated math, in which one character has two senses (here "x" is both a multiplication sign and a variable):
([5+3][{4-1}x2]/x)
/the quotient of the product of the sum of five and three and the product of the difference of four and one and two, and x/
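A synthesizer can only produce a reading like this after deciding what each "x" means. This minimal sketch assumes that decision has already been made: the expression is a hand-built tree (no parser) in which the first "x" became multiplication and the second is a variable. The verbalize function and the tree encoding are hypothetical.

```python
from typing import Union

# A node is an integer, a variable name, or (operator, left, right).
Expr = Union[int, str, tuple]

NUMBER_WORDS = ["zero", "one", "two", "three", "four",
                "five", "six", "seven", "eight", "nine"]
OP_NAMES = {"+": "sum", "-": "difference", "*": "product", "/": "quotient"}


def verbalize(e: Expr) -> str:
    """Read an expression tree aloud as nested 'the <op> of A and B' phrases."""
    if isinstance(e, int):
        return NUMBER_WORDS[e]
    if isinstance(e, str):
        return e  # a variable name such as "x"
    op, left, right = e
    return f"the {OP_NAMES[op]} of {verbalize(left)} and {verbalize(right)}"


# ([5+3][{4-1}x2]/x): the first "x" is multiplication, the second a variable.
EXPR = ("/", ("*", ("+", 5, 3), ("*", ("-", 4, 1), 2)), "x")
```

verbalize(EXPR) produces the nested reading above (without the pause). The output also shows the problem: the run of "and"s makes the spoken grouping hard to follow, even though the tree itself is unambiguous.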

Or 1011b:
/one oh one one binary/
/one thousand eleven bits/
/one thousand eleven bytes/
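A minimal sketch listing the candidate readings of a token like 1011b; context would still have to pick one. The readings_for_b_suffix and quantity_words functions are hypothetical. The "binary" reading is offered only when every digit is 0 or 1; as a binary numeral, 1011 has the value eleven.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def quantity_words(n: int) -> str:
    """Spell 0-9999 in quantity style (no "and")."""
    if not 0 <= n <= 9999:
        raise ValueError("sketch only handles 0-9999")
    if n == 0:
        return "zero"
    thousands, rest = divmod(n, 1000)
    hundreds, last_two = divmod(rest, 100)
    words = []
    if thousands:
        words += [ONES[thousands], "thousand"]
    if hundreds:
        words += [ONES[hundreds], "hundred"]
    if last_two:
        if last_two < 20:
            words.append(ONES[last_two])
        else:
            tens, ones = divmod(last_two, 10)
            words.append(TENS[tens] + (f"-{ONES[ones]}" if ones else ""))
    return " ".join(words)


def readings_for_b_suffix(token: str) -> dict[str, str]:
    """Candidate readings of a token like '1011b'."""
    digits = token[:-1]
    if not token.endswith("b") or not digits or any(c not in "0123456789" for c in digits):
        raise ValueError(f"expected digits followed by 'b': {token!r}")
    amount = quantity_words(int(digits))
    readings = {"bits": f"{amount} bits", "bytes": f"{amount} bytes"}
    if set(digits) <= {"0", "1"}:
        # Only 0/1 digits can form a binary numeral; read them one at a time.
        readings["binary"] = " ".join("one" if c == "1" else "oh" for c in digits) + " binary"
    return readings
```

For "1011b" this returns all three readings listed above; for "12b" only the bits and bytes readings remain.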

Thank you.

-- 
Nick


      
Received on Sunday, 20 September 2009 22:55:46 GMT