- From: Nick Levinson <nick_levinson@yahoo.com>
- Date: Sun, 20 Sep 2009 15:06:46 -0700 (PDT)
- To: www-voice@w3.org
We need a way to classify a group of words that are too numerous to itemize and assign a set of pronunciation rules to that group. This is for PLS 1.0, <http://www.w3.org/TR/pronunciation-lexicon/>.

Numbers are an example. One range of numbers might serve as years or as single-moment quantities and therefore be pronounced differently. The same range might be pronounced formally or semiformally and still need to be correctly distinguished. E.g.:

/eighteen twenty-two/: year semiformal
/eighteen hundred and twenty-two/: year formal
/eighteen hundred twenty-two/: quantity semiformal
/one thousand eight hundred twenty-two/: quantity formal
/eighteen hundred and twenty-two/: incorrect for quantity
/one thousand eight hundred and twenty-two/: incorrect for quantity
/one comma eight two two/: proofreader's style
/one eight two two/: radio style likely in air traffic control and military

I'm not sure if the lexeme element's role attribute would solve this problem. If role is too specific, I propose a new attribute, perhaps number-role or super-role, the latter supporting more values, such as "number-year" and "number-quantity".

A range of numbers should be definable by specifying the smallest, the increment (e.g., integer), and the largest. That could determine whether a number might be a year; if the choice is between year and quantity and the number doesn't fit the specification for being a year then it must be a quantity.

I assume role can already handle formality. If not, I propose an attribute: formality="" with values "formal", "semiformal", and "informal", and probably others.

Quantities are not supposed to be pronounced with "and" except at the location of a decimal point, e.g., six and three fourths.

It's technically incorrect to read aloud "33" as /thirty-three/ if the "33" is not in base 10. Even if the base doesn't have to be read aloud because the context makes it clear, a nondecimal-base "33" would be read as /three three/.
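As a rough illustration only, the range idea and the 1822 readings above might be sketched as follows. This is Python, not PLS markup, and nothing in it is part of PLS 1.0: the function names are made up for the sketch, the year range of 1000 to 2099 is an arbitrary assumption, and the role and formality strings are simply the values proposed above. It covers only four-digit numbers shaped like 1822.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def two_digits(n):
    """Spell out 0..99, e.g. 22 -> 'twenty-two'."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def in_range(n, smallest, increment, largest):
    """True if n is smallest, smallest + increment, ... and at most largest."""
    return smallest <= n <= largest and (n - smallest) % increment == 0


def possible_roles(n, year_spec=(1000, 1, 2099)):
    """Roles a number could take. year_spec is (smallest, increment, largest);
    the default values are an assumption made for this sketch."""
    smallest, increment, largest = year_spec
    if in_range(n, smallest, increment, largest):
        return ["number-year", "number-quantity"]  # still ambiguous
    return ["number-quantity"]  # cannot be a year, so it must be a quantity


def pronounce(n, role, formality):
    """Reading for a four-digit number like 1822 (hypothetical helper)."""
    if not 1000 <= n <= 9999 or (n // 100) % 10 == 0 or n % 100 < 10:
        raise ValueError("sketch covers only four-digit numbers like 1822")
    if formality not in ("formal", "semiformal"):
        raise ValueError("sketch covers only formal and semiformal")
    high, low = divmod(n, 100)              # 1822 -> 18, 22
    thousands, hundreds = divmod(high, 10)  # 18 -> 1, 8
    if role == "number-year":
        if formality == "semiformal":
            return f"{two_digits(high)} {two_digits(low)}"
        return f"{two_digits(high)} hundred and {two_digits(low)}"
    if role == "number-quantity":
        # No "and" in a quantity: it is reserved for the decimal point.
        if formality == "semiformal":
            return f"{two_digits(high)} hundred {two_digits(low)}"
        return (f"{ONES[thousands]} thousand {ONES[hundreds]} hundred "
                f"{two_digits(low)}")
    raise ValueError("unknown role: " + role)


def radio_digits(n):
    """Digit-by-digit reading, e.g. 1822 -> 'one eight two two'."""
    return " ".join(ONES[int(d)] for d in str(n))
```

With these assumptions, pronounce(1822, "number-year", "semiformal") gives /eighteen twenty-two/ and pronounce(1822, "number-quantity", "formal") gives /one thousand eight hundred twenty-two/, while possible_roles(4822) leaves only "number-quantity" because 4822 falls outside the assumed year range.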
Other number pronunciation ambiguities exist. E.g.:

/one point two/
/one and two tenths/

Or consider 102.204/16:

/one oh two dot two oh four slash sixteen/: IPv4 IP address range
/one hundred two and two hundred four thousandths divided by sixteen/: arithmetic

Or U.S. Zip codes used by the post office. E.g., for New York, N.Y.:

10001
/one triple oh one/
/one thousand one/
(I don't think I've ever heard anyone say /ten thousand one/ for that zip code, and I live in the city.)

10036
/one double oh thirty-six/
/one double oh three six/
/one oh oh three six/
/one oh oh thirty-six/

Or Brooklyn:

11201
/one twelve oh one/

Or 6-2:

/six two/: e.g., Class 6-2 in an elementary or middle school
/six dash two/
/six to two/: sports score
/six hyphen two/: proofreader's style
/from six to two/: e.g., an 8-hour work shift
/six minus two/
/six less two/
/six take away two/: for children as the audience

Or more complicated math, including one character having two senses:

([5+3][{4-1}x2]/x)
/the quotient of the product of the sum of 5 and 3 and the product of the difference of 4 minus 1 times 2 divided by x/

Or this, for, e.g., 1011b:

/one oh one one binary/
/one thousand eleven bits/
/one thousand eleven bytes/

Thank you.

-- Nick
Received on Sunday, 20 September 2009 22:55:46 UTC