- From: Mark Alexandre <markalexandre@gmail.com>
- Date: Wed, 8 Feb 2006 06:05:08 -0600
- To: www-voice@w3.org
- Message-ID: <903631f00602080405k568ba97elc399754a41ca2c33@mail.gmail.com>
In the Pronunciation Lexicon Specification (PLS) v1.0, as of Draft 31, the usage of the element tag <grapheme> reflects a misunderstanding of the meaning of the word "grapheme." The definition in the spec's Glossary of Terms is nevertheless quite accurate: "One of the set of the smallest units of a written language, such as letters, ideograms, or symbols, that distinguish one word from another; a representation of a single orthographic element." Thus, the letter "g" and the numeral "4" are both examples of widely used graphemes, as are the question mark "?" and the dollar sign "$". In the current draft of the PLS, however, the so-called "grapheme" element is mistakenly applied to what in English is commonly called the "spelling." I believe this will lead to confusion unless the element is renamed. As to what it should be named, more on that below.

Before that, however, I wish to draw attention to an attribute of the grapheme element, the "orthography" attribute. The value of this attribute is, according to the draft spec, supposed to be a "script code" compliant with the ISO 15924 standard. The title of that standard is "Information and documentation -- Codes for the representation of names of scripts." All of this naturally leads to the question: why not name this attribute "script" or "scriptcode"?

The word "orthography", derived from Greek roots meaning (roughly) "correct" and "writing", can present some ambiguity between two related meanings, but neither meaning is the same as "script."

Orthography can be used as a synonym for what most English speakers more commonly call the "spelling" of a particular word. Thus the examples "colour" and "color" are two different orthographies for the same word in the English language, the latter being the American orthography that was adopted following a set of spelling reforms which the rest of the English-speaking world declined to follow.
The corresponding word in the French language is written with the orthography "couleur."

In a related sense, the word "orthography" can be used to refer to an entire system of conventions for writing, including such issues as spelling and punctuation, plus even such trivia as the direction of writing (such as left-to-right or vice versa), the spacing and/or divisions of words, etc. Some may also comprehend the word in this broader sense to include issues of penmanship or calligraphy, that is, the correct method to compose or draw the graphemes (or characters, or symbols, or glyphs, if you like) of the language. Note that, in this sense, conventions of orthography can differ even between cultures that use the same alphabet, and even between the style guides used by differing editorial staffs in the same metropolis!

Neither meaning of "orthography" is to be conflated with the meaning of the word "script." The Greek, Latin and Cyrillic alphabets, as well as ancient cuneiform, Egyptian hieroglyphics, Chinese characters (Hanzi, or Kanji in Japan), etc., are all most precisely referred to as scripts, collectively. Perhaps the only alternative to "script" is the much more vague and expansive term, "writing system."

Finally, then, the weight of the evidence clearly argues for the attribute in question to be called "script" (or "scriptcode" to be verbose), as indeed it is called in ISO 15924.

Having thus liberated the word "orthography" from misapplication, we may consider that word a candidate for the element incorrectly labelled "grapheme." In addition to "orthography," other candidates for the element now called grapheme in the draft spec might include "spelling" or "writing" or the almost comically long-winded "graphic presentation form." Any one of these four terms would be vastly preferable to "grapheme" (which, again, is simply wrong), but each does have certain shortcomings as well. I will briefly list the problems I am aware of.

The term "orthography" is almost ideal, except for its unavoidable connotation of "correctness." That is, were it ever desirable, for whatever reason, to list what may be deemed a "non-standard" written form of a word, then calling that an orthography for the word would be misleading. On the other hand, obviously, if the PLS is specifically only intended to associate pronunciation with "correct" spellings (according to somebody's criteria of correctness), then orthography (in its narrower sense, vide supra) would be precisely accurate. An additional bonus is that this word is understood with pretty much the same meaning in other languages such as French and Spanish.

The term "spelling" is by far the more commonly used word by English speakers when referring to how to write out a particular word. Furthermore, it carries no connotation of correctness, since you can easily refer to "alternate spelling" or even "bad spelling." The downside, a minor one, is that the notion of spelling is strongly associated with alphabets; it is not at all clear what spelling means in the context of Chinese writing or similar non-alphabetic systems.

The term "writing" is just vague enough to mean anything you want. Since it applies to every aspect of, well, writing, it could be applied to any aspect of it. Put another way, its upside is its downside.

Finally, as for "graphic presentation form," or something similarly long and comically precise: one is tempted to wonder whether every XML parser out there really can handle sentence-length element names, as well as how many folks have access to an XML editor with a contextual auto-completion feature!

Just to throw out one last (off-the-wall) possibility, consider that the Spanish cognate of the English word orthography is "ortografía", which commonly gets shortened to just "grafía." [See, for example, http://www.xtec.es/~faguile1/grafia/ ].
This suggests that this originally Greek root for "writing" suffices all by itself to communicate the idea we are talking about here. Perhaps an Anglicized (or Anglicised, if you prefer) coinage such as "graphy" or a more internationally flavored "graphia" or "graphie" would actually be the least open to misuse and misinterpretation, since, to paraphrase Humpty Dumpty, it would mean just what we chose it to mean.
Received on Wednesday, 8 February 2006 13:59:04 UTC