
PLS and Part of Speech Tagging

From: Baggia, Paolo <Paolo.Baggia@nuance.com>
Date: Thu, 4 Jul 2013 14:23:48 +0000
To: Percy Henry <phenry1026@hotmail.com>
CC: "Baggia, Paolo" <Paolo.Baggia@nuance.com>, "www-voice@w3.org" <www-voice@w3.org>
Message-ID: <3D94157CE8372049AD4F112AE6E5765A36140346@SOM-EXCH01.nuance.com>
Hi Henry,

First of all, I have to say that I wasn't involved in further work on PLS 1.0 after it became a W3C Recommendation, so I can only share my personal opinion and try to recall some of the discussions from that time.

* By the way, if you would like to share something about your interest in using PLS 1.0, I would appreciate it.

The primary goal of PLS 1.0 was to offer the engines addressed by the Voice Browser Working Group (VBWG), mainly but not only ASR and TTS engines, a standard for encoding lexical pronunciations. For those engines the lexicon was used to improve the pronunciation of specific ASR or TTS elements.

In that scenario the 'role' attribute was seen as an optional feature of the language to be used by advanced engines.

This is the reason why there are only a few examples and why the attribute is a bit under-specified. There was no attempt to standardize syntactic categories, because that was outside the charter of the standardization effort. The idea was rather to allow existing ones to be used inside PLS.

On 'role' attribute:

1. An organization or individual should declare the admissible values for the role attribute. The suggested way was to declare a namespace, so that the values are unambiguous in an XML world. For instance, CLAWS should have been declared as a namespace and registered as a URI.

2. Once that is done, a PLS document can declare that namespace and bind a prefix to it, such as "claws" (see section 4.4 [1]) or "mypos" (see the example in section 5.5 [2]). Binding the prefix to the namespace allows an engine to check the correctness of the values.

3. At this point some of the lexeme elements can include the role attribute. Its value is a string of whitespace-separated QNames of the form "prefix:value", e.g. "claws:NN" or "claws:DD1", where "claws" is the prefix bound to a specific namespace and the rest is a legal value in that namespace.

4. An engine can then use the "role" attribute during the selection of a pronunciation in a specific context.
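
As an illustration of steps 1 to 3, here is a minimal Python sketch of how an engine might resolve and check role values. It is only a sketch under stated assumptions: the CLAWS namespace URI is a placeholder (no official one is implied), the ADMISSIBLE set is a small hypothetical subset of CLAWS7 tags (NN1 being the singular common noun tag), the IPA transcriptions are illustrative, and prefixes are assumed to be declared on the root element only.

```python
import io
import xml.etree.ElementTree as ET

PLS = "{http://www.w3.org/2005/01/pronunciation-lexicon}"
CLAWS_URI = "http://www.example.org/claws7"  # placeholder, not an official URI

# Step 1: admissible values per namespace (hypothetical subset of CLAWS7).
ADMISSIBLE = {CLAWS_URI: {"VV0", "NN1", "DD1"}}

# Steps 2 and 3: the lexicon binds the "claws" prefix and uses it in role.
LEXICON = f"""<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    xmlns:claws="{CLAWS_URI}"
    alphabet="ipa" xml:lang="en-GB">
  <lexeme role="claws:VV0">
    <grapheme>record</grapheme>
    <phoneme>rɪˈkɔːd</phoneme>
  </lexeme>
  <lexeme role="claws:NN1">
    <grapheme>record</grapheme>
    <phoneme>ˈrekɔːd</phoneme>
  </lexeme>
</lexicon>"""


def resolve_roles(xml_text):
    """Return (grapheme, phoneme, [(namespace_uri, tag), ...]) per lexeme."""
    prefixes = {}  # prefix -> namespace URI, filled from xmlns declarations
    entries = []
    stream = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(stream, events=("start-ns", "end")):
        if event == "start-ns":
            prefix, uri = payload
            prefixes[prefix] = uri
        elif payload.tag == PLS + "lexeme":
            roles = []
            for qname in payload.get("role", "").split():
                prefix, sep, tag = qname.partition(":")
                uri = prefixes.get(prefix) if sep else None
                # Reject unbound prefixes and tags outside the declared set.
                if uri is None or tag not in ADMISSIBLE.get(uri, set()):
                    raise ValueError(f"unknown role value: {qname}")
                roles.append((uri, tag))
            entries.append((payload.findtext(PLS + "grapheme"),
                            payload.findtext(PLS + "phoneme"), roles))
    return entries


for grapheme, phoneme, roles in resolve_roles(LEXICON):
    print(grapheme, phoneme, roles)
```

The roles are compared by namespace URI rather than by prefix, which is why the prefix itself can be chosen freely.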

This is what I recall of the discussions at that time. I'm sure some details are missing, but I hope it helps you to understand the picture better.

About your specific question:

- xmlns:mypos="http://www.example.org/my_pos_namespace"
The URI should be replaced with a real and stable one (possibly an official one provided by the CLAWS people, or by other organizations for different POS tagsets). "mypos" is your prefix and can be changed as you like, for instance to "claws".

- role="claws:VV0" or role="claws:NN"
The first is for a lexical entry that is a verb, the second for one that is a noun.

For another set of POS tags, like the one used by the Stanford tagger, the lexicon should declare a different prefix bound to its own specific URI.
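
To make the two-tagset case concrete, here is a small Python sketch in which each lexeme carries both a CLAWS7 role and a Penn Treebank role (the tagset used by the Stanford tagger's English models), and a pronunciation is selected from a tagger's output. Both namespace URIs, the "ptb" prefix, the helper names and the IPA transcriptions are assumptions made for the illustration, not official values.

```python
import io
import xml.etree.ElementTree as ET

PLS = "{http://www.w3.org/2005/01/pronunciation-lexicon}"
CLAWS_URI = "http://www.example.org/claws7"       # placeholder URI
PTB_URI = "http://www.example.org/penn-treebank"  # placeholder URI

LEXICON = f"""<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    xmlns:claws="{CLAWS_URI}"
    xmlns:ptb="{PTB_URI}"
    alphabet="ipa" xml:lang="en-GB">
  <lexeme role="claws:VV0 ptb:VB">
    <grapheme>record</grapheme>
    <phoneme>rɪˈkɔːd</phoneme>
  </lexeme>
  <lexeme role="claws:NN1 ptb:NN">
    <grapheme>record</grapheme>
    <phoneme>ˈrekɔːd</phoneme>
  </lexeme>
</lexicon>"""


def build_index(xml_text):
    """Map (grapheme, namespace_uri, tag) to the phoneme string."""
    prefixes, index = {}, {}
    stream = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(stream, events=("start-ns", "end")):
        if event == "start-ns":
            prefixes[payload[0]] = payload[1]
        elif payload.tag == PLS + "lexeme":
            grapheme = payload.findtext(PLS + "grapheme")
            phoneme = payload.findtext(PLS + "phoneme")
            for qname in payload.get("role", "").split():
                prefix, _, tag = qname.partition(":")
                index[(grapheme, prefixes[prefix], tag)] = phoneme
    return index


def pronounce(index, grapheme, tagset_uri, tag):
    """Return the pronunciation for a tagged word, or None if not listed."""
    return index.get((grapheme, tagset_uri, tag))


index = build_index(LEXICON)
# A Penn Treebank tagger labels the noun as NN, a CLAWS7 tagger as NN1.
print(pronounce(index, "record", PTB_URI, "NN"))
print(pronounce(index, "record", CLAWS_URI, "VV0"))
```

Because "NN" exists in both tagsets with different meanings, keying the lookup on the namespace URI keeps the two sets of values apart.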


Paolo Baggia

[1] http://www.w3.org/TR/pronunciation-lexicon/#S4.4

[2] http://www.w3.org/TR/pronunciation-lexicon/#S5.5

From: Percy Henry <phenry1026@hotmail.com>
Date: Sat, 22 Jun 2013 22:16:34 -0400
Message-ID: <BLU168-W10031CDD34B72E4FD911AB3D2890@phx.gbl>
To: "www-voice@w3.org" <www-voice@w3.org>


Could you tell me how to modify the dummy line:

xmlns:mypos="http://www.example.org/my_pos_namespace"

to use and recognize the UCREL CLAWS7 Tagset in a Pronunciation Lexicon Specification (PLS) file?

Any real world working replacement for xmlns:mypos="http://www.example.org/my_pos_namespace" would be greatly appreciated.

Also, could the Stanford POS Tagger be used as the tagger for a Pronunciation Lexicon Specification (PLS) file?
Received on Thursday, 4 July 2013 14:48:09 UTC
