- From: <w3t-archive+esw-wiki@w3.org>
- Date: Thu, 15 Sep 2005 00:06:09 -0000
- To: w3t-archive+esw-wiki@w3.org
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "ESW Wiki" for change notification.

The following page has been changed by GoutamSaha:
http://esw.w3.org/topic/its0908LinguisticMarkup

------------------------------------------------------------------------------
  sentences.
  - Develop 3rd xml schema that contains various Parts-of-Speech categories on words.
+
+ ======
-
+
-
+ The Importance of the 3-tier generalized schema:
+
+ 1. Many languages have common words with several meanings
+    (word sense ambiguity) that depend on context, even when
+    the POS category stays the same. Only the context of the
+    content domain and of the sentence tells us which meaning
+    is appropriate.
+
+    For example, in English the word "bat" has multiple meanings:
+
+    (a) in the content domain of zoology/animals it means "a
+        flying mammal", or
+
+    (b) in the content domain of sports it means "a playing
+        instrument" used to hit a ball (like a cricket bat).
+
+    In both cases, the part of speech of "bat" is noun (finer
+    category: common noun).
+
+    Similarly, in an Indian language (say, Bangla) the word
+    "Dhar" has multiple meanings:
+
+    (a) "to catch" in the content domain of, say, sports/law,
+    (b) "to assume" in the content domain of education,
+    (c) "to have pain" in the content domain of medical/health,
+    (d) "to come to an end" in the content domain of weather,
+    (e) "to begin" in the content domain of culture
+        [a-e with POS: verb], and
+    (f) a family name [proper noun], and so on.
+
+ 2. The 1st level schema (content domain markup) is useful for
+    marking the context information for a paragraph of
+    translatable content.
+
+ 3. The 2nd level schema (sentence level markup) takes care
+    of translatable proverbs, idioms, dialects, usages, etc.
+    for any human language in the world.
+
+ 4. The 3rd level schema (word level markup) helps determine
+    the most appropriate meaning of a word (one with POS
+    ambiguity, i.e. multiple possible POS, and word sense
+    ambiguity) in a sentence within text content.
+
+ 5. Content authors will have no difficulty using such
+    markup, because this scheme does not restrict them from
+    adding an appropriate markup as an attribute.
+
+ 6. A content author need not use the three-level markup in
+    every part of a document (not for all words and sentences).
+
+ 7. Markup needs to be used only in the sensitive, difficult,
+    or ambiguous parts of a document.
+
+ 8. For some languages, a content author may not even need to
+    add finer sub-category markup to his/her document.
+
+ ======
+
  == Quick Guidelines ==
  - Examples on content domain categories: travel, science,
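
The three-level scheme described in the added text can be sketched roughly as follows. This is only an illustration under stated assumptions: the element names (para, s, w), the attribute names (domain, pos), and the SENSES table are hypothetical and are not taken from any schema on the wiki page. The sense entries simply restate the "bat" and "Dhar" examples from the list.

```python
import xml.etree.ElementTree as ET

# Hypothetical sense inventory keyed by (word, part of speech, content domain).
# A domain of None means the sense does not depend on the content domain.
SENSES = {
    ("bat", "noun", "zoology"): "the animal",
    ("bat", "noun", "sports"): "a playing instrument for hitting a ball",
    ("dhar", "verb", "sports"): "to catch",
    ("dhar", "verb", "law"): "to catch",
    ("dhar", "verb", "education"): "to assume",
    ("dhar", "verb", "health"): "to have pain",
    ("dhar", "verb", "weather"): "to come to an end",
    ("dhar", "verb", "culture"): "to begin",
    ("dhar", "proper-noun", None): "a family name",
}

# Hypothetical markup: element and attribute names are invented for this sketch.
SAMPLE = """
<para domain="sports">
  <s>He swung the <w pos="noun">bat</w> hard.</s>
</para>
"""


def resolve_senses(xml_text):
    """Return (word, sense) pairs for every marked-up word in one paragraph."""
    para = ET.fromstring(xml_text.strip())
    domain = para.get("domain")              # 1st level: content domain
    results = []
    for sentence in para.iter("s"):          # 2nd level: sentence
        for word in sentence.iter("w"):      # 3rd level: word with its POS
            text = (word.text or "").strip()
            pos = word.get("pos")
            # Try the domain-specific sense first, then a domain-independent one.
            sense = SENSES.get((text.lower(), pos, domain))
            if sense is None:
                sense = SENSES.get((text.lower(), pos, None), "unknown")
            results.append((text, sense))
    return results


if __name__ == "__main__":
    print(resolve_senses(SAMPLE))
```

Only the ambiguous word carries word-level markup here, in line with points 6 and 7: the rest of the sentence is left unmarked.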
Received on Thursday, 15 September 2005 06:30:44 UTC