- From: <w3t-archive+esw-wiki@w3.org>
- Date: Thu, 15 Sep 2005 00:06:09 -0000
- To: w3t-archive+esw-wiki@w3.org
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "ESW Wiki" for change notification.
The following page has been changed by GoutamSaha:
http://esw.w3.org/topic/its0908LinguisticMarkup
------------------------------------------------------------------------------
sentences.
- Develop 3rd xml schema that contains various Parts-of-Speech
categories on words.
+
+ ======
-
+
-
+ The Importance of the 3-tier generalized schema:
+
+ 1. In many languages, common words have several meanings in
+ different contexts (word sense ambiguity), even when their POS
+ category stays the same. Only the context of the content
+ domain and of the sentence tells us which meaning of such a
+ word is intended.
+
+ For example, in English the word "bat" has multiple meanings:
+
+ (a) "a flying mammal" in the content domain of zoology/animals, or
+
+ (b) "a playing instrument" used to hit a ball (as in a cricket
+ bat) in the content domain of sports. In both cases, the
+ part of speech of "bat" is noun (finer category: common
+ noun).
+
+ Similarly, in an Indian language such as Bangla, the word
+ "Dhar" has multiple meanings:
+
+ (a) "to catch" in the content domain of sports or law,
+ (b) "to assume" in the content domain of education,
+ (c) "to have pain" in the content domain of medicine/health,
+ (d) "to come to an end" in the content domain of weather, and
+ (e) "to begin" in the content domain of culture [(a) to (e)
+ with POS: verb], and
+ (f) a family name [POS: proper noun], and so on.
+
+ 2. The 1st level schema (content domain markup) is useful for
+ marking the context information for a paragraph of
+ translatable content.
+
+ 3. The 2nd level schema (sentence-level markup) handles
+ proverbs, idioms, dialects, and usages in translatable
+ content for any human language.
+
+ 4. The 3rd level schema (word-level markup) helps obtain the
+ most appropriate meaning of a word that has POS ambiguity
+ (several possible POS categories) and/or word sense
+ ambiguity within a sentence of the text content.
+
+ 5. Content authors should not find such markup difficult to
+ use, because the scheme does not restrict them from adding an
+ appropriate markup as an attribute.
+
+ 6. Content authors need not apply the three levels of markup
+ to every part of a document (i.e., not to every word and
+ sentence).
+
+ 7. Markup needs to be used only at the sensitive, difficult,
+ or ambiguous parts of a document.
+
+ 8. For some languages, a content author may not even need to
+ add finer sub-category markup to the document.
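+
+ The interplay of tiers 1 and 3 can be sketched in Python. This
+ is a minimal sketch under stated assumptions, not part of the
+ proposed schema: the sense table, the domain labels, and the
+ function resolve_sense are hypothetical, and the senses are
+ taken from the "bat" and "Dhar" examples above. Content-domain
+ markup (tier 1) and POS markup (tier 3) together select one
+ sense; sentence-level markup (tier 2) is not modeled here.
+

```python
from typing import Optional

# Hypothetical sense inventory keyed by (word, POS, content domain).
# Senses come from the "bat" and "Dhar" examples; the key layout and
# domain labels are illustrative assumptions, not schema definitions.
# A domain of None marks a domain-independent entry.
SENSES = {
    ("bat", "noun", "zoology"): "a flying mammal",
    ("bat", "noun", "sports"): "an instrument used to hit a ball",
    ("dhar", "verb", "sports"): "to catch",
    ("dhar", "verb", "law"): "to catch",
    ("dhar", "verb", "education"): "to assume",
    ("dhar", "verb", "medical"): "to have pain",
    ("dhar", "verb", "weather"): "to come to an end",
    ("dhar", "verb", "culture"): "to begin",
    ("dhar", "proper noun", None): "a family name",
}


def resolve_sense(word: str, pos: str, domain: Optional[str]) -> Optional[str]:
    """Pick a sense from word-level POS markup (tier 3) and the
    paragraph's content-domain markup (tier 1).

    Returns None when the available markup is not enough to decide.
    """
    key_word = word.lower()
    # First try an entry specific to the marked-up content domain.
    sense = SENSES.get((key_word, pos, domain))
    if sense is not None:
        return sense
    # Otherwise fall back to a domain-independent entry, if any.
    return SENSES.get((key_word, pos, None))


if __name__ == "__main__":
    print(resolve_sense("bat", "noun", "sports"))
    print(resolve_sense("Dhar", "verb", "weather"))
    print(resolve_sense("Dhar", "proper noun", "culture"))
```

+ Because markup is optional (points 6 and 7), the lookup falls
+ back to a domain-independent entry when the domain does not
+ help, and returns None when the markup present cannot settle
+ the meaning, i.e. a spot where an author might add markup.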
+
+ ======
+
== Quick Guidelines ==
- Examples on content domain categories: travel, science,
Received on Thursday, 15 September 2005 06:30:44 UTC