
[ESW Wiki] Update of "its0908LinguisticMarkup" by GoutamSaha

From: <w3t-archive+esw-wiki@w3.org>
Date: Thu, 15 Sep 2005 00:06:09 -0000
To: w3t-archive+esw-wiki@w3.org
Message-ID: <20050915000609.23692.11219@localhost.localdomain>
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "ESW Wiki" for change notification.

The following page has been changed by GoutamSaha:

     - Develop 3rd xml schema that contains various Parts-of-Speech 
       categories on words.
+  ======
+  The importance of the 3-tier generalized schema:
+   1.  In many languages, common words often have several
+       meanings (word sense ambiguity) in different contexts,
+       even though the POS category may remain the same. Only the
+       context of the content domain and of the sentence tells us
+       the appropriate meaning of such a word.
+       For example, in English the word "bat" has multiple meanings:
+       (a) "a flying mammal" (in the content domain of
+           zoology/animals), or
+       (b) "a playing instrument" used to hit a ball, such as a
+           cricket bat (in the content domain of sports). In both
+           cases, the part of speech of "bat" is noun (finer
+           category: common noun).
+       Similarly, in an Indian language such as Bangla, the
+       word "Dhar" has multiple meanings:
+       (a) "to catch" in the content domain of, say, sports/law,
+       (b) "to assume" in the content domain of education,
+       (c) "to have pain" in the content domain of medical/health,
+       (d) "to come to an end" in the content domain of weather,
+       (e) "to begin" in the content domain of culture
+           [(a)-(e) with POS: verb], and
+       (f) a family name [POS: proper noun], and so on.
+   2.  The 1st level schema (content domain markup) is useful for
+       marking the context information for a paragraph of
+       translatable content.
+   3.  The 2nd level schema (sentence level markup) takes care
+       of translatable proverbs, idioms, dialects, usages, etc.
+       for any human language in the world.
+   4.  The 3rd level schema (word level markup) is used to obtain
+       the most appropriate meaning of a word that has POS
+       ambiguity (multiple possible POS) and word sense ambiguity
+       in a sentence within the text content.
+   5.  Content authors will not find such markup difficult to
+       use, because the scheme does not restrict them from adding
+       an appropriate markup as an attribute.
+   6.  Content authors need not use the three-level markup in
+       every part of a document (not for every word and sentence).
+   7.  Markup needs to be used only in the sensitive, difficult,
+       or ambiguous parts of a document.
+   8.  For some languages, a content author may not even need to
+       add finer sub-category markup to his/her document.
+ ======
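
As a rough illustration of how the three tiers could work together, the sketch below reads a domain from the paragraph and a POS from the word, then looks up the sense. Everything in it is an assumption made for illustration: the element names (para, s, w), the attribute names (domain, pos, sub), and the tiny sense table are hypothetical and are not taken from the proposed schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical three-tier markup (names are assumptions, not the real schema):
#   <para domain="...">  1st level: content domain of a paragraph
#   <s>                  2nd level: sentence
#   <w pos="..." sub=""> 3rd level: word with POS and finer sub-category
SAMPLE = """
<para domain="sports">
  <s>He swung the <w pos="noun" sub="common">bat</w> hard.</s>
</para>
"""

# Toy sense inventory keyed by (word, content domain, POS),
# loosely following the "bat" and "Dhar" examples in the list above.
SENSES = {
    ("bat", "zoology", "noun"): "a flying animal",
    ("bat", "sports", "noun"): "a playing instrument used to hit a ball",
    ("Dhar", "sports", "verb"): "to catch",
    ("Dhar", "education", "verb"): "to assume",
    ("Dhar", "weather", "verb"): "to come to an end",
}


def resolve_senses(xml_text):
    """Return (word, sense) pairs for every marked-up word in one paragraph.

    The sense is None when the table has no entry for the combination.
    """
    root = ET.fromstring(xml_text.strip())
    domain = root.get("domain")
    results = []
    for word in root.iter("w"):
        key = (word.text, domain, word.get("pos"))
        results.append((word.text, SENSES.get(key)))
    return results


if __name__ == "__main__":
    for word, sense in resolve_senses(SAMPLE):
        print(word, "->", sense)
```

Only the ambiguous word carries word-level markup here, in line with points 6 and 7: unmarked words are simply skipped by the lookup.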
  == Quick Guidelines ==
   - Examples on content domain categories: travel, science,
Received on Thursday, 15 September 2005 06:30:44 UTC
