
[ESW Wiki] Update of "its0908LinguisticMarkup" by GoutamSaha

From: <w3t-archive+esw-wiki@w3.org>
Date: Thu, 15 Sep 2005 00:06:09 -0000
To: w3t-archive+esw-wiki@w3.org
Message-ID: <20050915000609.23692.11219@localhost.localdomain>
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "ESW Wiki" for change notification.

The following page has been changed by GoutamSaha:
http://esw.w3.org/topic/its0908LinguisticMarkup


------------------------------------------------------------------------------
       sentences.
     - Develop 3rd xml schema that contains various Parts-of-Speech 
       categories on words.
+ 
+  ======
-     
+    
-  
+  The Importance of the 3-tier generalized schema:
+ 
+   1.  Many languages have common words whose meaning changes with
+       context (word sense ambiguity), even though the POS category
+       may remain the same. Only the content domain and the
+       surrounding sentence tell us which meaning of such a word
+       is intended.
+ 
+       For example, in English, the word "bat" has multiple meanings:
+ 
+       (a) "a flying mammal" (in the content domain of zoology/animals), or
+ 
+       (b) "a playing instrument" used to hit a ball (like a cricket
+           bat) in the content domain of sports. In both cases, the
+           part of speech of "bat" is noun (finer category: common
+           noun).
+ 
+       Similarly, in an Indian language such as Bangla, the word
+       "Dhar" has multiple meanings:
+ 
+       (a) "to catch" in the content domain of, say, sports/law,
+       (b) "to assume" in the content domain of education,
+       (c) "to have pain" in the content domain of medical/health,
+       (d) "to come to an end" in the content domain of weather,
+       (e) "to begin" in the content domain of culture
+           [a-e with POS: verb], and
+       (f) a family name [proper noun], and so on.
+ 
+    2.    The 1st level schema (content domain markup) marks the
+          context information for a paragraph of translatable
+          content.
+ 
+    3.    The 2nd level schema (sentence level markup) takes care
+          of translatable proverbs, idioms, dialects, usages, etc.
+          for any human language.
+ 
+    4.    The 3rd level schema (word level markup) is used to obtain
+          the most appropriate meaning of a word (one with POS
+          ambiguity, i.e. multiple possible POS, and word sense
+          ambiguity) in a sentence inside text content.
+   
+    5.    Content authors should not find these markups difficult to
+          use, because the scheme does not restrict them from adding
+          an appropriate markup as an attribute.
+ 
+    6.    Content authors need not apply the three-level markup to
+          every part (every word and sentence) of a document.
+ 
+    7.    Markup is needed only at the sensitive, difficult, or
+          ambiguous parts of a document.
+ 
+    8.    For some languages, a content author may not even need to
+          add finer sub-category markup to his/her document.
+ 
+ ======
+ 
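The way the content domain (1st level) and the POS category (3rd level) together narrow down a word's meaning can be sketched as a simple lookup. This is only an illustration, not part of the proposed schemas: the `SENSES` table, the `resolve_sense` function, and the domain and POS labels are all hypothetical names, and the glosses are the examples listed above.

```python
from typing import Optional

# Hypothetical sense inventory keyed by (word, content domain, POS).
# A domain of None marks a sense that does not depend on the content domain.
SENSES = {
    ("bat", "zoology", "noun"): "a flying mammal",
    ("bat", "sports", "noun"): "a playing instrument used to hit a ball",
    ("Dhar", "sports", "verb"): "to catch",
    ("Dhar", "law", "verb"): "to catch",
    ("Dhar", "education", "verb"): "to assume",
    ("Dhar", "health", "verb"): "to have pain",
    ("Dhar", "weather", "verb"): "to come to an end",
    ("Dhar", "culture", "verb"): "to begin",
    ("Dhar", None, "proper noun"): "a family name",
}


def resolve_sense(word: str, domain: str, pos: str) -> Optional[str]:
    """Return the gloss selected by the content domain and POS markup.

    Tries the domain-specific entry first, then falls back to a
    domain-independent entry; returns None if no sense is recorded.
    """
    exact = SENSES.get((word, domain, pos))
    if exact is not None:
        return exact
    return SENSES.get((word, None, pos))


if __name__ == "__main__":
    print(resolve_sense("bat", "sports", "noun"))
    print(resolve_sense("Dhar", "education", "verb"))
    print(resolve_sense("Dhar", "weather", "proper noun"))
```

Without the domain markup, "bat" as a noun stays ambiguous between the first two entries; without the POS markup, "Dhar" in a given domain could still be a verb or a family name.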
  == Quick Guidelines ==
  
   - Examples on content domain categories: travel, science,
Received on Thursday, 15 September 2005 06:30:44 GMT
