[ESW Wiki] Update of "its0908LinguisticMarkup" by GoutamSaha from w3t-archive+esw-wiki@w3.org on 2005-09-29 (public-i18n-its@w3.org from July to September 2005)

From: <w3t-archive+esw-wiki@w3.org>
Date: Thu, 29 Sep 2005 22:51:39 -0000
To: w3t-archive+esw-wiki@w3.org
Message-ID: <20050929225139.18387.75652@localhost.localdomain>

The following page has been changed by GoutamSaha:
http://esw.w3.org/topic/its0908LinguisticMarkup


------------------------------------------------------------------------------
           computational linguistic related metadata information in the
           structure of an XML document towards better translation.]] 
  
-    [[ GS- XML Schema authors should prefer to use attributes for 
+    [[ GS- '''XML Schema authors should prefer to use attributes for 
-           metadata information because of their better flexibility and portability.
+           metadata information because of their better flexibility and portability.'''
            XML content authors with school-level knowledge of grammar will 
            not find any difficulty in marking up such language-specific information.
            It is not mandatory for an author to add finely classified metadata at all.
            He/she has to add metadata only to those parts of his/her content which are
            exceptional with respect to his/her source language.
  
-           For example: for the Phrases_Idioms "cats and dogs" in English we can mark up as
+           For example: for the '''Phrases_Idioms "cats and dogs" in English''' we can mark up as
+ 
            <sentence_cat name="phrases_idioms" meaning="heavily"> cats and dogs
            </sentence_cat> 
+ 
            Such metadata will be of immense help to the localization process (in order
            to find an appropriate phrase or idiom in a target language) without
            even knowing the source language, English, well. 
  
-           Similarly, with Bangla as the source language, for a phrase or idiom such as
+           Similarly, with '''Bangla as the source language, for a phrase or idiom''' such as
            "Dumurer (English meaning: Fig's) Fool (English meaning: Flower)"
            we can mark up as
+ 
            <sentence_cat name="phrases_idioms" meaning="rarely visible">
+            Dumurer Fool
-           Dumurer Fool </sentence_cat> 
+           </sentence_cat> 
            
-           Such metadata is very useful as semantic markup for a localization process,
+           '''Such metadata is very useful as semantic markup for a localization process,
-           irrespective of the target language. ]] 
+           irrespective of the target language.''' ]] 
+ 
+ '''For the following Bengali or Bangla dialect sentence'''  
+ "Kaam (Kaaj in Bangla or Work in English)  Saira Falo (Shesh Koro in
+ Bangla or Complete in English)," 
+ we should mark up the text with the '''three-layer metadata information''' in the following way:
+ 
+ <text xml:lang="ben">
+ <content_domain name="dialect">
+ <!-- content domain metadata -->
+ .... other sentences
+ ....
+  <sentence_cat name="imperative"> 
+  <!-- sentence level metadata is optional here -->
+      <pos_cat name="noun" meaning="work"> Kam </pos_cat>
+      <pos_cat name="verb" meaning="complete"> Saira Falo </pos_cat>
+  <!-- word level parts-of-speech -->
+  </sentence_cat> 
+ ......
+ </content_domain>
+ </text>
+ 
     
- Metadata information about the domain, sentence type or specific words
+ '''Metadata information about the domain, sentence type or specific words
- will help translators to do better quality work or to do the work quickly.
+ will help translators to do better quality work or to do the work quickly.'''
  If translators know that a word belongs to a specific domain then they can go to a 
  terminology database and check the word; thus, even for human translators this 
  3-tier or 3-layer schema will be helpful. One cannot do an accurate translation
@@ -249, +274 @@

    
     <travel> 
    <communications> 
-    <diallect> 
+    <dialect> 
   <society>
      <humanities> 
      <civic>
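
The three-layer markup proposed in the diff above (content domain, sentence category, part of speech) is regular enough to be read mechanically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module and a trimmed copy of the Bangla dialect example with the "...." placeholders left out. The function name extract_layers, the constant names, and the record keys are hypothetical and not part of the wiki page.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example from the page; the "...." placeholders are omitted.
SAMPLE = """\
<text xml:lang="ben">
<content_domain name="dialect">
 <sentence_cat name="imperative">
     <pos_cat name="noun" meaning="work"> Kam </pos_cat>
     <pos_cat name="verb" meaning="complete"> Saira Falo </pos_cat>
 </sentence_cat>
</content_domain>
</text>
"""

# Key under which ElementTree stores the xml:lang attribute (Clark notation).
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def extract_layers(xml_text):
    """Return one record per pos_cat element, with the metadata of its enclosing layers.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    records = []
    for domain in root.iter("content_domain"):          # layer 1: content domain
        for sentence in domain.iter("sentence_cat"):    # layer 2: sentence category
            for word in sentence.iter("pos_cat"):       # layer 3: part of speech
                records.append({
                    "lang": root.get(XML_LANG),
                    "domain": domain.get("name"),
                    "sentence": sentence.get("name"),
                    "pos": word.get("name"),
                    "meaning": word.get("meaning"),
                    "text": (word.text or "").strip(),
                })
    return records


if __name__ == "__main__":
    for record in extract_layers(SAMPLE):
        print(record)
```

A translation tool could use such records to look up the "meaning" value in a terminology database for the stated domain, without needing to parse the source-language text itself.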
Received on Friday, 30 September 2005 07:14:46 UTC