
[ESW Wiki] Update of "its0908LinguisticMarkup" by GoutamSaha

From: <w3t-archive+esw-wiki@w3.org>
Date: Sun, 16 Oct 2005 03:00:03 -0000
To: w3t-archive+esw-wiki@w3.org
Message-ID: <20051016030003.457.31133@localhost.localdomain>
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "ESW Wiki" for change notification.

The following page has been changed by GoutamSaha:
http://esw.w3.org/topic/its0908LinguisticMarkup


------------------------------------------------------------------------------
  == Summary ==
  
  {{{
- The proposed scheme is to demonstrate how to embed syntactic, semantic and computational linguistic related metadata information in the structure of an XML document towards better meaningful Translation through useful Markups meant for both the Internationalization (I18N) & 
+ The proposed scheme is to demonstrate how to embed syntactic, semantic and computational linguistic related metadata information in the structure of an XML document towards faster and better meaningful Translation through useful Markups meant for both the Internationalization (I18N) & 
  Localization (L10N) Processes.
  }}}
  
@@ -36, +36 @@

  and would like to assist Goutam in developing this topic when we are ready to deliberate it in detail.
  Well done!.]]'''
  
- [[GS- Excellent remarks indeed. It will be nice if AZ, MJD, FS, RI, YS or anyone share and add their knowledge on this recently proposed scheme.]]
  
  '''[FS'''- Maybe it was not clear in the minutes of the ITS f2f at ERCIM in September, but that was what we decided to do. Goutam, I hope that I understood you correctly that you would agree on what Martin formulated - that we solve the current simple (they are hard enough) problems of ITS first and come back to linguistic markup later.''']]'''
+ 
+ '''[[GS- Excellent remarks/comments indeed. It will be nice if AZ, MJD, FS, RI, YS or anyone add and share their knowledge on this recently proposed scheme.]]'''
  
               
     [[GS- The 3-Tier XML Schema approach is useful for an XML content
@@ -93, +94 @@

  
            '''Again, for a link inside an XML PCDATA/ text content, we might differentiate
               links from the text by the following markup to treat them separately, 
-              for example,        __Click Here for Sign Up__ '''
+              for example,        __Click Here for Sign Up__ 
               
  {{{
  <!-- Markup for a Link -->
@@ -159, +160 @@

  For example: 
  
  {{{
- <!-- Markup for Word/Phrase Sense Disambiguation in Context Dependent Usage -->
+ <!-- Markup for Word/Phrase Sense Disambiguation or for Context Dependent Usage -->
  
  <content_domain name="factory">
  
@@ -175, +176 @@

  
  </content_domain>
  
- =====
+ 
  
  <content_domain name="office">
  
@@ -308, +309 @@

  ''Demonstrative Adjective:'' ''This'' boy is strong. ''That'' boy is industrious. Don't 
  be in ''such'' a hurry. I hate ''such'' things. ''These'' mangoes are sweet.
  
- A typical example is shown below to illustrate how grammatical knowledge can be used as XML markup for the sentence "This boy is strong." However, word-level markup for every word in a sentence may not be required. We need to mark up only the language-specific parts. 
+ A typical example is shown below to illustrate how grammatical knowledge can be used as XML markup for the sentence "This boy is strong." However, word-level markup for every word in a sentence may not be required. We need to '''mark up only the language-specific parts.''' 
  
+ {{{
+ <!-- Example for Word-Level Markup -->
+  
- {{{<pos_cat name="adjective" type="demonstrative"> This </pos_cat> 
+ <pos_cat name="adjective" type="demonstrative"> This </pos_cat> 
+ 
  <pos_cat name="noun" type="common"> boy </pos_cat> 
+ 
  <pos_cat name="verb" type="linking"> is </pos_cat> 
+ 
- <pos_cat name="adjective" type="general"> strong </pos_cat> .}}} 
+ <pos_cat name="adjective" type="general"> strong 
+ 
+ </pos_cat> .
+ 
+ }}} 
  
  The same sentence "This boy is strong." can also be marked up in the following way without using finer parts-of-speech categories (depending on the requirements of a translation parser for a specific language-pair).
  
+ {{{
+ <!-- Example for Adding Word-Level Markup -->
+ 
- {{{<pos_cat name="adjective"> This </pos_cat> 
+ <pos_cat name="adjective"> This </pos_cat> 
+ 
  <pos_cat name="noun"> boy </pos_cat> 
+ 
  <pos_cat name="verb"> is </pos_cat> 
+ 
- <pos_cat name="adjective"> strong </pos_cat> .}}} 
+ <pos_cat name="adjective"> strong </pos_cat> .
+ 
+ }}} 
  
   The sentence "Light a light light." can be marked up with word-level parts-of-speech metadata information in the following way without using finer parts-of-speech categories (depending on the requirements of a translation parser for a specific language-pair).
  
@@ -332, +351 @@

  <pos_cat name="adjective"> light </pos_cat> 
  
  <pos_cat name="noun"> light </pos_cat> .
+ 
  }}} 
  
  '''Interjections''' are words or phrases used to ''exclaim'' or protest or command. 
@@ -400, +420 @@

  We can also use this '''3-Tier Schema approach''' for indicating whether a content or a sentence or a word needs to be kept "as it is" (without translation) or not. The following markups can be used:
  
  {{{
+ <!-- Markup to exclude a sentence from translation -->
  
  <content_domain name="religion" type="no_translation">
  
@@ -416, +437 @@

  The example below shows how to add metadata information that skips the translation process for a sentence in the chanting part of a '''Cultural''' content. 
  
  {{{
- 
+ <!-- Markup for Cultural Chant that need not be translated -->
+  
  <content_domain name="cultural">
  
  <!-- default is to translate -->
@@ -428, +450 @@

  
  <sentence_cat name="chant" type="no_translation">
  
- Om Ganeshaioh Namoh.
+ Om Ganeshaioh Namoh. 
  
  </sentence_cat>
  
@@ -436, +458 @@

  
  }}}
  
- In many '''sentences''' we often use '''multilingual words'''. For example, in Hindi, "Kaam joldi start Kijiye" (i.e., in English- Start the work immediately. Lexicons: Kaam/ Work, Joldi/ immediately, Kijiye/ Do). Please note that there is an English word "start" in that Hindi sentence. Such usage of multilingual wording is very common in any urban area. As we are adding the meaning of a foreign language word (e.g., start) in a sentence of some other source language, say, Hindi, there won't be any problem for a translation parser in understanding a multilingual sentence.
+ In many '''sentences''' we often use '''multilingual words'''. For example, in the Hindi sentence, "Kaam joldi start Kijiye" (i.e., in English:- "Start the work immediately." Lexicons:- Kaam/ Work, Joldi/ immediately, Kijiye/ Do). Please note that there is an English word "start" in the source language Hindi sentence. Such usage of multilingual wording is very common in any urban area. As we are providing the meaning of a foreign language word (e.g., start) in a sentence of some other source language, say, Hindi, there won't be any problem for a translation parser in understanding a sentence that contains multilingual words.
  
  {{{
  <!-- Markup for a Sentence having Multilingual Words -->
Received on Sunday, 16 October 2005 13:00:21 UTC
